Closed: BenfengXu closed this issue 3 years ago.
Please use the updated version. The `num_generated_triples` hyperparameter is very important.
Thanks for the quick reply, I will re-run with the corrected script and post my feedback~
I've corrected the `--num_generated_triples` parameter to 15 for NYT Exact, but still got the same result of F1 = 0.667109, achieved at epoch 95.
I also tried setting `--fix_bert_embeddings` to False, which resulted in F1 = 0.666058, achieved at epoch 89, so this is not the real problem either.
Meanwhile, I tried WebNLG, which did much better: F1 = 0.876756, reached at epoch 78.
I'm using python==3.7.8, torch==1.5.0, transformers==2.6.0 as instructed. Below is an excerpted log from the NYT Exact run:
dataset_name : NYT-exact
train_file : ./data/NYT/exact_data/train.json
valid_file : ./data/NYT/exact_data/valid.json
test_file : ./data/NYT/exact_data/test.json
generated_data_directory : ./data/generated_data/
generated_param_directory : ./data/generated_data/model_param/
bert_directory : ../pretrained_model/bert_base_uncased_huggingface/
partial : False
model_name : Set-Prediction-Networks
num_generated_triples : 15
num_decoder_layers : 3
matcher : avg
na_rel_coef : 0.5
rel_loss_weight : 1
head_ent_loss_weight : 2
tail_ent_loss_weight : 2
fix_bert_embeddings : True
batch_size : 8
max_epoch : 100
gradient_accumulation_steps : 1
decoder_lr : 2e-05
encoder_lr : 1e-05
lr_decay : 0.01
weight_decay : 1e-05
max_grad_norm : 1.0
optimizer : AdamW
n_best_size : 100
max_span_length : 10
refresh : False
use_gpu : True
visible_gpu : 7
random_seed : 1
Data setting is loaded from file: ./data/generated_data/NYT-exact_Set-Prediction-Networks_data.pickle
DATA SUMMARY START:
Relation Alphabet Size: 24
Train Instance Number: 56196
Valid Instance Number: 5000
Test Instance Number: 5000
DATA SUMMARY END.
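As an aside on the config above: the separate `decoder_lr` / `encoder_lr` values mean the BERT encoder and the set-prediction decoder are optimized with different learning rates via AdamW parameter groups. A minimal sketch of that pattern, assuming nothing beyond standard PyTorch (the toy `encoder`/`decoder` modules are stand-ins, not the repo's actual code):

```python
import torch.nn as nn
from torch.optim import AdamW  # the repo may use transformers' AdamW; torch's behaves the same here

# toy stand-ins for the BERT encoder and the set-prediction decoder
encoder = nn.Linear(8, 8)
decoder = nn.Linear(8, 4)

# separate learning rates matching encoder_lr=1e-05 / decoder_lr=2e-05 above,
# with the shared weight_decay=1e-05 applied to both groups
optimizer = AdamW(
    [
        {"params": encoder.parameters(), "lr": 1e-5},
        {"params": decoder.parameters(), "lr": 2e-5},
    ],
    weight_decay=1e-5,
)
print([group["lr"] for group in optimizer.param_groups])  # [1e-05, 2e-05]
```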
=== Epoch 0 train ===
Instance: 800; loss: 7.5889 Instance: 1600; loss: 5.6746 Instance: 2400; loss: 4.7977 Instance: 3200; loss: 4.2368 Instance: 4000; loss: 3.7949 Instance: 4800; loss: 3.4715 Instance: 5600; loss: 3.2019 Instance: 6400; loss: 2.9838 Instance: 7200; loss: 2.8109 Instance: 8000; loss: 2.6600 Instance: 8800; loss: 2.5287 Instance: 9600; loss: 2.4088 Instance: 10400; loss: 2.3105 Instance: 11200; loss: 2.2216 Instance: 12000; loss: 2.1473 Instance: 12800; loss: 2.0792 Instance: 13600; loss: 2.0144 Instance: 14400; loss: 1.9541 Instance: 15200; loss: 1.9034 Instance: 16000; loss: 1.8501 Instance: 16800; loss: 1.8009 Instance: 17600; loss: 1.7592 Instance: 18400; loss: 1.7196 Instance: 19200; loss: 1.6878 Instance: 20000; loss: 1.6573 Instance: 20800; loss: 1.6258 Instance: 21600; loss: 1.5960 Instance: 22400; loss: 1.5691 Instance: 23200; loss: 1.5420 Instance: 24000; loss: 1.5128 Instance: 24800; loss: 1.4896 Instance: 25600; loss: 1.4695 Instance: 26400; loss: 1.4482 Instance: 27200; loss: 1.4276 Instance: 28000; loss: 1.4075 Instance: 28800; loss: 1.3877 Instance: 29600; loss: 1.3692 Instance: 30400; loss: 1.3508 Instance: 31200; loss: 1.3353 Instance: 32000; loss: 1.3201 Instance: 32800; loss: 1.3033 Instance: 33600; loss: 1.2889 Instance: 34400; loss: 1.2740 Instance: 35200; loss: 1.2591 Instance: 36000; loss: 1.2437 Instance: 36800; loss: 1.2299 Instance: 37600; loss: 1.2158 Instance: 38400; loss: 1.2052 Instance: 39200; loss: 1.1933 Instance: 40000; loss: 1.1816 Instance: 40800; loss: 1.1702 Instance: 41600; loss: 1.1610 Instance: 42400; loss: 1.1511 Instance: 43200; loss: 1.1419 Instance: 44000; loss: 1.1332 Instance: 44800; loss: 1.1255 Instance: 45600; loss: 1.1162 Instance: 46400; loss: 1.1075 Instance: 47200; loss: 1.0987 Instance: 48000; loss: 1.0896 Instance: 48800; loss: 1.0815 Instance: 49600; loss: 1.0745 Instance: 50400; loss: 1.0649 Instance: 51200; loss: 1.0560 Instance: 52000; loss: 1.0496 Instance: 52800; loss: 1.0432 Instance: 53600; loss: 1.0355 Instance: 54400; loss: 1.0276 Instance: 55200; loss: 1.0216 Instance: 56000; loss: 1.0143
=== Epoch 0 Test ===
------Num of Gold Triplet is 1------
gold_num = 3240 pred_num = 3964 right_num = 2088 relation_right_num = 2927 entity_right_num = 2626
precision = 0.5267406659939455 recall = 0.6444444444444445 f1_value = 0.5796779566907274
rel_precision = 0.7383955600403632 rel_recall = 0.903395061728395 rel_f1_value = 0.8126041088284287
ent_precision = 0.6624621594349143 ent_recall = 0.8104938271604938 ent_f1_value = 0.7290394225430317
------Num of Gold Triplet is 2------
gold_num = 2094 pred_num = 1575 right_num = 1084 relation_right_num = 1271 entity_right_num = 1288
precision = 0.6882539682539682 recall = 0.5176695319961796 f1_value = 0.5908967020986645
rel_precision = 0.806984126984127 rel_recall = 0.6069723018147087 rel_f1_value = 0.6928318342872718
ent_precision = 0.8177777777777778 ent_recall = 0.6150907354345749 ent_f1_value = 0.702098664486236
------Num of Gold Triplet is 3------
gold_num = 942 pred_num = 498 right_num = 349 relation_right_num = 419 entity_right_num = 409
precision = 0.7008032128514057 recall = 0.37048832271762205 f1_value = 0.4847222222222222
rel_precision = 0.8413654618473896 rel_recall = 0.4447983014861996 rel_f1_value = 0.5819444444444445
ent_precision = 0.821285140562249 ent_recall = 0.43418259023354566 ent_f1_value = 0.5680555555555556
------Num of Gold Triplet is 4------
gold_num = 1160 pred_num = 457 right_num = 342 relation_right_num = 405 entity_right_num = 375
precision = 0.7483588621444202 recall = 0.29482758620689653 f1_value = 0.42300556586270865
rel_precision = 0.8862144420131292 rel_recall = 0.34913793103448276 rel_f1_value = 0.5009276437847867
ent_precision = 0.8205689277899344 ent_recall = 0.3232758620689655 ent_f1_value = 0.46382189239332094
------Num of Gold Triplet is greater than or equal to 5------
gold_num = 684 pred_num = 178 right_num = 153 relation_right_num = 169 entity_right_num = 163
precision = 0.8595505617977528 recall = 0.2236842105263158 f1_value = 0.3549883990719258
rel_precision = 0.949438202247191 rel_recall = 0.24707602339181287 rel_f1_value = 0.39211136890951276
ent_precision = 0.9157303370786517 ent_recall = 0.23830409356725146 ent_f1_value = 0.37819025522041766
------Normal Triplets------
gold_num = 2028 pred_num = 2490 right_num = 1174 relation_right_num = 1836 entity_right_num = 1455
precision = 0.4714859437751004 recall = 0.5788954635108481 f1_value = 0.5196989818503762
rel_precision = 0.7373493975903614 rel_recall = 0.9053254437869822 rel_f1_value = 0.8127490039840637
ent_precision = 0.5843373493975904 ent_recall = 0.7174556213017751 ent_f1_value = 0.6440903054448871
------Multiply label Triplets------
gold_num = 4079 pred_num = 2119 right_num = 1492 relation_right_num = 1728 entity_right_num = 1737
precision = 0.7041057102406796 recall = 0.3657759254719294 f1_value = 0.4814456276218135
rel_precision = 0.8154789995280792 rel_recall = 0.4236332434420201 rel_f1_value = 0.5575992255566311
ent_precision = 0.8197262859839547 ent_recall = 0.4258396665849473 ent_f1_value = 0.5605033881897387
------Overlapping Triplets------
gold_num = 5530 pred_num = 3794 right_num = 2632 relation_right_num = 3096 entity_right_num = 3139
precision = 0.6937269372693727 recall = 0.4759493670886076 f1_value = 0.5645645645645646
rel_precision = 0.816025303110174 rel_recall = 0.5598553345388788 rel_f1_value = 0.6640926640926642
ent_precision = 0.827358987875593 ent_recall = 0.567631103074141 ent_f1_value = 0.6733161733161733

gold_num = 8120 pred_num = 6672 right_num = 4016 relation_right_num = 5191 entity_right_num = 4861
precision = 0.6019184652278178 recall = 0.4945812807881773 f1_value = 0.5429962141698215
rel_precision = 0.7780275779376499 rel_recall = 0.6392857142857142 rel_f1_value = 0.7018658734451053
ent_precision = 0.7285671462829736 ent_recall = 0.5986453201970443 ent_f1_value = 0.6572471606273661
Achieving Best Result on Test Set.
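As a sanity check, the precision/recall/F1 figures in this log follow the standard micro-averaged definitions over (gold, predicted, correct) triple counts. A minimal sketch, where the helper name `prf` is my own and not from the repo:

```python
def prf(gold_num, pred_num, right_num):
    """Micro precision / recall / F1 from triple counts, as printed in the log."""
    precision = right_num / pred_num if pred_num else 0.0
    recall = right_num / gold_num if gold_num else 0.0
    # micro F1 = 2PR / (P + R), which simplifies to 2 * right / (pred + gold)
    f1 = 2 * right_num / (pred_num + gold_num) if pred_num + gold_num else 0.0
    return precision, recall, f1

# overall counts from the epoch-0 test block above
p, r, f1 = prf(gold_num=8120, pred_num=6672, right_num=4016)
print(p, r, f1)  # ≈ 0.6019, 0.4946, 0.5430 — matching the logged overall values
```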
=== Epoch 1 train ===
Instance: 800; loss: 0.5340 Instance: 1600; loss: 0.5154 Instance: 2400; loss: 0.5168 Instance: 3200; loss: 0.5275 Instance: 4000; loss: 0.5287 Instance: 4800; loss: 0.5276 Instance: 5600; loss: 0.5244 Instance: 6400; loss: 0.5159 Instance: 7200; loss: 0.5104 Instance: 8000; loss: 0.5057
I don't know where the problem is... so right now I'll just re-clone this repo, re-download all the data, and try again.
Oh! You should use bert-base-cased, since many entity mentions are capitalized in English. You can find this setting in the Implementation Details section of our paper.
It seems that bert-base-cased vs. uncased is not the reason for the degraded performance; they produce similar results.
But after I re-cloned the repo and re-downloaded the data, performance became normal again, with F1 = 0.921405 on NYT Exact (0.923 reported) and F1 = 0.926382 on WebNLG Partial (0.934 reported). So I must have gotten something wrong previously, although I still don't know how. For now I'll just leave it behind...
Again, many thanks for your quick reply and careful help! Have a nice day~
I'm closing this issue now.
I directly ran the script following the README, but obtained abnormal results for the NYT Exact Match setting, with F1 = 0.667201.
My command is:
python -m main --bert_directory ../pretrained_model/bert_base_uncased_huggingface/ --num_generated_triplets 15 --max_grad_norm 1 --na_rel_coef 0.5 --max_epoch 100 --max_span_length 10
Here are some excerpted logs that might be relevant:
Right now I'm guessing at two possible reasons:
1. I see that the README was updated just recently, so I may not have set `--num_generated_triplets` correctly, but I'm not sure this alone would cause such a dramatic performance drop from the expected ~0.90 to 0.66.
2. The script and model default to `fix_bert_embeddings=True`, as can be seen here, which differs from the usual practice of fine-tuning BERT on downstream tasks rather than freezing it.

Great thanks for your attention and help!
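On the freezing point, for anyone comparing behaviors: freezing is typically implemented by turning off gradients on the relevant parameters, which is presumably what `fix_bert_embeddings=True` does under the hood. A minimal sketch of the general pattern in plain PyTorch (the toy model and the `freeze_embeddings` helper are illustrative, not the repo's actual code):

```python
import torch.nn as nn

def freeze_embeddings(model: nn.Module) -> None:
    # disable gradient updates for every embedding sub-module,
    # mirroring what a fix_bert_embeddings=True flag would do
    for module in model.modules():
        if isinstance(module, nn.Embedding):
            for param in module.parameters():
                param.requires_grad = False

# toy stand-in for BERT: an embedding table followed by a linear head
toy = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 2))
freeze_embeddings(toy)
trainable = [name for name, p in toy.named_parameters() if p.requires_grad]
print(trainable)  # only the Linear head's parameters remain trainable
```

Frozen parameters should also be excluded when building the optimizer (e.g. by filtering on `p.requires_grad`), so the optimizer does not track state for them.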