DianboWork / SPN4RE


Unexpected NYT results. #12

Closed: BenfengXu closed this issue 3 years ago

BenfengXu commented 3 years ago

I directly ran the script following the README, but obtained abnormal results for the NYT Exact Match setting, with F1 = 0.667201.

My command is:

python -m main --bert_directory ../pretrained_model/bert_base_uncased_huggingface/ --num_generated_triplets 15 --max_grad_norm 1 --na_rel_coef 0.5 --max_epoch 100 --max_span_length 10

Here are some excerpted logs which might be relevant:

Unparsed args: ['--num_generated_triplets', '15']

dataset_name : NYT-exact
train_file : ./data/NYT/exact_data/train.json
valid_file : ./data/NYT/exact_data/valid.json
test_file : ./data/NYT/exact_data/test.json
generated_data_directory : ./data/generated_data/
generated_param_directory : ./data/generated_data/model_param/
bert_directory : ../pretrained_model/bert_base_uncased_huggingface/
partial : False
model_name : Set-Prediction-Networks
num_generated_triples : 10
num_decoder_layers : 3
matcher : avg
na_rel_coef : 0.5
rel_loss_weight : 1
head_ent_loss_weight : 2
tail_ent_loss_weight : 2
fix_bert_embeddings : True
batch_size : 8
max_epoch : 100
gradient_accumulation_steps : 1
decoder_lr : 2e-05
encoder_lr : 1e-05
lr_decay : 0.01
weight_decay : 1e-05
max_grad_norm : 1.0
optimizer : AdamW
n_best_size : 100
max_span_length : 10
refresh : False
use_gpu : True
visible_gpu : 1
random_seed : 1

DATA SUMMARY START:
Relation Alphabet Size: 24
Train Instance Number: 56196
Valid Instance Number: 5000
Test Instance Number: 5000
DATA SUMMARY END.
Data setting is saved to file: ./data/generated_data/NYT-exact_Set-Prediction-Networks_data.pickle

=== Epoch 99 train ===
Instance: 800; loss: 0.0072 Instance: 1600; loss: 0.0061 Instance: 2400; loss: 0.0053 Instance: 3200; loss: 0.0054 Instance: 4000; loss: 0.0055 Instance: 4800; loss: 0.0049 Instance: 5600; loss: 0.0045 Instance: 6400; loss: 0.0046 Instance: 7200; loss: 0.0042 Instance: 8000; loss: 0.0041 Instance: 8800; loss: 0.0044 Instance: 9600; loss: 0.0043 Instance: 10400; loss: 0.0046 Instance: 11200; loss: 0.0046 Instance: 12000; loss: 0.0046 Instance: 12800; loss: 0.0043 Instance: 13600; loss: 0.0042 Instance: 14400; loss: 0.0040 Instance: 15200; loss: 0.0040 Instance: 16000; loss: 0.0039 Instance: 16800; loss: 0.0038 Instance: 17600; loss: 0.0037 Instance: 18400; loss: 0.0036 Instance: 19200; loss: 0.0035 Instance: 20000; loss: 0.0035 Instance: 20800; loss: 0.0035 Instance: 21600; loss: 0.0034 Instance: 22400; loss: 0.0034 Instance: 23200; loss: 0.0033 Instance: 24000; loss: 0.0034 Instance: 24800; loss: 0.0033 Instance: 25600; loss: 0.0033 Instance: 26400; loss: 0.0033 Instance: 27200; loss: 0.0034 Instance: 28000; loss: 0.0034 Instance: 28800; loss: 0.0034 Instance: 29600; loss: 0.0033 Instance: 30400; loss: 0.0035 Instance: 31200; loss: 0.0038 Instance: 32000; loss: 0.0038 Instance: 32800; loss: 0.0038 Instance: 33600; loss: 0.0037 Instance: 34400; loss: 0.0039 Instance: 35200; loss: 0.0039 Instance: 36000; loss: 0.0039 Instance: 36800; loss: 0.0038 Instance: 37600; loss: 0.0038 Instance: 38400; loss: 0.0039 Instance: 39200; loss: 0.0039 Instance: 40000; loss: 0.0039 Instance: 40800; loss: 0.0039 Instance: 41600; loss: 0.0038 Instance: 42400; loss: 0.0038 Instance: 43200; loss: 0.0038 Instance: 44000; loss: 0.0039 Instance: 44800; loss: 0.0039 Instance: 45600; loss: 0.0041 Instance: 46400; loss: 0.0041 Instance: 47200; loss: 0.0040 Instance: 48000; loss: 0.0040 Instance: 48800; loss: 0.0039 Instance: 49600; loss: 0.0039 Instance: 50400; loss: 0.0039 Instance: 51200; loss: 0.0039 Instance: 52000; loss: 0.0039 Instance: 52800; loss: 0.0039 Instance: 53600; loss: 0.0039 Instance: 54400; loss: 0.0039 Instance: 55200; loss: 0.0039 Instance: 56000; loss: 0.0039

=== Epoch 99 Test ===
------Num of Gold Triplet is 1------
gold_num = 3240 pred_num = 3732 right_num = 2409 relation_right_num = 2826 entity_right_num = 2864 precision = 0.6454983922829582 recall = 0.7435185185185185 f1_value = 0.6910499139414802 rel_precision = 0.7572347266881029 rel_recall = 0.8722222222222222 rel_f1_value = 0.810671256454389 ent_precision = 0.767416934619507 ent_recall = 0.8839506172839506 ent_f1_value = 0.8215720022948939
------Num of Gold Triplet is 2------
gold_num = 2094 pred_num = 1675 right_num = 1309 relation_right_num = 1425 entity_right_num = 1476 precision = 0.7814925373134328 recall = 0.625119388729704 f1_value = 0.6946139559564872 rel_precision = 0.8507462686567164 rel_recall = 0.6805157593123209 rel_f1_value = 0.7561687450252057 ent_precision = 0.8811940298507462 ent_recall = 0.7048710601719198 ent_f1_value = 0.7832316264261078
------Num of Gold Triplet is 3------
gold_num = 942 pred_num = 590 right_num = 501 relation_right_num = 537 entity_right_num = 542 precision = 0.8491525423728814 recall = 0.5318471337579618 f1_value = 0.6540469973890339 rel_precision = 0.9101694915254237 rel_recall = 0.5700636942675159 rel_f1_value = 0.7010443864229765 ent_precision = 0.9186440677966101 ent_recall = 0.5753715498938429 ent_f1_value = 0.7075718015665795
------Num of Gold Triplet is 4------
gold_num = 1160 pred_num = 619 right_num = 530 relation_right_num = 576 entity_right_num = 559 precision = 0.8562197092084006 recall = 0.45689655172413796 f1_value = 0.59584035975267 rel_precision = 0.9305331179321487 rel_recall = 0.496551724137931 rel_f1_value = 0.6475548060708264 ent_precision = 0.9030694668820679 ent_recall = 0.4818965517241379 ent_f1_value = 0.628442945474986
------Num of Gold Triplet is greater than or equal to 5------
gold_num = 684 pred_num = 266 right_num = 241 relation_right_num = 254 entity_right_num = 251 precision = 0.9060150375939849 recall = 0.35233918128654973 f1_value = 0.5073684210526316 rel_precision = 0.9548872180451128 rel_recall = 0.3713450292397661 rel_f1_value = 0.5347368421052632 ent_precision = 0.943609022556391 ent_recall = 0.3669590643274854 ent_f1_value = 0.5284210526315789
------Normal Triplets------
gold_num = 2028 pred_num = 2259 right_num = 1414 relation_right_num = 1725 entity_right_num = 1604 precision = 0.6259406817175741 recall = 0.6972386587771203 f1_value = 0.6596687660368556 rel_precision = 0.7636122177954847 rel_recall = 0.8505917159763313 rel_f1_value = 0.8047585724282714 ent_precision = 0.7100486941124391 ent_recall = 0.7909270216962525 ent_f1_value = 0.748308840681129
------Multiply label Triplets------
gold_num = 4079 pred_num = 2511 right_num = 2052 relation_right_num = 2202 entity_right_num = 2255 precision = 0.8172043010752689 recall = 0.5030644765873988 f1_value = 0.6227617602427922 rel_precision = 0.8769414575866189 rel_recall = 0.5398381956361853 rel_f1_value = 0.6682852807283762 ent_precision = 0.8980485862206292 ent_recall = 0.5528315763667565 ent_f1_value = 0.684370257966616
------Overlapping Triplets------
gold_num = 5530 pred_num = 4243 right_num = 3319 relation_right_num = 3608 entity_right_num = 3791 precision = 0.7822295545604525 recall = 0.6001808318264015 f1_value = 0.6792182543742965 rel_precision = 0.8503417393353759 rel_recall = 0.6524412296564195 rel_f1_value = 0.7383607899314436 ent_precision = 0.8934716002828188 ent_recall = 0.6855334538878842 ent_f1_value = 0.7758109076025785

gold_num = 8120 pred_num = 6882 right_num = 4990 relation_right_num = 5618 entity_right_num = 5692 precision = 0.7250799186283057 recall = 0.6145320197044335 f1_value = 0.6652446340487934 rel_precision = 0.8163324614937518 rel_recall = 0.691871921182266 rel_f1_value = 0.7489668044260765 ent_precision = 0.8270851496657948 ent_recall = 0.7009852216748769 ent_f1_value = 0.7588321557125718

Best result on test set is 0.667201 achieving at epoch 90.

/pytorch/torch/csrc/utils/python_argparser.cpp:756: UserWarning: This overload of add is deprecated: add(Number alpha, Tensor other) Consider using one of the following signatures instead: add(Tensor other, *, Number alpha)

Right now I'm guessing at two possible reasons:

1. I see that the README was just updated recently, so I may not have set --num_generated_triplets correctly, but I'm not sure this alone would cause such a dramatic performance drop, from an expected ~90 to 66 (see the corrected command below).
2. The script and model default to fix_bert_embeddings=True, as can be seen here, which is not the usual setup: people typically fine-tune BERT on downstream tasks instead of freezing it.
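On point 1, the "Unparsed args" line in the log above shows that --num_generated_triplets was silently ignored and the model fell back to num_generated_triples : 10. Assuming the flag name shown in the config dump, the corrected invocation would be:

python -m main --bert_directory ../pretrained_model/bert_base_uncased_huggingface/ --num_generated_triples 15 --max_grad_norm 1 --na_rel_coef 0.5 --max_epoch 100 --max_span_length 10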

Great thanks for your attention and help!

DianboWork commented 3 years ago

Please use the updated version. The "num_generated_triples" hyperparameter is very important.

BenfengXu commented 3 years ago

Please use the updated version. The "num_generated_triples" hyperparameter is very important.

Thanks for the quick reply; I will re-run with the corrected script and post my feedback~

BenfengXu commented 3 years ago

Please use the updated version. The "num_generated_triples" hyperparameter is very important.

I've corrected the --num_generated_triples parameter to 15 for NYT Exact, but still got essentially the same result: F1 = 0.667109, achieved at epoch 95.

I also tried setting --fix_bert_embeddings to False, which resulted in F1 = 0.666058, achieved at epoch 89, so this is not the real problem either (a sketch of what this flag amounts to is below).
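For context, here is a minimal sketch of what the switch means (my own illustration, not the repository's actual code), assuming it freezes the embedding layer of a standard Hugging Face BertModel; SPN4RE's exact wiring may differ:

from transformers import BertModel

# Minimal sketch of the fix_bert_embeddings switch, assuming it freezes
# the encoder's embedding layer; the repo's actual scope may differ.
bert = BertModel.from_pretrained("bert-base-uncased")
fix_bert_embeddings = False  # mirroring --fix_bert_embeddings False

for param in bert.embeddings.parameters():
    # Frozen parameters receive no gradient updates during training.
    param.requires_grad = not fix_bert_embeddings

# Count the embedding parameters that would actually be trained.
print(sum(p.numel() for p in bert.embeddings.parameters() if p.requires_grad))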

Meanwhile, I tried WebNLG, which looks much better: F1 = 0.876756, reached at epoch 78.

I'm using python==3.7.8, torch==1.5.0, transformers==2.6.0 as instructed. Below is an excerpted log from the NYT Exact run:

dataset_name : NYT-exact
train_file : ./data/NYT/exact_data/train.json
valid_file : ./data/NYT/exact_data/valid.json
test_file : ./data/NYT/exact_data/test.json
generated_data_directory : ./data/generated_data/
generated_param_directory : ./data/generated_data/model_param/
bert_directory : ../pretrained_model/bert_base_uncased_huggingface/
partial : False
model_name : Set-Prediction-Networks
num_generated_triples : 15
num_decoder_layers : 3
matcher : avg
na_rel_coef : 0.5
rel_loss_weight : 1
head_ent_loss_weight : 2
tail_ent_loss_weight : 2
fix_bert_embeddings : True
batch_size : 8
max_epoch : 100
gradient_accumulation_steps : 1
decoder_lr : 2e-05
encoder_lr : 1e-05
lr_decay : 0.01
weight_decay : 1e-05
max_grad_norm : 1.0
optimizer : AdamW
n_best_size : 100
max_span_length : 10
refresh : False
use_gpu : True
visible_gpu : 7
random_seed : 1

Data setting is loaded from file: ./data/generated_data/NYT-exact_Set-Prediction-Networks_data.pickle
DATA SUMMARY START:
Relation Alphabet Size: 24
Train Instance Number: 56196
Valid Instance Number: 5000
Test Instance Number: 5000
DATA SUMMARY END.

=== Epoch 0 train ===
Instance: 800; loss: 7.5889 Instance: 1600; loss: 5.6746 Instance: 2400; loss: 4.7977 Instance: 3200; loss: 4.2368 Instance: 4000; loss: 3.7949 Instance: 4800; loss: 3.4715 Instance: 5600; loss: 3.2019 Instance: 6400; loss: 2.9838 Instance: 7200; loss: 2.8109 Instance: 8000; loss: 2.6600 Instance: 8800; loss: 2.5287 Instance: 9600; loss: 2.4088 Instance: 10400; loss: 2.3105 Instance: 11200; loss: 2.2216 Instance: 12000; loss: 2.1473 Instance: 12800; loss: 2.0792 Instance: 13600; loss: 2.0144 Instance: 14400; loss: 1.9541 Instance: 15200; loss: 1.9034 Instance: 16000; loss: 1.8501 Instance: 16800; loss: 1.8009 Instance: 17600; loss: 1.7592 Instance: 18400; loss: 1.7196 Instance: 19200; loss: 1.6878 Instance: 20000; loss: 1.6573 Instance: 20800; loss: 1.6258 Instance: 21600; loss: 1.5960 Instance: 22400; loss: 1.5691 Instance: 23200; loss: 1.5420 Instance: 24000; loss: 1.5128 Instance: 24800; loss: 1.4896 Instance: 25600; loss: 1.4695 Instance: 26400; loss: 1.4482 Instance: 27200; loss: 1.4276 Instance: 28000; loss: 1.4075 Instance: 28800; loss: 1.3877 Instance: 29600; loss: 1.3692 Instance: 30400; loss: 1.3508 Instance: 31200; loss: 1.3353 Instance: 32000; loss: 1.3201 Instance: 32800; loss: 1.3033 Instance: 33600; loss: 1.2889 Instance: 34400; loss: 1.2740 Instance: 35200; loss: 1.2591 Instance: 36000; loss: 1.2437 Instance: 36800; loss: 1.2299 Instance: 37600; loss: 1.2158 Instance: 38400; loss: 1.2052 Instance: 39200; loss: 1.1933 Instance: 40000; loss: 1.1816 Instance: 40800; loss: 1.1702 Instance: 41600; loss: 1.1610 Instance: 42400; loss: 1.1511 Instance: 43200; loss: 1.1419 Instance: 44000; loss: 1.1332 Instance: 44800; loss: 1.1255 Instance: 45600; loss: 1.1162 Instance: 46400; loss: 1.1075 Instance: 47200; loss: 1.0987 Instance: 48000; loss: 1.0896 Instance: 48800; loss: 1.0815 Instance: 49600; loss: 1.0745 Instance: 50400; loss: 1.0649 Instance: 51200; loss: 1.0560 Instance: 52000; loss: 1.0496 Instance: 52800; loss: 1.0432 Instance: 53600; loss: 1.0355 Instance: 54400; loss: 1.0276 Instance: 55200; loss: 1.0216 Instance: 56000; loss: 1.0143

=== Epoch 0 Test ===
------Num of Gold Triplet is 1------
gold_num = 3240 pred_num = 3964 right_num = 2088 relation_right_num = 2927 entity_right_num = 2626 precision = 0.5267406659939455 recall = 0.6444444444444445 f1_value = 0.5796779566907274 rel_precision = 0.7383955600403632 rel_recall = 0.903395061728395 rel_f1_value = 0.8126041088284287 ent_precision = 0.6624621594349143 ent_recall = 0.8104938271604938 ent_f1_value = 0.7290394225430317
------Num of Gold Triplet is 2------
gold_num = 2094 pred_num = 1575 right_num = 1084 relation_right_num = 1271 entity_right_num = 1288 precision = 0.6882539682539682 recall = 0.5176695319961796 f1_value = 0.5908967020986645 rel_precision = 0.806984126984127 rel_recall = 0.6069723018147087 rel_f1_value = 0.6928318342872718 ent_precision = 0.8177777777777778 ent_recall = 0.6150907354345749 ent_f1_value = 0.702098664486236
------Num of Gold Triplet is 3------
gold_num = 942 pred_num = 498 right_num = 349 relation_right_num = 419 entity_right_num = 409 precision = 0.7008032128514057 recall = 0.37048832271762205 f1_value = 0.4847222222222222 rel_precision = 0.8413654618473896 rel_recall = 0.4447983014861996 rel_f1_value = 0.5819444444444445 ent_precision = 0.821285140562249 ent_recall = 0.43418259023354566 ent_f1_value = 0.5680555555555556
------Num of Gold Triplet is 4------
gold_num = 1160 pred_num = 457 right_num = 342 relation_right_num = 405 entity_right_num = 375 precision = 0.7483588621444202 recall = 0.29482758620689653 f1_value = 0.42300556586270865 rel_precision = 0.8862144420131292 rel_recall = 0.34913793103448276 rel_f1_value = 0.5009276437847867 ent_precision = 0.8205689277899344 ent_recall = 0.3232758620689655 ent_f1_value = 0.46382189239332094
------Num of Gold Triplet is greater than or equal to 5------
gold_num = 684 pred_num = 178 right_num = 153 relation_right_num = 169 entity_right_num = 163 precision = 0.8595505617977528 recall = 0.2236842105263158 f1_value = 0.3549883990719258 rel_precision = 0.949438202247191 rel_recall = 0.24707602339181287 rel_f1_value = 0.39211136890951276 ent_precision = 0.9157303370786517 ent_recall = 0.23830409356725146 ent_f1_value = 0.37819025522041766
------Normal Triplets------
gold_num = 2028 pred_num = 2490 right_num = 1174 relation_right_num = 1836 entity_right_num = 1455 precision = 0.4714859437751004 recall = 0.5788954635108481 f1_value = 0.5196989818503762 rel_precision = 0.7373493975903614 rel_recall = 0.9053254437869822 rel_f1_value = 0.8127490039840637 ent_precision = 0.5843373493975904 ent_recall = 0.7174556213017751 ent_f1_value = 0.6440903054448871
------Multiply label Triplets------
gold_num = 4079 pred_num = 2119 right_num = 1492 relation_right_num = 1728 entity_right_num = 1737 precision = 0.7041057102406796 recall = 0.3657759254719294 f1_value = 0.4814456276218135 rel_precision = 0.8154789995280792 rel_recall = 0.4236332434420201 rel_f1_value = 0.5575992255566311 ent_precision = 0.8197262859839547 ent_recall = 0.4258396665849473 ent_f1_value = 0.5605033881897387
------Overlapping Triplets------
gold_num = 5530 pred_num = 3794 right_num = 2632 relation_right_num = 3096 entity_right_num = 3139 precision = 0.6937269372693727 recall = 0.4759493670886076 f1_value = 0.5645645645645646 rel_precision = 0.816025303110174 rel_recall = 0.5598553345388788 rel_f1_value = 0.6640926640926642 ent_precision = 0.827358987875593 ent_recall = 0.567631103074141 ent_f1_value = 0.6733161733161733

gold_num = 8120 pred_num = 6672 right_num = 4016 relation_right_num = 5191 entity_right_num = 4861 precision = 0.6019184652278178 recall = 0.4945812807881773 f1_value = 0.5429962141698215 rel_precision = 0.7780275779376499 rel_recall = 0.6392857142857142 rel_f1_value = 0.7018658734451053 ent_precision = 0.7285671462829736 ent_recall = 0.5986453201970443 ent_f1_value = 0.6572471606273661

Achieving Best Result on Test Set.

=== Epoch 1 train ===
Instance: 800; loss: 0.5340 Instance: 1600; loss: 0.5154 Instance: 2400; loss: 0.5168 Instance: 3200; loss: 0.5275 Instance: 4000; loss: 0.5287 Instance: 4800; loss: 0.5276 Instance: 5600; loss: 0.5244 Instance: 6400; loss: 0.5159 Instance: 7200; loss: 0.5104 Instance: 8000; loss: 0.5057

I don't know where the problem is... so for now I'll just re-clone this repo, re-download all the data, and try again...

DianboWork commented 3 years ago

Oh! You should use bert-base-cased, since many entity mentions are capitalized in English. You can find this setting in the Implementation Details section of our paper.
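To illustrate why casing matters (my own sketch, assuming the standard Hugging Face checkpoints): the uncased tokenizer lowercases its input, so the capitalization cues that mark entity mentions are gone before the model ever sees the text:

from transformers import BertTokenizer

# Illustrative sketch: uncased preprocessing lowercases the text,
# discarding the capitalization that often signals an entity mention.
uncased = BertTokenizer.from_pretrained("bert-base-uncased")
cased = BertTokenizer.from_pretrained("bert-base-cased")

print(uncased.tokenize("New York Times"))  # e.g. ['new', 'york', 'times']
print(cased.tokenize("New York Times"))    # e.g. ['New', 'York', 'Times']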

BenfengXu commented 3 years ago

Oh! You should use bert-base-cased, since many entity mentions are capitalized in English. You can find this setting in the Implementation Details section of our paper.

It seems that bert-base-cased vs. uncased is not the reason for the degraded performance; they produce similar results.

But after I re-cloned the repo and re-downloaded the data, the performance became normal again, with F1 = 0.921405 on NYT Exact (0.923 reported) and F1 = 0.926382 on WebNLG Partial (0.934 reported). So I must have gotten something wrong previously, although I still do not know how. For now I've decided to leave it behind...

Again, many thanks for your quick replies and careful help! Have a nice day~

I'm closing this issue now.