allenai / scibert

A BERT model for scientific text.
https://arxiv.org/abs/1903.10676
Apache License 2.0

Better performance than reported #100

Open · shizhediao opened this issue 3 years ago

shizhediao commented 3 years ago

Hi, I am reproducing the fine-tuning results following your instructions. I am running your default code on the ebmnlp dataset / pico task without fine-tuning; my settings are below:

```
DATASET='ebmnlp'
TASK='pico'
with_finetuning=''  # '_finetune' for fine-tuning, '' for no fine-tuning
dataset_size=38124
```
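(For readers unfamiliar with the frozen/fine-tuned distinction: "frozen" means SciBERT's weights are not updated and only the task head is trained, while fine-tuning updates the encoder as well. Below is a minimal sketch of the idea; it is not this repo's actual AllenNLP training code, and the model name assumes SciBERT's HuggingFace release.)

```python
# Minimal sketch of "frozen" vs. "fine-tuned" (not this repo's AllenNLP
# training code; the model name assumes SciBERT's HuggingFace release).
from transformers import AutoModel

model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

with_finetuning = False  # mirrors with_finetuning='' above

if not with_finetuning:
    # Frozen: SciBERT acts as a fixed feature extractor; only the
    # task-specific head (here, the PICO tagger) would be trained.
    for param in model.parameters():
        param.requires_grad = False
```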

However, my results are much better than the numbers reported in your paper. The SciBERT paper reports an F1 of 68.30 (frozen) and 72.28 (fine-tuned), but in my experiment the F1 is 75.48 (frozen):

"test_accuracy": 0.8668599033816425,
"test_accuracy3": 0.9632463768115942,
"test_F1_O": 0.9153885841369629,
"test_F1_I-OUT": 0.673623263835907,
"test_F1_I-PAR": 0.7656552195549011,
"test_F1_I-INT": 0.6646913886070251,
"test_avg_f1": 0.754839614033699,
"test_loss": 4.438140077646389

I am really confused by this discrepancy. Thanks!

shizhediao commented 3 years ago

Thanks to @kyleclo, this question has been solved: I was using the wrong metric. The correct one is (test_F1_I-OUT + test_F1_I-PAR + test_F1_I-INT) / 3 = 0.7013232907, i.e. the macro F1 over the three I-* labels, excluding O. I am still curious why this is higher than the reported number of 68.30.
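For anyone hitting the same confusion, the mix-up is easy to verify from the numbers above: test_avg_f1 appears to be the mean over all four per-label F1 scores, including the majority O label, which inflates the score, while the comparable metric averages only the three I-* labels. A quick check in plain Python, using the scores from the first comment:

```python
# Per-label test F1 scores copied from the results above.
f1_o   = 0.9153885841369629
f1_out = 0.6736232638359070
f1_par = 0.7656552195549011
f1_int = 0.6646913886070251

# test_avg_f1 averages all four labels, O included, which inflates it:
print((f1_o + f1_out + f1_par + f1_int) / 4)  # 0.7548... == test_avg_f1

# The comparable metric averages only the three I-* labels:
print((f1_out + f1_par + f1_int) / 3)         # 0.7013... vs. 68.30 reported
```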