allenai / scibert

A BERT model for scientific text.
https://arxiv.org/abs/1903.10676
Apache License 2.0

Better performance than reported #100

Open · shizhediao opened this issue 3 years ago

shizhediao commented 3 years ago

Hi, I am reproducing the fine-tuning results following your instructions. I am running your default code on the ebmnlp dataset / pico task without fine-tuning; my settings are below:

```
DATASET='ebmnlp'
TASK='pico'
with_finetuning=''  # '_finetune' for fine-tuning, '' for no fine-tuning
dataset_size=38124
```
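(For readers unfamiliar with the frozen/fine-tuned distinction: "frozen" means SciBERT's weights are not updated and only the task head is trained, while fine-tuning updates the encoder as well. Below is a minimal sketch of the idea; it is not this repo's actual AllenNLP training code, and the model name assumes SciBERT's HuggingFace release.)

```python
# Minimal sketch of "frozen" vs. "fine-tuned" (not this repo's AllenNLP
# training code; the model name assumes SciBERT's HuggingFace release).
from transformers import AutoModel

model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

with_finetuning = False  # mirrors with_finetuning='' above

if not with_finetuning:
    # Frozen: SciBERT acts as a fixed feature extractor; only the
    # task-specific head (here, the PICO tagger) would be trained.
    for param in model.parameters():
        param.requires_grad = False
```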

However, my results are much better than the numbers reported in your paper. The SciBERT paper reports an F1 of 68.30 (frozen) and 72.28 (fine-tuned), but in my experiment the F1 is 75.48 (frozen):

"test_accuracy": 0.8668599033816425,
"test_accuracy3": 0.9632463768115942,
"test_F1_O": 0.9153885841369629,
"test_F1_I-OUT": 0.673623263835907,
"test_F1_I-PAR": 0.7656552195549011,
"test_F1_I-INT": 0.6646913886070251,
"test_avg_f1": 0.754839614033699,
"test_loss": 4.438140077646389

I am really confused by this discrepancy. Thanks!

shizhediao commented 3 years ago

Thanks to @kyleclo, this question has been solved: I was using the wrong metric. The correct one is (test_F1_I-OUT + test_F1_I-PAR + test_F1_I-INT) / 3 = 0.7013232907, i.e. the macro F1 over the three I-* labels, excluding O. I am still curious why this is higher than the reported number of 68.30.
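For anyone hitting the same confusion, the mix-up is easy to verify from the numbers above: test_avg_f1 appears to be the mean over all four per-label F1 scores, including the majority O label, which inflates the score, while the comparable metric averages only the three I-* labels. A quick check in plain Python, using the scores from the first comment:

```python
# Per-label test F1 scores copied from the results above.
f1_o   = 0.9153885841369629
f1_out = 0.6736232638359070
f1_par = 0.7656552195549011
f1_int = 0.6646913886070251

# test_avg_f1 averages all four labels, O included, which inflates it:
print((f1_o + f1_out + f1_par + f1_int) / 4)  # 0.7548... == test_avg_f1

# The comparable metric averages only the three I-* labels:
print((f1_out + f1_par + f1_int) / 3)         # 0.7013... vs. 68.30 reported
```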