Hey,
thanks for your awesome work!
In the paper you write that you use macro F1-scores. On JNLPBA, however, the micro average is typically ~4pp lower than the macro average, and in the logs in your GitHub repository (https://github.com/allenai/scibert/tree/master/results) SciBERT appears to achieve micro F1-scores of around 77%. I'm therefore wondering whether "macro" in the paper might be a typo?
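For reference, the two averages differ whenever entity classes are imbalanced: micro F1 pools true/false positives and false negatives across all classes, while macro F1 averages the per-class F1 scores. A minimal pure-Python sketch (the labels and predictions are made up purely for illustration, not taken from JNLPBA):

```python
def class_counts(gold, pred, label):
    # TP/FP/FN for one class over paired gold and predicted labels.
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    return tp, fp, fn

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold, pred, labels):
    counts = [class_counts(gold, pred, lab) for lab in labels]
    # Micro: pool counts across classes, then compute one F1.
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / len(labels)
    return micro, macro

# Toy imbalanced data: the rare class "B" is never predicted.
gold = ["A"] * 8 + ["B"] * 2
pred = ["A"] * 10
micro, macro = micro_macro_f1(gold, pred, ["A", "B"])
print(micro, macro)  # micro = 0.8, macro ≈ 0.444
```

With a rare class scored at zero, macro drops well below micro here; on JNLPBA the gap can go either way depending on which classes a model handles worse.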