allenai / scibert

A BERT model for scientific text.
https://arxiv.org/abs/1903.10676
Apache License 2.0

Why no task-specific fine-tuning // any plans? #42

Open cbockman opened 5 years ago

cbockman commented 5 years ago

If I understand correctly, the BERT weights were used directly (kept frozen), with the only free parameters being the LSTM+MLP layers:

"For simplicity, experiments are performed without any hyperparameter tuning and with fixed BERT weights"

1) Why this choice? I see the footnote that this is 2.5x slower, but that doesn't seem prohibitive (it's all relative, of course!), given that the original BERT paper (https://arxiv.org/pdf/1810.04805.pdf, section 5.4) observed a 1.5 point bump from fine-tuning on the downstream tasks.

2) Any plans to issue a second iteration of the paper with fine-tuning? Seems like there might be a lot of upside left on the floor here.

Not a criticism — just 1) trying to understand how far SciBERT pushes things in-domain, and 2) I think this is neat and would love to see you push things all the way. :)

kyleclo commented 5 years ago

Thanks for the interest. We're currently running the fine-tuning experiments :) Look forward to the updated results when they finish.