allenai / scibert

A BERT model for scientific text.
https://arxiv.org/abs/1903.10676
Apache License 2.0

Why no task-specific fine-tuning // any plans? #42

Open cbockman opened 5 years ago

cbockman commented 5 years ago

If I understand correctly, the BERT weights were used directly (kept frozen), with the only free parameters being the LSTM+MLP layers:

"For simplicity, experiments are performed without any hyperparameter tuning and with fixed BERT weights"

1) Why this choice? I see the footnote that this is 2.5x slower, but that doesn't seem prohibitive (it's all relative, of course!), given that the original BERT paper (https://arxiv.org/pdf/1810.04805.pdf, section 5.4) observed a 1.5 point bump from fine-tuning on the downstream tasks.

2) Any plans to issue a second iteration of the paper with fine-tuning? Seems like there might be a lot of upside left on the floor here.

Not a criticism — just 1) trying to understand how far SciBERT pushes things in-domain, and 2) I think this is neat and would love to see you push things all the way. :)

kyleclo commented 5 years ago

Thanks for the interest. We're currently running the fine-tuning experiments :) Look forward to the updated results when they finish.