Closed JohnGiorgi closed 2 years ago
Hey @JohnGiorgi, I do think this would be a good addition. Feel free to ping me when you start the PR!
This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread 👇
Oops, still working on #5505 so I think it makes sense to keep this open!
Unfortunately, there's no easy way to check via the GitHub API whether an issue has an open linked pull request, which should be a sufficient condition to keep the issue open 😕
Is your feature request related to a problem? Please describe.
The paper Revisiting Few-sample BERT Fine-tuning (published at ICLR 2021) demonstrated that re-initializing the last few layers of a pretrained transformer before fine-tuning can reduce the variance between re-runs, speed up convergence, and improve final task performance; this is nicely summarized in the paper's figures.
The intuition is that some of the final layers may be over-specified to the pretraining objective(s) and therefore the pretrained weights can provide a bad initialization for downstream tasks.
It would be nice if re-initializing the weights of certain layers of a pretrained transformer model were easy to do with AllenNLP.
Describe the solution you'd like
Ideally, you could easily specify which layers to re-initialize in a `PretrainedTransformerEmbedder`, something like a new constructor argument. The `__init__` of `PretrainedTransformerEmbedder` would take care of correctly re-initializing the specified layers for the given `model_name`.
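A config for the proposed API might look something like the following sketch (the `reinit_layers` parameter does not exist yet, and the model name is just an example):

```jsonnet
"text_field_embedder": {
    "token_embedders": {
        "tokens": {
            "type": "pretrained_transformer",
            "model_name": "bert-base-cased",
            // Hypothetical new parameter: re-initialize the last two
            // transformer layers before fine-tuning.
            "reinit_layers": 2
        }
    }
}
```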
Describe alternatives you've considered
You could achieve this right now with the AllenNLP initializers, but this would require knowing the names of the parameters to re-initialize and the correct initialization scheme for the given model, e.g. a normal distribution with `mean=0` and `std=0.02`. Ideally, the user wouldn't have to know/specify this.

Additional context
I've drafted a solution that works (but requires more testing). Essentially, we add a new parameter to `PretrainedTransformerEmbedder`, `reinit_layers`, which can be an integer or a list of integers. In `__init__`, we re-initialize as follows:

- We skip re-initialization when `load_weights == False`, as the weights are already being randomly initialized.
- We use the `_init_weights` function from HF Transformers, which knows how to initialize the parameters of a layer correctly for a given pretrained model.
- By default, `reinit_layers` is `None`, so this should be backward compatible with existing configs.

I sanity-checked it by testing that the weights of the specified layers are indeed re-initialized. I also trained a model with re-initialized layers on my own task and got a non-negligible performance boost.
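The core of the draft can be sketched as follows. This is a minimal, self-contained illustration (using a tiny randomly-initialized BERT so it runs offline, rather than the actual `PretrainedTransformerEmbedder`); the variable names and layer indices are mine:

```python
import torch
from transformers import BertConfig, BertModel

# Tiny, randomly-initialized BERT so the example runs offline; in the draft,
# the model is loaded from `model_name` inside PretrainedTransformerEmbedder.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=4,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)

# Hypothetical value of the proposed `reinit_layers` parameter:
# re-initialize the last two of the four encoder layers.
reinit_layers = [2, 3]

before = model.encoder.layer[3].output.dense.weight.clone()
for i in reinit_layers:
    # `_init_weights` is the (private) HF Transformers method that knows the
    # correct initialization for this model, e.g. normal with std=0.02 for BERT.
    model.encoder.layer[i].apply(model._init_weights)
after = model.encoder.layer[3].output.dense.weight

# The specified layers now hold freshly drawn weights.
assert not torch.equal(before, after)
```

This mirrors the sanity check described above: cloning a layer's weights, applying `_init_weights`, and verifying the weights actually changed.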
If the AllenNLP maintainers think this would be a good addition, I would be happy to open a PR!