Closed stefan-it closed 2 years ago
@stefan-it XLM-R base and large use the post-layer-norm setting of the transformer, while XL and XXL use the pre-layer-norm setting.
In the pre-LN setting the embeddings are usually not normalized and there's an LN at the start of each transformer block, though there's an extra LN at the end of the transformer.
You will need to build the HF Transformers model the same way to get the same output.
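The two settings can be sketched in PyTorch (a minimal sketch, not the fairseq implementation; module and argument names are illustrative):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer block; pre_ln=True matches the XL/XXL setting,
    pre_ln=False matches base/large."""
    def __init__(self, dim, heads, pre_ln):
        super().__init__()
        self.pre_ln = pre_ln
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        if self.pre_ln:  # pre-LN: LN at the start of each sublayer
            h = self.ln1(x)
            x = x + self.attn(h, h, h)[0]
            x = x + self.ffn(self.ln2(x))
        else:            # post-LN: LN after each residual connection
            x = self.ln1(x + self.attn(x, x, x)[0])
            x = self.ln2(x + self.ffn(x))
        return x
```

Note that in the pre-LN variant the block's output is never normalized, which is why those checkpoints carry one extra LN after the last block.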
@ngoyal2707 Independently of those changes between base and large, I can't load the new XL and XXL models using any fairseq version (without making changes to the state_dict).
If I use version 0.9.0 I get a bunch of unexpected keys, because the "decoder" was renamed to "encoder".
If I use version >=0.10 I get unexpected keys for the emb_layer_norm, which I assume was renamed to layer_norm.
RuntimeError: Error(s) in loading state_dict for RobertaModel:
Missing key(s) in state_dict: "encoder.sentence_encoder.emb_layer_norm.weight", "encoder.sentence_encoder.emb_layer_norm.bias".
Unexpected key(s) in state_dict: "encoder.sentence_encoder.layer_norm.weight", "encoder.sentence_encoder.layer_norm.bias", "encoder.sentence_encoder.version".
In any case, those checkpoints seem impossible to load without hacking around.
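A quick way to see exactly which keys disagree is to diff the checkpoint's state dict against the model the installed fairseq version builds; a minimal sketch (the helper name is mine, not a fairseq API):

```python
import torch
import torch.nn as nn

def diff_state_dict_keys(model, ckpt_state):
    """Debugging helper: report which keys differ between a checkpoint's
    state dict and the model definition we are trying to load it into."""
    model_keys = set(model.state_dict())
    ckpt_keys = set(ckpt_state)
    missing = sorted(model_keys - ckpt_keys)     # model expects, ckpt lacks
    unexpected = sorted(ckpt_keys - model_keys)  # ckpt has, model lacks
    return missing, unexpected

# e.g. ckpt = torch.load("model.pt", map_location="cpu")["model"]
#      missing, unexpected = diff_state_dict_keys(roberta_model, ckpt)
```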
@ricardorei I installed fairseq via pip3 install git+https://github.com/pytorch/fairseq.git, as I've also seen different error messages for various fairseq versions. But with the latest master I could load the new larger models :hugs:
@ngoyal2707 Thanks for your explanation :+1: I could see the changes in 54423d3b22a3e7f536e02e9e5445cef9becbd60d so we're currently adjusting the RoBERTa model in Transformers to support the new models :)
I encountered the same error, and it seems that a layer_norm needs to be added in TransformerSentenceEncoder: https://github.com/pytorch/fairseq/blob/master/fairseq/modules/transformer_sentence_encoder.py.
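The required change can be sketched at the encoder level (names are illustrative, not the actual fairseq code): the pre-LN checkpoints carry a final layer_norm after the block stack instead of an embedding-level emb_layer_norm, so the encoder needs an optional final LN.

```python
import torch
import torch.nn as nn

class EncoderStack(nn.Module):
    """Sketch: post-LN models normalize the embeddings (emb_layer_norm),
    pre-LN models instead normalize once after the last block (layer_norm)."""
    def __init__(self, dim, num_layers, pre_ln):
        super().__init__()
        self.emb_layer_norm = None if pre_ln else nn.LayerNorm(dim)
        # stand-in blocks; the real encoder stacks transformer layers here
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.layer_norm = nn.LayerNorm(dim) if pre_ln else None

    def forward(self, x):
        if self.emb_layer_norm is not None:
            x = self.emb_layer_norm(x)
        for layer in self.layers:
            x = layer(x)
        if self.layer_norm is not None:  # extra final LN, pre-LN only
            x = self.layer_norm(x)
        return x
```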
Hi :)
I'm currently trying to convert the recently released XLM-R XL and XXL models into Transformers-compatible weights.
I'm using the latest fairseq master (commit 2fd9d8a972794ba919174baf0d1828a5a4c626f3), and there's something strange with the layer norm parameters.
For debugging I dumped the (shortened) parameter names: for the XLM-R Base model the parameter name is layernorm_embedding, but for the new XL models it is layer_norm.
When loading the models with the fairseq library and comparing the (shortened) module lists, a layer norm is missing in the XL model :thinking:
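One way to surface this difference is to list a loaded model's layer-norm parameters; a minimal sketch (the helper name is mine):

```python
import torch.nn as nn

def layernorm_params(model):
    """Return parameter names containing a layer-norm marker; useful for
    comparing base vs XL checkpoints once loaded via fairseq."""
    return [name for name, _ in model.named_parameters()
            if "layer_norm" in name or "layernorm" in name]
```

Running this on the base model would show layernorm_embedding entries, while the XL model would show layer_norm entries instead.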
Side note: I've updated the conversion script in the Transformers library to be compatible with the latest fairseq master. At the end, the script compares a (forward) pass of the original fairseq model against the converted model to check for differences. For the old XLM-R Base model the output is identical, whereas for XLM-R XL the difference is very high. The script can be found here.
Thanks for your help!
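The comparison at the end of such a conversion amounts to checking the maximum absolute difference between the two models' outputs; a minimal sketch (variable names are placeholders):

```python
import torch

def max_abs_diff(a, b):
    """Largest element-wise difference between two output tensors; a
    correctly converted checkpoint should match to within float tolerance."""
    return (a - b).abs().max().item()

# e.g. assert max_abs_diff(fairseq_output, converted_output) < 1e-5
```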