facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

generate embeddings missing/unexpected keys error #318

Closed: sylyoung closed this issue 3 years ago

sylyoung commented 3 years ago

Hi! I was trying to generate embeddings for my language model. Following generate-embeddings.ipynb, I reloaded best-valid_mlm_ppl.pth but ran into the following error:

FAISS library was not found.
FAISS not available. Switching to standard nearest neighbors search implementation.
Supported languages: en, ar
Traceback (most recent call last):
  File "generate-embeddings.py", line 34, in <module>
    model.load_state_dict(reloaded['model'])
  File "/share/pkg.7/pytorch/1.1/install/3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
        Missing key(s) in state_dict: "position_embeddings.weight", "embeddings.weight", "layer_norm_emb.weight", "layer_norm_emb.bias",
        "attentions.0.q_lin.weight", "attentions.0.q_lin.bias", "attentions.0.k_lin.weight", "attentions.0.k_lin.bias", "attentions.0.v_lin.weight", "attentions.0.v_lin.bias", "attentions.0.out_lin.weight", "attentions.0.out_lin.bias",
        "attentions.1.q_lin.weight", "attentions.1.q_lin.bias", "attentions.1.k_lin.weight", "attentions.1.k_lin.bias", "attentions.1.v_lin.weight", "attentions.1.v_lin.bias", "attentions.1.out_lin.weight", "attentions.1.out_lin.bias",
        "attentions.2.q_lin.weight", "attentions.2.q_lin.bias", "attentions.2.k_lin.weight", "attentions.2.k_lin.bias", "attentions.2.v_lin.weight", "attentions.2.v_lin.bias", "attentions.2.out_lin.weight", "attentions.2.out_lin.bias",
        "attentions.3.q_lin.weight", "attentions.3.q_lin.bias", "attentions.3.k_lin.weight", "attentions.3.k_lin.bias", "attentions.3.v_lin.weight", "attentions.3.v_lin.bias", "attentions.3.out_lin.weight", "attentions.3.out_lin.bias",
        "attentions.4.q_lin.weight", "attentions.4.q_lin.bias", "attentions.4.k_lin.weight", "attentions.4.k_lin.bias", "attentions.4.v_lin.weight", "attentions.4.v_lin.bias", "attentions.4.out_lin.weight", "attentions.4.out_lin.bias",
        "attentions.5.q_lin.weight", "attentions.5.q_lin.bias", "attentions.5.k_lin.weight", "attentions.5.k_lin.bias", "attentions.5.v_lin.weight", "attentions.5.v_lin.bias", "attentions.5.out_lin.weight", "attentions.5.out_lin.bias",
        "layer_norm1.0.weight", "layer_norm1.0.bias", "layer_norm1.1.weight", "layer_norm1.1.bias", "layer_norm1.2.weight", "layer_norm1.2.bias", "layer_norm1.3.weight", "layer_norm1.3.bias", "layer_norm1.4.weight", "layer_norm1.4.bias", "layer_norm1.5.weight", "layer_norm1.5.bias",
        "ffns.0.lin1.weight", "ffns.0.lin1.bias", "ffns.0.lin2.weight", "ffns.0.lin2.bias", "ffns.1.lin1.weight", "ffns.1.lin1.bias", "ffns.1.lin2.weight", "ffns.1.lin2.bias", "ffns.2.lin1.weight", "ffns.2.lin1.bias", "ffns.2.lin2.weight", "ffns.2.lin2.bias",
        "ffns.3.lin1.weight", "ffns.3.lin1.bias", "ffns.3.lin2.weight", "ffns.3.lin2.bias", "ffns.4.lin1.weight", "ffns.4.lin1.bias", "ffns.4.lin2.weight", "ffns.4.lin2.bias", "ffns.5.lin1.weight", "ffns.5.lin1.bias", "ffns.5.lin2.weight", "ffns.5.lin2.bias",
        "layer_norm2.0.weight", "layer_norm2.0.bias", "layer_norm2.1.weight", "layer_norm2.1.bias", "layer_norm2.2.weight", "layer_norm2.2.bias", "layer_norm2.3.weight", "layer_norm2.3.bias", "layer_norm2.4.weight", "layer_norm2.4.bias", "layer_norm2.5.weight", "layer_norm2.5.bias",
        "pred_layer.proj.weight", "pred_layer.proj.bias".
        Unexpected key(s) in state_dict: "module.position_embeddings.weight", "module.lang_embeddings.weight", "module.embeddings.weight", "module.layer_norm_emb.weight", "module.layer_norm_emb.bias",
        "module.attentions.0.q_lin.weight", "module.attentions.0.q_lin.bias", "module.attentions.0.k_lin.weight", "module.attentions.0.k_lin.bias", "module.attentions.0.v_lin.weight", "module.attentions.0.v_lin.bias", "module.attentions.0.out_lin.weight", "module.attentions.0.out_lin.bias",
        "module.attentions.1.q_lin.weight", "module.attentions.1.q_lin.bias", "module.attentions.1.k_lin.weight", "module.attentions.1.k_lin.bias", "module.attentions.1.v_lin.weight", "module.attentions.1.v_lin.bias", "module.attentions.1.out_lin.weight", "module.attentions.1.out_lin.bias",
        "module.attentions.2.q_lin.weight", "module.attentions.2.q_lin.bias", "module.attentions.2.k_lin.weight", "module.attentions.2.k_lin.bias", "module.attentions.2.v_lin.weight", "module.attentions.2.v_lin.bias", "module.attentions.2.out_lin.weight", "module.attentions.2.out_lin.bias",
        "module.attentions.3.q_lin.weight", "module.attentions.3.q_lin.bias", "module.attentions.3.k_lin.weight", "module.attentions.3.k_lin.bias", "module.attentions.3.v_lin.weight", "module.attentions.3.v_lin.bias", "module.attentions.3.out_lin.weight", "module.attentions.3.out_lin.bias",
        "module.attentions.4.q_lin.weight", "module.attentions.4.q_lin.bias", "module.attentions.4.k_lin.weight", "module.attentions.4.k_lin.bias", "module.attentions.4.v_lin.weight", "module.attentions.4.v_lin.bias", "module.attentions.4.out_lin.weight", "module.attentions.4.out_lin.bias",
        "module.attentions.5.q_lin.weight", "module.attentions.5.q_lin.bias", "module.attentions.5.k_lin.weight", "module.attentions.5.k_lin.bias", "module.attentions.5.v_lin.weight", "module.attentions.5.v_lin.bias", "module.attentions.5.out_lin.weight", "module.attentions.5.out_lin.bias",
        "module.layer_norm1.0.weight", "module.layer_norm1.0.bias", "module.layer_norm1.1.weight", "module.layer_norm1.1.bias", "module.layer_norm1.2.weight", "module.layer_norm1.2.bias", "module.layer_norm1.3.weight", "module.layer_norm1.3.bias", "module.layer_norm1.4.weight", "module.layer_norm1.4.bias", "module.layer_norm1.5.weight", "module.layer_norm1.5.bias",
        "module.ffns.0.lin1.weight", "module.ffns.0.lin1.bias", "module.ffns.0.lin2.weight", "module.ffns.0.lin2.bias", "module.ffns.1.lin1.weight", "module.ffns.1.lin1.bias", "module.ffns.1.lin2.weight", "module.ffns.1.lin2.bias", "module.ffns.2.lin1.weight", "module.ffns.2.lin1.bias", "module.ffns.2.lin2.weight", "module.ffns.2.lin2.bias",
        "module.ffns.3.lin1.weight", "module.ffns.3.lin1.bias", "module.ffns.3.lin2.weight", "module.ffns.3.lin2.bias", "module.ffns.4.lin1.weight", "module.ffns.4.lin1.bias", "module.ffns.4.lin2.weight", "module.ffns.4.lin2.bias", "module.ffns.5.lin1.weight", "module.ffns.5.lin1.bias", "module.ffns.5.lin2.weight", "module.ffns.5.lin2.bias",
        "module.layer_norm2.0.weight", "module.layer_norm2.0.bias", "module.layer_norm2.1.weight", "module.layer_norm2.1.bias", "module.layer_norm2.2.weight", "module.layer_norm2.2.bias", "module.layer_norm2.3.weight", "module.layer_norm2.3.bias", "module.layer_norm2.4.weight", "module.layer_norm2.4.bias", "module.layer_norm2.5.weight", "module.layer_norm2.5.bias",
        "module.pred_layer.proj.weight", "module.pred_layer.proj.bias".

Some more information about my setup: I was training on multiple GPUs, and I made the following local changes:

- In https://github.com/facebookresearch/XLM/blob/51886419f947d58aa02a19e6215e60bc1107b835/src/model/__init__.py#L164 I changed the line to encoder.load_state_dict(enc_reload, strict=False).
- In https://github.com/facebookresearch/XLM/blob/51886419f947d58aa02a19e6215e60bc1107b835/src/model/transformer.py#L279 I changed the line to if params.n_langs > 1 and self.use_lang_emb and self.is_decoder:.
- In https://github.com/facebookresearch/XLM/blob/51886419f947d58aa02a19e6215e60bc1107b835/src/model/transformer.py#L385 I changed the line to if langs is not None and self.use_lang_emb and self.is_decoder:.
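The multi-GPU part looks relevant: a model wrapped in torch.nn.DataParallel or DistributedDataParallel exposes its parameters under a `module.` prefix in state_dict(), which matches the "module." prefix on every unexpected key above, while the freshly built TransformerModel in generate-embeddings.py has no such prefix. A minimal, hypothetical illustration of where the prefix comes from (not XLM code):

```python
import torch.nn as nn

# Not XLM code; a minimal illustration of where the "module." prefix comes from.
# Wrapping a model in (Distributed)DataParallel registers it as `self.module`,
# so every key in the saved state_dict gains a "module." prefix.
net = nn.Linear(4, 2)
print(list(net.state_dict().keys()))      # ['weight', 'bias']
wrapped = nn.DataParallel(net)
print(list(wrapped.state_dict().keys()))  # ['module.weight', 'module.bias']
```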

Any help is appreciated!

sylyoung commented 3 years ago

Solved by removing the "module." prefix from the keys of the model state dict.
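For anyone hitting the same error, here is a minimal sketch of that workaround, assuming the same setup as in generate-embeddings.py above (reloaded is the result of torch.load on best-valid_mlm_ppl.pth, and model is the TransformerModel built earlier in the script):

```python
import torch

# Hypothetical sketch of the fix: strip the "module." prefix that
# (Distributed)DataParallel added when the checkpoint was saved, then load
# the cleaned state dict into the locally built TransformerModel.
reloaded = torch.load('best-valid_mlm_ppl.pth', map_location='cpu')
state_dict = {
    (k[len('module.'):] if k.startswith('module.') else k): v
    for k, v in reloaded['model'].items()
}
model.load_state_dict(state_dict)  # `model` is the TransformerModel built earlier
```

An alternative on the training side would be to save model.module.state_dict() instead of the wrapped model's state_dict, so no key rewriting is needed at load time.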