Bug while decoding a trained es-en translation model

facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

MIT License


Bug while decoding a trained es-en translation model #1949

Closed: darsh10 closed this issue 4 years ago

darsh10 commented 4 years ago

This is the error I get when running the fairseq-generate command on a trained model:

load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for FConvModel:
    Unexpected key(s) in state_dict: "decoder.convolutions.0._linearized_weight", "decoder.convolutions.1._linearized_weight", "decoder.convolutions.2._linearized_weight".

Surprisingly, training and generation work for the en-es pair but not for the reverse direction (es-en).

erip commented 4 years ago

Can you please include your code?

darsh10 commented 4 years ago

fairseq-preprocess --source-lang es --target-lang en --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test --destdir data-bin/blah.es-en

fairseq-train data-bin/blah.es-en/ --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 --arch fconv_iwslt_de_en --save-dir checkpoints/fconv --max-source-positions 44 --max-target-positions 44 --skip-invalid-size-inputs-valid-test --max-sentences 24 --max-sentences-valid 24 --eval-bleu

fairseq-generate data-bin/blah.es-en --path checkpoints/fconv/checkpoint_best.pt --batch-size 128 --beam 5 --skip-invalid-size-inputs-valid-test

erip commented 4 years ago

Seems like a duplicate of #1903

darsh10 commented 4 years ago

Well, I'm not sure it is, since the reverse direction (en-es) works perfectly with identical commands (apart from the obvious changes).

jiangfeng1124 commented 4 years ago

This seems to be caused by a mismatch between the state_dict stored in the trained FConv checkpoint and the state_dict of the model initialized during inference.

In fairseq/fairseq/modules/linearized_convolution.py, _linearized_weight is initialized to None, so it is not included in the state_dict of the model built at inference time (see task.build_model(...) in checkpoint_utils.py), whereas the saved checkpoint does contain it. That mismatch probably causes the "Unexpected key(s)" error when loading from a saved checkpoint.
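To make the mismatch concrete, here is a minimal sketch in plain Python of the key comparison that a strict load_state_dict call performs. The helper diff_state_dict_keys and the key lists are hypothetical: the keys only mirror the names in the error message, and a real FConv checkpoint contains many more entries.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    checkpoint_keys: Iterable[str], model_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Mimic the key comparison done by a strict load_state_dict (hypothetical helper).

    Returns (missing, unexpected): keys the model defines but the checkpoint
    lacks, and keys the checkpoint has but the model does not define.
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    missing = sorted(model - ckpt)
    unexpected = sorted(ckpt - model)
    return missing, unexpected


# Hypothetical keys: the freshly built model has _linearized_weight = None,
# so only the regular weights show up among its keys.
model_keys = [f"decoder.convolutions.{i}.weight" for i in range(3)]

# The saved checkpoint additionally contains the cached _linearized_weight entries.
checkpoint_keys = model_keys + [
    f"decoder.convolutions.{i}._linearized_weight" for i in range(3)
]

missing, unexpected = diff_state_dict_keys(checkpoint_keys, model_keys)
print("missing:", missing)        # nothing is missing
print("unexpected:", unexpected)  # the three _linearized_weight keys
```

Any non-empty "unexpected" list is what makes strict loading raise, which matches the three keys reported in the traceback.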

A simple fix would be to pass strict=False when calling load_state_dict, but is there a better solution?
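One alternative to strict=False, sketched here under the assumption that _linearized_weight is only a cached, reshaped copy of each convolution's weight, is to drop those entries from the checkpoint's state dict and keep strict loading on. That way, genuinely missing or renamed parameters are still reported. The function strip_linearized_weights is hypothetical and not part of fairseq.

```python
from typing import Any, Dict

# Suffix of the cached entries named in the error message.
REDUNDANT_SUFFIX = "._linearized_weight"


def strip_linearized_weights(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict without the cached _linearized_weight entries.

    Assumes these entries can be recomputed from each convolution's weight,
    so removing them loses nothing; all other keys are kept unchanged.
    """
    return {k: v for k, v in state_dict.items() if not k.endswith(REDUNDANT_SUFFIX)}


# Hypothetical state dict with placeholder values instead of tensors.
demo = {
    "decoder.convolutions.0.weight": "w0",
    "decoder.convolutions.0.bias": "b0",
    "decoder.convolutions.0._linearized_weight": "lw0",
}
cleaned = strip_linearized_weights(demo)
print(sorted(cleaned))  # ['decoder.convolutions.0.bias', 'decoder.convolutions.0.weight']
```

With a real checkpoint, one would presumably apply this to the "model" entry of the dict returned by torch.load(path, map_location="cpu") and re-save it with torch.save. This is only a workaround sketch, not the upstream fix.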

darsh10 commented 4 years ago

Thank you very much for this great fix @jiangfeng1124 👍

myleott commented 4 years ago

Fixed by b2ee110c853c5effdd8d21f50a8437485bafb285