facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Size mismatch when loading the pretrained GBW LM #502

Open · OanaMariaCamburu opened 5 years ago

OanaMariaCamburu commented 5 years ago

Hi,

I get the following error when trying to use the pretrained GBW LM.

$ CUDA_VISIBLE_DEVICES=0 python eval_lm.py data-bin/my_csk/my_csk_gbw --path 'models/gbw_fconv_lm/model.pt' --output-word-probs
Namespace(cpu=False, data='data-bin/my_csk/my_csk_gbw', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, future_target=False, gen_subset='test', log_format=None, log_interval=1000, max_sentences=None, max_tokens=None, model_overrides='{}', no_progress_bar=False, num_shards=1, output_dictionary_size=-1, output_word_probs=True, output_word_stats=False, past_target=False, path='models/gbw_fconv_lm/model.pt', quiet=False, raw_text=False, remove_bpe=None, sample_break_mode=None, seed=1, self_target=False, shard_id=0, skip_invalid_size_inputs_valid_test=False, task='language_modeling', tokens_per_sample=1024)
| dictionary: 793304 types
| loading model(s) from models/gbw_fconv_lm/model.pt
Traceback (most recent call last):
  File "eval_lm.py", line 189, in <module>
    main(args)
  File "eval_lm.py", line 58, in main
    models, args = utils.load_ensemble_for_inference(parsed_args.path.split(':'), task, model_arg_overrides=eval(parsed_args.model_overrides))
  File "/raid/data/oanuru/my_fairseq/my_fairseq/fairseq/utils.py", line 165, in load_ensemble_for_inference
    model.load_state_dict(state['model'], strict=True)
  File "/raid/data/oanuru/my_fairseq/my_fairseq/fairseq/models/fairseq_model.py", line 66, in load_state_dict
    super().load_state_dict(state_dict, strict)
  File "/data/dgx1/oanuru/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for FConvLanguageModel:
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([793302, 128]) from checkpoint, the shape in current model is torch.Size([793304, 128]).
        size mismatch for decoder.adaptive_softmax.tail.2.2.weight: copying a param with shape torch.Size([593302, 256]) from checkpoint, the shape in current model is torch.Size([593304, 256]).
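
For reference, the two numbers in the error can be reproduced without running eval_lm.py. The following is a minimal diagnostic sketch, not part of fairseq; the checkpoint path and data-bin path are copied from the failing command above, and the dict.txt location assumes the standard layout written by preprocess.py:

# Diagnostic sketch: compare the vocabulary size stored in the checkpoint
# with the size of the dictionary fairseq rebuilds from the data directory.
import torch
from fairseq.data import Dictionary

state = torch.load('models/gbw_fconv_lm/model.pt', map_location='cpu')
embed = state['model']['decoder.embed_tokens.weight']
print('checkpoint vocab size:', embed.size(0))    # 793302 per the traceback

# Dictionary.load counts the entries in dict.txt plus fairseq's reserved
# special symbols; this is the size the current model gets built with.
vocab = Dictionary.load('data-bin/my_csk/my_csk_gbw/dict.txt')
print('rebuilt dictionary size:', len(vocab))     # 793304 per the traceback

If the two sizes disagree like this, the usual cause is that the test data was binarized against a dictionary that does not match the one the checkpoint was trained with. Re-running preprocess.py with --srcdict pointing at the dictionary distributed with the pretrained model (assuming the download includes a dict.txt) keeps the counts aligned.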
huihuifan commented 5 years ago

@alexeib