facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Unable to load conv.wmt14.en-de #4680

Open zouharvi opened 2 years ago

zouharvi commented 2 years ago

🐛 Bug

Loading conv.wmt14.en-de via torch.hub fails even though it is listed among the available pre-trained models.

To Reproduce

import torch
model = torch.hub.load(
    'pytorch/fairseq', "conv.wmt14.en-de",
    tokenizer='moses', bpe='subword_nmt',
    verbose=False,
)

Resulting output:

2022-08-31 13:50:38 | INFO | fairseq.file_utils | loading archive file https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-de.fconv-py.tar.bz2 from cache at /home/vilda/.cache/torch/pytorch_fairseq/51f4ece0b9ae004b9be1c056250c16a8d31cec221b5b13b0ad12b3b5a920528e.784d9a9ac26b0b742995e134338741d0edaee6b6b26bc680b311bcb4bbef353f
2022-08-31 13:50:40 | INFO | fairseq.tasks.translation | [en] dictionary: 42243 types
2022-08-31 13:50:40 | INFO | fairseq.tasks.translation | [de] dictionary: 43676 types
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vilda/.local/lib/python3.10/site-packages/torch/hub.py", line 540, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/vilda/.local/lib/python3.10/site-packages/torch/hub.py", line 569, in _load_local
    model = entry(*args, **kwargs)
  File "/home/vilda/.cache/torch/hub/pytorch_fairseq_main/fairseq/models/fairseq_model.py", line 267, in from_pretrained
    x = hub_utils.from_pretrained(
  File "/home/vilda/.cache/torch/hub/pytorch_fairseq_main/fairseq/hub_utils.py", line 82, in from_pretrained
    models, args, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/home/vilda/.cache/torch/hub/pytorch_fairseq_main/fairseq/checkpoint_utils.py", line 482, in load_model_ensemble_and_task
    model.load_state_dict(
  File "/home/vilda/.cache/torch/hub/pytorch_fairseq_main/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/vilda/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FConvModel:
    Missing key(s) in state_dict: "decoder.version". 
    size mismatch for decoder.convolutions.0.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.1.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.2.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.3.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.4.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.5.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.6.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.7.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.8.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.9.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.10.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.11.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.12.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.13.weight_g: copying a param with shape torch.Size([1, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 4096]).
    size mismatch for decoder.convolutions.14.weight_g: copying a param with shape torch.Size([1, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 4096]).

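The shape pattern above looks like a weight-norm dimension mismatch between how the checkpoint was saved and how the current FConv code applies weight_norm. A minimal sketch that reproduces both shapes, using a hypothetical stand-in module with ConvTBC's (kernel_size, in_channels, out_channels) weight layout (not the real fairseq module):

import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class DummyConv(nn.Module):
    # Stand-in with the same weight layout as fairseq's ConvTBC:
    # (kernel_size, in_channels, out_channels).
    def __init__(self, kernel_size=3, in_channels=512, out_channels=1024):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(kernel_size, in_channels, out_channels))

print(weight_norm(DummyConv(), name="weight", dim=0).weight_g.shape)  # torch.Size([3, 1, 1])
print(weight_norm(DummyConv(), name="weight", dim=2).weight_g.shape)  # torch.Size([1, 1, 1024])

The first shape matches what the checkpoint stores, the second what the current model expects, which would explain why every decoder.convolutions.*.weight_g key fails with a size mismatch.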
Environment

zouharvi commented 2 years ago

The same thing happens when I try to load the checkpoint locally:

from fairseq.models.fconv import FConvModel

model = FConvModel.from_pretrained(
    "/home/vilda/Downloads/wmt14.en-de.fconv-py/",
    checkpoint_file="model.pt",
    bpe="subword_nmt",
    bpe_codes="bpecodes"
)

Output:

2022-08-31 14:01:24 | INFO | fairseq.file_utils | loading archive file /home/vilda/Downloads/wmt14.en-de.fconv-py/
2022-08-31 14:01:26 | INFO | fairseq.tasks.translation | [en] dictionary: 42243 types
2022-08-31 14:01:26 | INFO | fairseq.tasks.translation | [de] dictionary: 43676 types
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vilda/.local/lib/python3.10/site-packages/fairseq/models/fairseq_model.py", line 267, in from_pretrained
    x = hub_utils.from_pretrained(
  File "/home/vilda/.local/lib/python3.10/site-packages/fairseq/hub_utils.py", line 82, in from_pretrained
    models, args, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/home/vilda/.local/lib/python3.10/site-packages/fairseq/checkpoint_utils.py", line 482, in load_model_ensemble_and_task
    model.load_state_dict(
  File "/home/vilda/.local/lib/python3.10/site-packages/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/vilda/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FConvModel:
    Missing key(s) in state_dict: "decoder.version". 
    size mismatch for decoder.convolutions.0.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.1.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.2.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.3.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.4.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.5.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.6.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.7.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.8.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
    size mismatch for decoder.convolutions.9.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.10.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.11.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.12.weight_g: copying a param with shape torch.Size([3, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 2048]).
    size mismatch for decoder.convolutions.13.weight_g: copying a param with shape torch.Size([1, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 4096]).
    size mismatch for decoder.convolutions.14.weight_g: copying a param with shape torch.Size([1, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 4096]).
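For reference, the stored shapes can be checked directly from the checkpoint, without going through fairseq's loader. A quick sketch, assuming the checkpoint keeps its parameters under the usual "model" key and using the local path from above:

import torch

ckpt = torch.load(
    "/home/vilda/Downloads/wmt14.en-de.fconv-py/model.pt",
    map_location="cpu",
)
# Print the shape of every weight-norm magnitude parameter as stored on disk.
for name, tensor in ckpt["model"].items():
    if name.endswith("weight_g"):
        print(name, tuple(tensor.shape))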

Loading conv.wmt17.en-de works without any issues.
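
For comparison, this is the call that works for the WMT'17 convolutional model, with the same arguments as the failing call above (the translate() line is just an illustrative sanity check):

import torch

model = torch.hub.load(
    "pytorch/fairseq", "conv.wmt17.en-de",
    tokenizer="moses", bpe="subword_nmt",
    verbose=False,
)
print(model.translate("Hello world!"))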