facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

RuntimeError: storage has wrong size when loading transformer_generator/model #4171

Closed by ilyalasy 2 years ago

ilyalasy commented 2 years ago

Bug description

Experiencing a RuntimeError when fine-tuning a model with the command described in Recipes.

Reproduction steps

  1. Run the command below (the only difference from the script on the Recipes page is the single bst task):

     parlai train_model -t blended_skill_talk -m transformer/generator --init-model zoo:tutorial_transformer_generator/model --dict-file zoo:tutorial_transformer_generator/model.dict --embedding-size 512 --n-layers 8 --ffn-size 2048 --dropout 0.1 --n-heads 16 --learn-positional-embeddings True --n-positions 512 --variant xlm --activation gelu --fp16 True --text-truncate 512 --label-truncate 128 --dict-tokenizer bpe --dict-lower True -lr 1e-06 --optimizer adamax --lr-scheduler reduceonplateau --gradient-clip 0.1 -veps 0.25 --betas 0.9,0.999 --update-freq 1 --attention-dropout 0.0 --relu-dropout 0.0 --skip-generation True -vp 15 -stim 60 -vme 20000 -bs 16 -vmt ppl -vmm min --save-after-valid True --model-file /tmp/test_train_90M

Expected behavior

Model fine-tuning starts.

Logs

20:54:53 | building dictionary first...
20:54:53 | No model with opt yet at: /tmp/test_train_90M(.opt)
20:54:53 | Using CUDA
20:54:53 | loading dictionary from .../ParlAI/data/models/tutorial_transformer_generator/model.dict
20:54:53 | num words = 23928
20:54:53 | DEPRECATED: XLM should only be used for backwards compatibility, as it involves a less-stable layernorm operation.
20:54:55 | Total parameters: 71,628,800 (71,628,800 trainable)
20:54:55 | Loading existing model params from .../ParlAI/data/models/tutorial_transformer_generator/model
Traceback (most recent call last):
  File "/home/ilya/miniconda3/envs/dl/bin/parlai", line 33, in <module>
    sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
  File "/home/ilya/repos/ParlAI/parlai/__main__.py", line 14, in main
    superscript_main()
  File "/home/ilya/repos/ParlAI/parlai/core/script.py", line 325, in superscript_main
    return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
  File "/home/ilya/repos/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
    return script.run()
  File "/home/ilya/repos/ParlAI/parlai/scripts/train_model.py", line 997, in run
    self.train_loop = TrainLoop(self.opt)
  File "/home/ilya/repos/ParlAI/parlai/scripts/train_model.py", line 353, in __init__
    self.agent = create_agent(opt)
  File "/home/ilya/repos/ParlAI/parlai/core/agents.py", line 479, in create_agent
    model = model_class(opt)
  File "/home/ilya/repos/ParlAI/parlai/core/torch_generator_agent.py", line 516, in __init__
    states = self.load(init_model)
  File "/home/ilya/repos/ParlAI/parlai/core/torch_agent.py", line 2074, in load
    states = torch.load(
  File "/home/ilya/miniconda3/envs/dl/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ilya/miniconda3/envs/dl/lib/python3.8/site-packages/torch/serialization.py", line 794, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 17592188352039 got 1048576

stephenroller commented 2 years ago

Seems like a corrupt file perhaps. Try deleting the folder and retrying
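
For anyone hitting the same error, the "delete the folder and retry" step could be sketched roughly as follows. This is only an illustration: `remove_cached_model` is a hypothetical helper, not part of ParlAI, and it assumes the downloaded model lives in a folder you can locate from the "Loading existing model params from" line in your own log (the path in the log above is elided, so it is not filled in here).

```python
import shutil
from pathlib import Path


def remove_cached_model(model_dir):
    """Delete a downloaded model folder so it can be fetched again.

    Hypothetical helper (not a ParlAI API). Returns True if a folder
    was removed and False if there was nothing to remove.
    """
    path = Path(model_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)  # removes the folder and everything inside it
    return True


# Example call; replace the placeholder with the folder from your own log:
# remove_cached_model("<your ParlAI data dir>/models/tutorial_transformer_generator")
```

After the folder is gone, rerunning the original `parlai train_model` command should trigger a fresh download of the zoo model, assuming the download itself completes cleanly.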

ilyalasy commented 2 years ago

Seems it was a problem with untar on my system, sorry for bothering, closing.
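
The thread does not say what exactly went wrong with untar. As a rough sketch, assuming the model arrives as a tar archive (possibly compressed), one way to check that such an archive can be read in full before extracting it is shown below. `archive_is_readable` is a hypothetical helper built on the standard-library `tarfile` module, not something ParlAI provides.

```python
import tarfile
import zlib


def archive_is_readable(archive_path):
    """Return True if every regular file in the tar archive reads in full.

    Hypothetical helper: a truncated or corrupt archive typically fails
    while its members are being read, which is reported here as False.
    """
    try:
        with tarfile.open(archive_path, "r:*") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                extracted = archive.extractfile(member)
                # Read in 1 MiB chunks so large checkpoints need not fit in memory.
                while extracted.read(1024 * 1024):
                    pass
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
    return True
```

A result of False suggests the download or the archive itself is damaged. A result of True only shows the archive is readable, so a failure during extraction (as apparently happened here) would still need to be investigated on the system doing the untar.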