facebookresearch / LAMA

LAnguage Model Analysis

./download_models.sh and run_experiments.py: Torch invalid memory size - maybe an overflow? #47

Open blrtvs opened 3 years ago

blrtvs commented 3 years ago

Hi,

when I run ./download_models.sh, I get the following exception:

Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Namespace(lm='transformerxl', transformerxl_model_dir='pre-trained_language_models/transformerxl/transfo-xl-wt103/')
Loading transformerxl model...
Loading Transformer XL model from pre-trained_language_models/transformerxl/transfo-xl-wt103/
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 158, in <module>
    main()
  File "lama/vocab_intersection.py", line 152, in main
    __vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
  File "lama/vocab_intersection.py", line 97, in __vocab_intersection
    model = build_model_by_name(args.lm, args)
  File "/LAMA/lama/modules/__init__.py", line 31, in build_model_by_name
    return MODEL_NAME_TO_CLASS[lm](args)
  File "/LAMA/lama/modules/transformerxl_connector.py", line 37, in __init__
    self.model = TransfoXLLMHeadModel.from_pretrained(model_name)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 939, in from_pretrained
    model = cls(config, *inputs, **kwargs)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1312, in __init__
    self.transformer = TransfoXLModel(config)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1033, in __init__
    div_val=config.div_val)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 780, in __init__
    self.emb_layers.append(nn.Embedding(r_idx-l_idx, d_emb_i))
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 100, in __init__
    self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
RuntimeError: $ Torch: invalid memory size -- maybe an overflow? at /pytorch/aten/src/TH/THGeneral.cpp:188

I tried different (newer) versions of torch, but that led to the exact same dimension error that JXZe reports in issue #32:

      RuntimeError: Trying to create tensor with negative dimension -200001: [-200001, 16]

But #32 offers no recommendation for fixing this dimension error.

All the packages from requirements.txt installed correctly, except that I have overrides==3.1.0 instead of overrides==6.1.0: the import "from allennlp.modules.elmo import _ElmoBiLm" in elmo_connector.py only worked after downgrading to 3.1.0. I also tried skipping the vocab-building step and using the common_vocab.txt files linked in the README instead, but the same "Torch: invalid memory size -- maybe an overflow?" error occurs when running run_experiments.py.

Does anybody have an idea how to fix this?
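For reference, the negative dimension in the #32 variant of this error can be reproduced arithmetically: TransfoXL's adaptive embedding splits the vocabulary at the config's cutoffs, so a vocab size the old code fails to read turns into a negative row count for the last embedding layer. A minimal sketch (cutoffs and dimensions mirror the transfo-xl-wt103 defaults; the -1 fallback for a missing n_token is an assumption, though it happens to match the shape in the traceback):

```python
# Sketch of how TransfoXL's adaptive embedding sizes its layers from the
# config. Each layer i covers tokens [l_idx, r_idx) and gets r_idx - l_idx
# embedding rows, with the embedding dim shrunk by div_val per layer.
def adaptive_embedding_shapes(n_token, cutoffs, d_embed=1024, div_val=4):
    cutoff_ends = [0] + cutoffs + [n_token]
    shapes = []
    for i in range(len(cutoff_ends) - 1):
        l_idx, r_idx = cutoff_ends[i], cutoff_ends[i + 1]
        d_emb_i = d_embed // (div_val ** i)
        shapes.append((r_idx - l_idx, d_emb_i))  # (rows, embedding dim)
    return shapes

cutoffs = [20000, 40000, 200000]

# A consistent config: every layer gets a positive row count.
print(adaptive_embedding_shapes(267735, cutoffs))

# If the config no longer carries the vocab size under the key the old
# code reads, n_token falls back to a placeholder and the last layer gets
# a negative row count: (-200001, 16), the shape from the traceback.
print(adaptive_embedding_shapes(-1, cutoffs))
```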

blrtvs commented 3 years ago

I can work around this by updating pytorch-pretrained-bert to transformers, but that leads to import errors, for example with allennlp. So I updated allennlp as well, and that worked until actually running the experiments: using transformers instead of pytorch-pretrained-bert produces many exceptions in the code due to slightly different syntax and so on. So it's really a lot of overhead. If somebody knows how to get LAMA working with the old pytorch-pretrained-bert package, let me know. I even tried changing the CUDA version, but still got the overflow error from above.

Kickboxin commented 2 years ago

Okay

Zjh-819 commented 2 years ago

Hi! @blrtvs I got the solution: the configuration file for Transformer XL was updated in Apr 2020, and it's not compatible with the packages in requirements.txt. Replace the config.json in transformerxl/transfo-xl-wt103 with this one, then it should work: https://huggingface.co/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json

blrtvs commented 2 years ago

@Zjh-819 great! Thanks, I will try it. It would be awesome if it works :)

laurinpaech commented 2 years ago

> Hi! @blrtvs I got the solution: the configuration file for Transformer XL was updated in Apr 2020, and it's not compatible with the packages in requirements.txt. Replace the config.json in transformerxl/transfo-xl-wt103 with this one, then it should work: https://huggingface.co/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json

Worked for me. Good job!