facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Error when loading esm transformer #109

Closed: yangkky closed this issue 3 years ago

yangkky commented 3 years ago

model, alphabet = esm.pretrained.esm_msa1b_t12_100M_UR50S() results in:

Traceback (most recent call last):
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/esm/pretrained.py", line 27, in load_hub_workaround
    data = torch.hub.load_state_dict_from_url(url, progress=False, map_location='cpu')
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/torch/hub.py", line 504, in load_state_dict_from_url
    raise RuntimeError('Only one file(not dir) is allowed in the zipfile')
RuntimeError: Only one file(not dir) is allowed in the zipfile
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-00c0cd3f832b>", line 1, in <module>
    model, alphabet = esm.pretrained.esm_msa1b_t12_100M_UR50S()
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/esm/pretrained.py", line 191, in esm_msa1b_t12_100M_UR50S
    return load_model_and_alphabet_hub("esm_msa1b_t12_100M_UR50S")
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/esm/pretrained.py", line 47, in load_model_and_alphabet_hub
    model_data = load_hub_workaround(url)
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/esm/pretrained.py", line 32, in load_hub_workaround
    f"{torch.hub.get_dir()}/checkpoints/{fn}",
AttributeError: module 'torch.hub' has no attribute 'get_dir'

As a workaround, I tried downloading the weights directly and loading them: model, alphabet = load_model_and_alphabet('/home/kevyan/.cache/torch/checkpoints/esm_msa1b_t12_100M_UR50S.pt')

Traceback (most recent call last):
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-75482e322f76>", line 1, in <module>
    model, alphabet = load_model_and_alphabet('/home/kevyan/.cache/torch/checkpoints/esm_msa1b_t12_100M_UR50S.pt')
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/esm/pretrained.py", line 21, in load_model_and_alphabet
    return load_model_and_alphabet_local(model_name)
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/esm/pretrained.py", line 57, in load_model_and_alphabet_local
    if _has_regression_weights(model_name):
NameError: name 'model_name' is not defined

wangleiofficial commented 3 years ago

You must have PyTorch 1.5 or later installed to use this repository. Alternatively, if you download the weight file to your local system, you can load it with load_model_and_alphabet_local(your_path)
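A minimal sketch of that manual-download route. The URL pattern matches the fair-esm checkpoint host referenced later in this thread, and the cache directory mirrors torch hub's usual default, but both are illustrative assumptions rather than part of the suggested fix:

```python
# Sketch of the manual-download workaround (URL pattern and cache path are
# illustrative; the model name is the one used throughout this thread).
from pathlib import Path

MODEL_NAME = "esm_msa1b_t12_100M_UR50S"

def checkpoint_url(model_name):
    # fair-esm model checkpoints are hosted under dl.fbaipublicfiles.com/fair-esm
    return f"https://dl.fbaipublicfiles.com/fair-esm/models/{model_name}.pt"

def checkpoint_path(model_name, cache_dir=None):
    # Default to torch hub's usual checkpoint cache location
    cache_dir = cache_dir or Path.home() / ".cache" / "torch" / "checkpoints"
    return Path(cache_dir) / f"{model_name}.pt"

# After downloading checkpoint_url(MODEL_NAME) to checkpoint_path(MODEL_NAME),
# load it with:
#   model, alphabet = esm.pretrained.load_model_and_alphabet_local(str(path))
```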

yangkky commented 3 years ago

@wangleiofficial

You must have PyTorch 1.5 or later installed to use this repository.

I'm using pytorch 1.5.1

Alternatively, if you download the weight file to your local system, you can load it with load_model_and_alphabet_local(your_path)

I already tried that, as stated in the original issue.

yangkky commented 3 years ago

I cloned the repo and pip installed it locally, and now I get a different error when loading pre-downloaded weights:

model, alphabet = load_model_and_alphabet('/home/kevyan/.cache/torch/checkpoints/esm_msa1b_t12_100M_UR50S.pt')

  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-75482e322f76>", line 1, in <module>
    model, alphabet = load_model_and_alphabet('/home/kevyan/.cache/torch/checkpoints/esm_msa1b_t12_100M_UR50S.pt')
  File "/home/kevyan/workspace/src/esm/esm/pretrained.py", line 22, in load_model_and_alphabet
    return load_model_and_alphabet_local(model_name)
  File "/home/kevyan/workspace/src/esm/esm/pretrained.py", line 61, in load_model_and_alphabet_local
    model_data = torch.load(model_location, map_location="cpu")
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/torch/serialization.py", line 586, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/torch/serialization.py", line 246, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
AttributeError: 'PosixPath' object has no attribute 'tell'

Strangely, sd = torch.load('/home/kevyan/.cache/torch/checkpoints/esm_msa1b_t12_100M_UR50S.pt', map_location='cpu') works.
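The .tell() in that traceback suggests older torch.load treats any non-str argument as an already-open file object, which would explain why the plain string works while the pathlib.Path does not. A tiny compatibility shim (a sketch, not part of esm) would be to normalize to str before calling torch.load:

```python
# Sketch: compatibility shim for older PyTorch versions whose torch.load
# treats any non-str argument as an open file handle (hence the .tell()
# call in the traceback). Converting pathlib.Path to str sidesteps that.
from pathlib import Path

def as_load_arg(model_location):
    # str paths were always accepted; Path objects only in newer PyTorch
    if isinstance(model_location, Path):
        return str(model_location)
    return model_location

# Usage: sd = torch.load(as_load_arg(path), map_location="cpu")
```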

tomsercu commented 3 years ago

Hi Kevin, so on the original post:

The workaround's NameError: name 'model_name' is not defined is fixed on esm master and will be in the 0.4.1 release soon.

In your follow-up post, AttributeError: 'PosixPath' object has no attribute 'tell' -- it looks like pytorch treats the Path object as a file handle, but torch.load(fn) with a plain path string has been supported for a long time, as your test indicates. So this again points to your pytorch env as the culprit -- sounds like it's somehow picking up a pytorch from the stone age?

yangkky commented 3 years ago

If I download the checkpoint for esm-1b, though, that one loads just fine in the same pytorch environment:

encoder, alphabet = load_model_and_alphabet("/home/kevyan/.cache/torch/checkpoints/esm1b_t33_650M_UR50S.pt")

yangkky commented 3 years ago

It works on pytorch 1.9.0. Maybe 1.5.1 is just not compatible with the pytorch version used to save esm-msa1b? https://github.com/deepset-ai/haystack/issues/589
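That version-compatibility hypothesis can be probed without involving torch at all: checkpoints saved with newer PyTorch default to a zip-based container, while older versions wrote a legacy pickle stream, and zipfile.is_zipfile distinguishes the two. This is a diagnostic sketch under that assumption, not something from the thread:

```python
# Sketch: distinguish the zip-based checkpoint container (the default in
# newer PyTorch) from the legacy pickle-stream format, without importing
# torch. A zip-container checkpoint may not load in much older PyTorch.
import zipfile

def is_zip_checkpoint(path_or_file):
    # zipfile.is_zipfile accepts a filename or a seekable file-like object
    return zipfile.is_zipfile(path_or_file)
```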

tomsercu commented 3 years ago

Hmmm interesting - yes, maybe something did change about torch.load handling a str fn versus Path(fn).. that's something that changed in the esm update. If you still have the old pytorch 1.5.1 env, could you test whether this fails:

from pathlib import Path
encoder, alphabet = load_model_and_alphabet(Path("/home/kevyan/.cache/torch/checkpoints/esm1b_t33_650M_UR50S.pt"))

yangkky commented 3 years ago

That gives me

Traceback (most recent call last):
  File "/home/kevyan/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-23-6ad5cd053f1b>", line 1, in <module>
    model, alphabet = load_model_and_alphabet(Path("/home/kevyan/.cache/torch/checkpoints/esm_msa1b_t12_100M_UR50S.pt"))
  File "/home/kevyan/workspace/src/esm/esm/pretrained.py", line 21, in load_model_and_alphabet
    if model_name.endswith(".pt"):  # treat as filepath
AttributeError: 'PosixPath' object has no attribute 'endswith'

tomsercu commented 3 years ago

ah I see, nvm -- conversion to Path happens in load_model_and_alphabet_local anyway, so that wouldn't explain the discrepancy you see.. However, it seems that for older versions of pytorch, loading from a pathlib.Path object just wasn't supported and was basically undefined behavior. Let me put in a fix in case this reappears for others.
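A fix along those lines could normalize the argument to str up front, so both a plain string and a pathlib.Path take the same code path past the .endswith check seen in the earlier traceback. This is an illustrative sketch, not the actual patch:

```python
# Illustrative sketch of the fix: convert to str before any .endswith or
# torch.load call, so "path/to/model.pt" and Path("path/to/model.pt")
# behave identically. (The function name here is hypothetical.)
from pathlib import Path

def dispatch_model_location(model_location):
    model_location = str(model_location)    # Path -> str up front
    if model_location.endswith(".pt"):      # treat as a local filepath
        return ("local", model_location)
    return ("hub", model_location)          # treat as a hub model name
```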

tomsercu commented 3 years ago

Lmk if that solved it in pytorch 1.5.1!

yangkky commented 3 years ago

I pulled that commit, downloaded https://dl.fbaipublicfiles.com/fair-esm/regression/esm_msa1b_t12_100M_UR50S-contact-regression.pt into the same directory, and it worked! Thanks!