chaidiscovery / chai-lab

Chai-1, SOTA model for biomolecular structure prediction
https://www.chaidiscovery.com

Manually loading tokenizer for 'facebook/esm2_t36_3B_UR50D' from HuggingFace #29

Closed Zhengyuchuan314 closed 2 months ago

Zhengyuchuan314 commented 2 months ago

[Image attached to the original post; not preserved]

wjk1214 commented 2 months ago

You can download the contents of this folder and place them in the corresponding folder

biochristmas commented 2 months ago

Has this problem been solved? I get the same error when I run it. I downloaded these files:

esm2_t12_35M_UR50D-contact-regression.pt
esm2_t33_650M_UR50D-contact-regression.pt
esm2_t48_15B_UR50D-contact-regression.pt
esmfold_3B_v0.pt
esm2_t12_35M_UR50D.pt
esm2_t33_650M_UR50D.pt
esm2_t48_15B_UR50D.pt
esmfold_3B_v1.pt
esm2_t30_150M_UR50D-contact-regression.pt
esm2_t36_3B_UR50D-contact-regression.pt
esm2_t6_8M_UR50D-contact-regression.pt
esm2_t30_150M_UR50D.pt
esm2_t36_3B_UR50D.pt
esm2_t6_8M_UR50D.pt

and placed them in a folder named 'facebook' under the chai-lab folder. However, when I re-ran it, I still got the same error. Thanks in advance!

wjk1214 commented 2 months ago

Download all the files from https://huggingface.co/facebook/esm2_t36_3B_UR50D/tree/main and place them in a folder named 'esm2_t36_3B_UR50D'. Then change model_name in /chai_lab/data/dataset/embeddings/esm.py so that it points to the path of that folder.
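A minimal sketch of that manual setup, assuming the files from the model page were saved into one flat local folder. The helper names `missing_tokenizer_files` and `load_local_tokenizer` and the example path are hypothetical and not part of chai-lab. The file list covers only the tokenizer files listed in the model repository.

```python
from pathlib import Path

# Tokenizer files present in the facebook/esm2_t36_3B_UR50D repository.
TOKENIZER_FILES = ("vocab.txt", "tokenizer_config.json", "special_tokens_map.json")


def missing_tokenizer_files(model_dir):
    """Return the tokenizer files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in TOKENIZER_FILES if not (model_dir / name).is_file()]


def load_local_tokenizer(model_dir):
    """Load the ESM tokenizer from a local folder instead of the Hub."""
    missing = missing_tokenizer_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    # Imported lazily so the file check above works without transformers installed.
    from transformers import EsmTokenizer

    # Passing a directory path makes from_pretrained read the local files.
    return EsmTokenizer.from_pretrained(str(Path(model_dir).resolve()))
```

Usage would look like `tokenizer = load_local_tokenizer("/path/to/esm2_t36_3B_UR50D")`. Checking for the files first gives a clearer error than a failed download attempt.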

arogozhnikov commented 2 months ago

You may need to download those weights manually, as @wjk1214 suggested. This is what it looks like when it's auto-downloaded (note the path ~/.cache/huggingface/hub/models--facebook--esm2_t36_3B_UR50D/ and the symlinks pointing to the actual blobs):

# tree ~/.cache/huggingface/hub/models--facebook--esm2_t36_3B_UR50D/
/root/.cache/huggingface/hub/models--facebook--esm2_t36_3B_UR50D/
├── blobs
│   ├── 0f971f11c449d21422aa982b791619c10351972992c735f4c3cd43fe09790412
│   ├── 3f0d47e841e1cb75257aeaf76d156802899a217e
│   ├── 5918b3dec9d885ee264f6c2df5291ca4dba5d4ad
│   ├── 69e7563923f87d2d7439bfb83e5a19b44b46d71b
│   ├── 6b946952cc35537226f07fd70957ee2f848880d2
│   ├── 7560b46fc383c691fb74b915b7d4bcef40d3df181447f16ba4b298845e308d0c
│   └── ba0f9b53dbbf27934f7555e5d31e37bdea9317f1
├── refs
│   └── main
└── snapshots
    └── 476b639933c8baad5ad09a60ac1a87f987b656fc
        ├── config.json -> ../../blobs/69e7563923f87d2d7439bfb83e5a19b44b46d71b
        ├── pytorch_model-00001-of-00002.bin -> ../../blobs/0f971f11c449d21422aa982b791619c10351972992c735f4c3cd43fe09790412
        ├── pytorch_model-00002-of-00002.bin -> ../../blobs/7560b46fc383c691fb74b915b7d4bcef40d3df181447f16ba4b298845e308d0c
        ├── pytorch_model.bin.index.json -> ../../blobs/5918b3dec9d885ee264f6c2df5291ca4dba5d4ad
        ├── special_tokens_map.json -> ../../blobs/ba0f9b53dbbf27934f7555e5d31e37bdea9317f1
        ├── tokenizer_config.json -> ../../blobs/3f0d47e841e1cb75257aeaf76d156802899a217e
        └── vocab.txt -> ../../blobs/6b946952cc35537226f07fd70957ee2f848880d2

Also note that ESM is quite large (12GB).

If the above still doesn't help, I recommend asking the HuggingFace community.
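The cache layout in the tree above can be checked with a short script. This is a sketch with hypothetical helper names (`cache_folder_name`, `resolve_snapshot`, `broken_links`). It relies only on what the tree shows: a `models--<org>--<name>` folder, a `refs/main` file holding the snapshot hash, and snapshot entries that are symlinks into `blobs`.

```python
from pathlib import Path


def cache_folder_name(repo_id):
    """Map 'facebook/esm2_t36_3B_UR50D' to 'models--facebook--esm2_t36_3B_UR50D'."""
    return "models--" + repo_id.replace("/", "--")


def resolve_snapshot(hub_dir, repo_id, revision="main"):
    """Return the snapshot directory that refs/<revision> points to."""
    repo_dir = Path(hub_dir) / cache_folder_name(repo_id)
    commit = (repo_dir / "refs" / revision).read_text().strip()
    snapshot = repo_dir / "snapshots" / commit
    if not snapshot.is_dir():
        raise FileNotFoundError(f"no snapshot for {revision} at {snapshot}")
    return snapshot


def broken_links(snapshot_dir):
    """List snapshot entries whose symlink target (the blob) does not exist."""
    # Path.exists() follows symlinks, so it is False for a dangling link.
    return sorted(p.name for p in Path(snapshot_dir).iterdir() if not p.exists())
```

A non-empty result from `broken_links` after copying a cache folder by hand would suggest that the `blobs` directory was left behind or the symlinks were not preserved.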

biochristmas commented 2 months ago

Thank you very much for your assistance. Based on your suggestion, I downloaded the model file and updated the model path in esm.py to an absolute path. Everything is now running smoothly.

Zhengyuchuan314 commented 2 months ago

I manually downloaded the ESM model and modified the model path in esm.py. The model now runs smoothly. Thanks a lot for the help :)

arogozhnikov commented 1 month ago

Note: in #61 the ESM location was changed to /downloads/esm to simplify cases like this. You can also change the location of the downloads folder with an environment variable.
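For anyone who already has a manually downloaded cache folder, relocating it into the new downloads location could be sketched as follows. The function name `install_esm_cache` and both directory arguments are hypothetical. The downloads directory is passed in explicitly rather than read from any particular environment variable.

```python
import shutil
from pathlib import Path

ESM_CACHE_NAME = "models--facebook--esm2_t36_3B_UR50D"


def install_esm_cache(source_hub_dir, downloads_dir):
    """Copy a manually downloaded ESM cache folder into <downloads_dir>/esm/."""
    source = Path(source_hub_dir) / ESM_CACHE_NAME
    if not source.is_dir():
        raise FileNotFoundError(f"expected cache folder at {source}")
    target = Path(downloads_dir) / "esm" / ESM_CACHE_NAME
    if target.exists():
        return target  # already in place, nothing to copy
    target.parent.mkdir(parents=True, exist_ok=True)
    # symlinks=True keeps the snapshot -> blobs links instead of duplicating the weights.
    shutil.copytree(source, target, symlinks=True)
    return target
```

For example, `install_esm_cache("~/.cache/huggingface/hub", "./chai-lab/downloads")` would need the `~` expanded first, e.g. with `Path.expanduser()`.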

Users with connection problems will need to move the downloaded ESM weights manually to this new location. For example, on my machine:

$ ls ./chai-lab/downloads/esm/
models--facebook--esm2_t36_3B_UR50D   # this is auto-downloaded unless you have connection problems

JXJJDLK commented 1 week ago

1. First, I downloaded all the files from the model page https://huggingface.co/facebook/esm2_t36_3B_UR50D/tree/main

2. I created a new folder named "esm2_t36_3B_UR50D" and placed all the downloaded files into this folder.

3. I found the esm.py file and located the following lines:

    model_name = "facebook/esm2_t36_3B_UR50D"
    tokenizer = EsmTokenizer.from_pretrained(model_name, cache_dir=)

4. I replaced cache_dir= with the absolute path to the folder I created.

Is this the correct procedure? Did I make any mistakes in any of the steps?

wewewexiao2008 commented 4 days ago

@JXJJDLK

3. I found the esm.py file and located the following lines: model_name = "facebook/esm2_t36_3B_UR50D" tokenizer = EsmTokenizer.from_pretrained(model_name, cache_dir=)
4. I replaced cache_dir= with the absolute path to the folder I created.

In my case:

  1. Please check whether you have two chai-1 versions installed:
    • One via pip in site-packages
    • One from the cloned GitHub repo

If both are installed, sometimes only changes to the pip-installed version take effect. Alternatively, you can just uninstall the pip version.
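One way to see which copy Python actually imports is to ask the import system where it finds the package. This is a sketch with hypothetical helper names (`module_location`, `is_pip_installed_copy`). It assumes the package is importable as `chai_lab`, as the esm.py path earlier in the thread suggests.

```python
import importlib.util
from pathlib import Path


def module_location(module_name):
    """Return the directory a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def is_pip_installed_copy(location):
    """True when the path lies under a site-packages or dist-packages directory."""
    return any(part in ("site-packages", "dist-packages") for part in Path(location).parts)
```

Running `print(module_location("chai_lab"))` in the same environment used for inference shows which esm.py needs editing. Note that an editable install (`pip install -e`) typically resolves to the cloned repo, so the second helper would return False in that case.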

  2. Update model_name to use an absolute path instead:

    # Change this:
    model_name = "facebook/esm2_t36_3B_UR50D"

    # To:
    model_name = "/path/to/chai-lab/downloads/esm/facebook/esm2_t36_3B_UR50D"

Do not use the cache_dir parameter. The absolute path is recommended.
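A likely reason cache_dir did not work here: as far as I understand, when cache_dir is combined with a Hub model name, from_pretrained looks inside that directory for the HuggingFace cache layout (a models--facebook--esm2_t36_3B_UR50D subfolder), so a flat folder of downloaded files is not recognized. The sketch below, with the hypothetical helper `describe_esm_folder`, tells the two layouts apart.

```python
from pathlib import Path


def describe_esm_folder(folder):
    """Classify a local folder as 'flat', 'hf_cache', or 'unknown'."""
    folder = Path(folder)
    if not folder.is_dir():
        return "unknown"
    if (folder / "config.json").is_file() and (folder / "vocab.txt").is_file():
        return "flat"  # files sit directly in the folder: pass its path as model_name
    if any(c.is_dir() and c.name.startswith("models--") for c in folder.iterdir()):
        return "hf_cache"  # HuggingFace cache layout: this is what cache_dir expects
    return "unknown"
```

A "flat" result matches the manual download described in this thread, where the folder path replaces model_name directly.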