facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

fair-esm 1.0.3 does not provide the extra 'esmfold' #460

Open thyol opened 1 year ago

thyol commented 1 year ago

Bug description: Running pip install "fair-esm[esmfold]" produces the warning below, and the esmfold model is not available:

WARNING: fair-esm 1.0.3 does not provide the extra 'esmfold'

Reproduction steps

python -V
pip install "fair-esm[esmfold]"
pip install 'dllogger @ git+https://github.com/NVIDIA/dllogger.git'
pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'
python3
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import esm
>>> model = esm.pretrained.esmfold_v1()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'esm.pretrained' has no attribute 'esmfold_v1'

Expected behavior esmfold_v1 is available.
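A quick way to confirm which fair-esm build actually ended up in the environment, and whether its metadata declares the esmfold extra at all, is to query the installed package metadata. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper name describe_distribution is made up for illustration.

```python
from importlib import metadata


def describe_distribution(name):
    """Return the installed version and declared extras of a distribution.

    Returns None when the distribution is not installed.
    (describe_distribution is a hypothetical helper, not part of fair-esm.)
    """
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    # "Provides-Extra" lists the extras a distribution declares, e.g. "esmfold".
    extras = dist.metadata.get_all("Provides-Extra") or []
    return {"version": dist.version, "extras": sorted(extras)}


if __name__ == "__main__":
    info = describe_distribution("fair-esm")
    if info is None:
        print("fair-esm is not installed")
    else:
        print("installed version:", info["version"])
        print("declares esmfold extra:", "esmfold" in info["extras"])
```

If the reported version is 1.0.3 and the extra is not declared, the AttributeError above is expected, since that release predates esmfold_v1.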

Logs (command line output):

pip install "fair-esm[esmfold]"
Collecting fair-esm[esmfold]
  Using cached fair_esm-2.0.0-py3-none-any.whl (93 kB)
Collecting biopython
  Using cached biopython-1.80-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting einops
  Using cached einops-0.6.0-py3-none-any.whl (41 kB)
Collecting omegaconf
  Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting dm-tree
  Using cached dm_tree-0.1.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (152 kB)
Collecting deepspeed==0.5.9
  Using cached deepspeed-0.5.9.tar.gz (510 kB)
  Preparing metadata (setup.py) ... done
Collecting pytorch-lightning
  Using cached pytorch_lightning-1.9.0-py3-none-any.whl (825 kB)
Collecting scipy
  Using cached scipy-1.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.4 MB)
Collecting ml-collections
  Using cached ml_collections-0.1.1.tar.gz (77 kB)
  Preparing metadata (setup.py) ... done
Collecting hjson
  Using cached hjson-3.1.0-py3-none-any.whl (54 kB)
Collecting ninja
  Using cached ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.5.9->fair-esm[esmfold]) (1.24.1)
Collecting packaging
  Using cached packaging-23.0-py3-none-any.whl (42 kB)
Collecting psutil
  Using cached psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
Collecting py-cpuinfo
  Using cached py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.5.9->fair-esm[esmfold]) (1.13.1)
Collecting tqdm
  Using cached tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
INFO: pip is looking at multiple versions of fair-esm[esmfold] to determine which version is compatible with other requirements. This could take a while.
Collecting fair-esm[esmfold]
  Using cached fair_esm-1.0.3-py3-none-any.whl (76 kB)
WARNING: fair-esm 1.0.3 does not provide the extra 'esmfold'
Installing collected packages: fair-esm
Successfully installed fair-esm-1.0.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
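The log above shows pip first selecting fair_esm-2.0.0, then "looking at multiple versions" and falling back to fair_esm-1.0.3, which does not have the esmfold extra. The following sketch scans saved pip output for exactly that pattern. The function name and the returned keys are assumptions made for illustration; the regular expressions only rely on the log lines visible in this report.

```python
import re


def summarize_pip_log(log_text, wheel_name="fair_esm"):
    """Summarize which versions pip tried and whether it backtracked.

    summarize_pip_log is a hypothetical helper; it only pattern-matches
    the kinds of lines that appear in the log of this report.
    """
    # Wheel file names look like "fair_esm-2.0.0-py3-none-any.whl".
    tried = re.findall(rf"{re.escape(wheel_name)}-([0-9][^-\s]*)-", log_text)
    # This INFO line is what pip prints when it starts trying other versions.
    backtracked = "pip is looking at multiple versions" in log_text
    # Each match is (distribution, version, extra).
    missing_extras = re.findall(
        r"WARNING: (\S+) (\S+) does not provide the extra '([^']+)'", log_text
    )
    return {
        "versions_tried": tried,
        "backtracked": backtracked,
        "missing_extras": missing_extras,
    }


SAMPLE_LOG = """\
Collecting fair-esm[esmfold]
  Using cached fair_esm-2.0.0-py3-none-any.whl (93 kB)
INFO: pip is looking at multiple versions of fair-esm[esmfold] to determine which version is compatible with other requirements. This could take a while.
Collecting fair-esm[esmfold]
  Using cached fair_esm-1.0.3-py3-none-any.whl (76 kB)
WARNING: fair-esm 1.0.3 does not provide the extra 'esmfold'
Successfully installed fair-esm-1.0.3
"""

if __name__ == "__main__":
    print(summarize_pip_log(SAMPLE_LOG))
```

A result with more than one entry in versions_tried together with backtracked set to True suggests that some dependency of the newer release could not be satisfied in this environment.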

Abhishaike commented 1 year ago

Having this error too!

Abhishaike commented 1 year ago

Seems like this is a Python 3.10 issue.

ESMFold requires deepspeed, deepspeed requires triton, and deepspeed pins triton at 1.0.0, but triton 1.0.0 doesn't provide wheels for Python 3.10.

tomsercu commented 1 year ago

Thanks for following up! That makes sense. The OP mentions Python 3.0, which is very old by now and probably also doesn't have a triton wheel available. The README only mentions Python <= 3.9, but the version should probably also not be older than 3.7.
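The version bounds discussed in this thread (Python <= 3.9 per the README, and probably not older than 3.7) can be checked before attempting the install. This is a small sketch; the constants simply restate the bounds mentioned in the comments above and are not taken from the package metadata.

```python
import sys

# Bounds as discussed in this thread; treat them as assumptions, not package metadata.
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 9)


def python_in_range(version_info=None):
    """Return True if the (major, minor) version lies within the discussed bounds."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_in_range():
        print(f"Python {current} is within the discussed range")
    else:
        print(f"Python {current} is outside the discussed range (3.7 to 3.9)")
```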

Abhishaike commented 1 year ago

I'm not familiar with deepspeed or triton, would it be a bad idea to make a PR to deepspeed to bump the triton version to 2.0.0dev or just >=1.0.0? Seems deepspeed is fine with both, it's just pinned at 1.0.0 in some places: https://github.com/microsoft/DeepSpeed/search?q=triton

thyol commented 1 year ago

@tomsercu Python 3.0 was a typo. I removed that comment and just left the correct python -V output.

kchennen commented 1 year ago

Hi, I get a similar error when I try to use esmfold on a sequence. Can anyone help, please?

Log:

[2662:9] > python esmfold.py -i data/2_interim/CEGAL/CEGAL.fasta -o data/2_interim/CEGAL/ESM
23/02/17 15:41:09 | INFO | root | Reading sequences from data/2_interim/CEGAL/CEGAL.fasta
23/02/17 15:41:09 | INFO | root | Loaded 63 sequences from data/2_interim/CEGAL/CEGAL.fasta
23/02/17 15:41:09 | INFO | root | Loading model
Traceback (most recent call last):
  File "esmfold.py", line 136, in <module>
    model = esm.pretrained.esmfold_v1()
  File "/biolo/dev_toolkit/python/anaconda3/envs/Metrics3D/lib/python3.7/site-packages/esm/pretrained.py", line 419, in esmfold_v1
    import esm.esmfold.v1.pretrained
  File "/biolo/dev_toolkit/python/anaconda3/envs/Metrics3D/lib/python3.7/site-packages/esm/esmfold/v1/pretrained.py", line 5, in <module>
    from esm.esmfold.v1.esmfold import ESMFold
  File "/biolo/dev_toolkit/python/anaconda3/envs/Metrics3D/lib/python3.7/site-packages/esm/esmfold/v1/esmfold.py", line 11, in <module>
    from openfold.data.data_transforms import make_atom14_masks
ModuleNotFoundError: No module named 'openfold'

Abhishaike commented 1 year ago

This error basically means you didn't install esmfold (here, the openfold dependency is missing).
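To see which pieces are missing before loading the model, one can check whether the relevant top-level modules are importable. This is a minimal sketch: the helper name missing_modules is hypothetical, and the module list is taken from the packages named in this thread (esm, openfold, dllogger).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment.

    missing_modules is a hypothetical helper; pass top-level names only,
    because find_spec imports parent packages for dotted names.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the install commands and traceback in this thread.
    missing = missing_modules(["esm", "openfold", "dllogger"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All ESMFold-related modules were found")
```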

tomsercu commented 1 year ago

A good way to sidestep trouble with installing openfold and its dependencies is to use ESMFold via huggingface transformers: https://huggingface.co/docs/transformers/installation

transformers is a much bigger library but the wonderful folks at 🤗 did extra work to remove the openfold dependency by extracting a minimal amount of openfold into transformers itself. It will sidestep all these install issues.
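For reference, a sketch of what the transformers route can look like, following the usage pattern from the transformers documentation for EsmForProteinFolding. The checkpoint name "facebook/esmfold_v1" does not appear in this thread and is an assumption here; the first call downloads a large checkpoint, and the wrapper function fold_with_transformers is made up for illustration.

```python
def fold_with_transformers(sequence):
    """Predict atom positions for one protein sequence with the transformers port of ESMFold.

    fold_with_transformers is a hypothetical wrapper. The heavy imports are done
    lazily so that defining the function does not require torch or transformers.
    """
    import torch
    from transformers import AutoTokenizer, EsmForProteinFolding

    # Assumed checkpoint name on the Hugging Face Hub (not mentioned in this thread).
    checkpoint = "facebook/esmfold_v1"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = EsmForProteinFolding.from_pretrained(checkpoint)
    model.eval()

    # ESMFold inputs are tokenized without special tokens.
    inputs = tokenizer([sequence], return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        outputs = model(**inputs)
    # Tensor of predicted atom coordinates.
    return outputs.positions


if __name__ == "__main__":
    positions = fold_with_transformers("MLKNVQVQLV")
    print(positions.shape)
```

This runs on CPU as written; moving the model and inputs to a GPU is advisable for anything beyond short test sequences.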

tomsercu commented 1 year ago

ColabFold may also work for some: https://github.com/sokrypton/ColabFold provides a notebook that is well tested on Google Colab and takes care of all dependencies there. Google Colab comes with compute limitations and time-outs, though.

Abhishaike commented 1 year ago

FOR ANYBODY STUMBLING ACROSS THIS THREAD, HERE'S A (HACKY) FIX IF YOU DON'T NEED DEEPSPEED

Regarding the suggestion to use ESMFold via huggingface transformers: we looked at it, and while it seems like a fantastic implementation, it doesn't explicitly support multimers via linkers. Manually adding the poly-glycine linker ourselves and removing it from the pdb file afterwards is simple enough, but we'd ideally like the interface that ESM already supports out of the box. The ESM batching process is also nice, and it's not clear whether 🤗 supports that.
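The manual linker approach mentioned above can be sketched as follows: join the chains with a poly-glycine linker and record where the linker residues sit, so they can be dropped again afterwards. The function names and the default linker length of 25 are assumptions for illustration. Only the sequence side is shown; removing the corresponding residues from a pdb file would use the same index spans.

```python
def join_chains(chains, linker_length=25):
    """Join chains with a poly-glycine linker.

    Returns the joined sequence and a list of (start, end) spans, 0-based and
    end-exclusive, marking the linker residues. The default linker length is
    an assumption, not a value taken from this thread.
    """
    linker = "G" * linker_length
    sequence = linker.join(chains)
    spans = []
    offset = 0
    for chain in chains[:-1]:
        offset += len(chain)
        spans.append((offset, offset + linker_length))
        offset += linker_length
    return sequence, spans


def strip_linkers(sequence, spans):
    """Remove the residues covered by the given linker spans."""
    linker_positions = set()
    for start, end in spans:
        linker_positions.update(range(start, end))
    return "".join(
        residue for index, residue in enumerate(sequence)
        if index not in linker_positions
    )


if __name__ == "__main__":
    joined, linker_spans = join_chains(["MKTAYIAK", "QRQISFVK"], linker_length=5)
    print(joined)
    print(linker_spans)
    print(strip_linkers(joined, linker_spans))
```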

Regarding ColabFold (https://github.com/sokrypton/ColabFold): ESMFold still seems to be in beta there, but it is also a good option.

As for the fix: simply don't install deepspeed. ESMFold works perfectly fine without it. As in:

pip install dm-tree omegaconf ml-collections einops
pip install "fair-esm[esmfold]==2.0.0" --no-dependencies  # skip dependencies to avoid the deepspeed==0.5.9 pin
pip install 'dllogger @ git+https://github.com/NVIDIA/dllogger.git'
pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'

Deepspeed seems to be an optional dependency: openfold only imports it if it exists in your environment and continues as normal if it doesn't find it. See https://github.com/aqlaboratory/openfold/search?q=deepspeed

Not a useful solution if you're using Deepspeed elsewhere in your codebase, but works great if you need ESMFold and nothing else.

Huge shout-out to @JacobHayes for figuring this out!
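The optional-dependency behaviour described above (import deepspeed only if it is present, otherwise carry on) follows a common Python pattern. The sketch below illustrates that general pattern only; optional_import is a hypothetical helper and is not copied from openfold.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed.

    optional_import is a hypothetical helper illustrating the general
    optional-dependency pattern; it is not openfold's actual code.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# The module is None when deepspeed is absent, so callers can branch on it.
deepspeed = optional_import("deepspeed")

if __name__ == "__main__":
    if deepspeed is None:
        print("deepspeed not found; continuing without it")
    else:
        print("deepspeed found; deepspeed-specific code paths can be used")
```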

Baldwin-disso commented 1 year ago

Hi,

I have the same error and didn't find a workaround for it.

Is there any other workaround?

johnnytam100 commented 1 year ago

Hi, same issue here.