About huggingface support for esm

facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins

MIT License

About huggingface support for esm #198

Closed: chang-github-00 closed this issue 1 year ago

chang-github-00 commented 2 years ago

I see a new model card on Hugging Face: https://huggingface.co/facebook. However, I can't import this model because the released version of transformers doesn't support ESM models.

from transformers import ESMForMaskedLM, ESMTokenizer
tokenizer = ESMTokenizer.from_pretrained("facebook/esm-1b", do_lower_case=False )
model = ESMForMaskedLM.from_pretrained("facebook/esm-1b")
sequence_Example = "QERLKSIVRILE"
encoded_input = tokenizer(sequence_Example, return_tensors='pt')
output = model(**encoded_input)

I wonder when this version will be fully released. Many thanks!

chang-github-00 commented 2 years ago

This seems to be solved by

git clone -b add_esm-proper --single-branch https://github.com/liujas000/transformers.git 
pip -q install ./transformers

tomsercu commented 2 years ago

Yes, see also #158. We dropped the ball a bit on finalizing this integration and will look into it soon.

dashapyly commented 2 years ago

Yes, getting this to work would be nice. Thank you!

jlotthammer commented 2 years ago

+1, a long-term solution here would be great!

felixgabler commented 1 year ago

This seems to be solved by

git clone -b add_esm-proper --single-branch https://github.com/liujas000/transformers.git 
pip -q install ./transformers

While this generally works, I noticed that the tokenizer uses token ID 1 (= "-") instead of 2 (= "") for end-of-sequence padding, as described here: https://github.com/facebookresearch/esm/discussions/126. Do you think this could be an issue?
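
One way to look into the padding question above is to compare the tokenizer's output against the expected token ID layout. The following is a minimal pure-Python sketch, not the Hugging Face tokenizer's implementation. It assumes the ESM-1b vocabulary order used by the fair-esm package (<cls>=0, <pad>=1, <eos>=2, <unk>=3, followed by the residue tokens), and the helper names encode and pad_batch are hypothetical. Under that assumption, every sequence ends with <eos>, and shorter sequences in a batch are then filled with <pad>, which the attention mask marks as ignorable.

```python
# Assumed ESM-1b vocabulary order (taken from the fair-esm alphabet);
# verify it against the vocabulary of the tokenizer you actually use.
SPECIAL = ["<cls>", "<pad>", "<eos>", "<unk>"]
RESIDUES = list("LAGVSERTIDPKQNFYMHWCXBUZO.-")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + RESIDUES)}
CLS, PAD, EOS, UNK = (VOCAB[t] for t in SPECIAL)


def encode(seq):
    """Hypothetical helper: <cls> + residues + <eos>; unknown characters map to <unk>."""
    return [CLS] + [VOCAB.get(ch, UNK) for ch in seq] + [EOS]


def pad_batch(seqs):
    """Hypothetical helper: right-pad with PAD and build a 0/1 attention mask."""
    encoded = [encode(s) for s in seqs]
    width = max(len(ids) for ids in encoded)
    input_ids = [ids + [PAD] * (width - len(ids)) for ids in encoded]
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in encoded]
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch(["QERLKSIVRILE", "QERL"])
print(input_ids[1])       # token IDs of the shorter, padded sequence
print(attention_mask[1])  # 1 for real tokens, 0 for padding
```

If the IDs printed by the real tokenizer differ from this layout only at positions where the attention mask is 0, the difference should not affect the model's output at the unmasked positions, provided the mask is passed to the model.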

tomsercu commented 1 year ago

Very happy to share that, thanks to the amazing work of @Rocketknight1 and the 🤗 team, this is now supported:

Classification tasks with proteins, just like BERT: https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/protein_language_modeling.ipynb

Fold proteins in Colab or your local GPU and export PDB files: https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/protein_folding.ipynb