facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License
3.16k stars 627 forks

ESM atlas length limit #477

Open y-hwang opened 1 year ago

y-hwang commented 1 year ago

Thank you for developing this excellent tool! I keep getting the following error when attempting to fold sequences, even when they are shorter than 400 amino acids: "Invalid entry. (Max length is 400)"

tomsercu commented 1 year ago

Hmm, that's weird, but the error message is probably misleading. Does your input contain any non-standard amino acids? E.g. spaces and line breaks will throw us off too.
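A quick local check along these lines can help find the offending characters before pasting a sequence into the web form. This is a minimal sketch, not the validation the ESM Atlas itself runs: the helper names `clean_sequence` and `find_nonstandard` are hypothetical, and the assumption that only the 20 standard one-letter codes pass is inferred from this thread.

```python
# Hypothetical helpers; the Atlas frontend's real validation rules are not
# documented in this thread, so the alphabet below is an assumption.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MAX_LEN = 400  # limit quoted in the error message


def clean_sequence(raw):
    """Remove all whitespace (spaces, tabs, line breaks) and upper-case."""
    return "".join(raw.split()).upper()


def find_nonstandard(seq):
    """Return (index, character) pairs for every non-standard residue."""
    return [(i, c) for i, c in enumerate(seq) if c not in STANDARD_AA]


def check_sequence(raw):
    """Return a list of human-readable problems; empty if none were found."""
    seq = clean_sequence(raw)
    problems = []
    if len(seq) > MAX_LEN:
        problems.append("length %d exceeds %d" % (len(seq), MAX_LEN))
    for i, c in find_nonstandard(seq):
        problems.append("non-standard character %r at position %d" % (c, i + 1))
    return problems
```

Positions in the messages from `check_sequence` are 1-based, which is how residues are usually counted; `find_nonstandard` itself returns 0-based indices.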

y-hwang commented 1 year ago

Ah, it turned out to be due to "X", which codes for any amino acid. Is there a way to replace this with another "placeholder" amino acid code?

tomsercu commented 1 year ago

In principle X could be allowed; the model has been pre-trained with it. You could substitute a flexible amino acid like G if it makes sense in your setting.
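The substitution suggested above could be sketched as follows. The function name `replace_unknown` is hypothetical, and glycine is only the default because it was the example given; whether any placeholder is appropriate depends on the sequence and on what the predicted structure will be used for.

```python
def replace_unknown(seq, placeholder="G"):
    """Replace every 'X' (any amino acid) with a placeholder residue.

    Illustrative only: the predicted structure around substituted positions
    reflects the placeholder, not the unknown residue.
    """
    if len(placeholder) != 1:
        raise ValueError("placeholder must be a single one-letter code")
    return seq.upper().replace("X", placeholder.upper())
```

For example, `replace_unknown("AAAXAAA")` gives `"AAAGAAA"`.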

nikitos9000 commented 1 year ago

Thanks for noticing this. "X" is actually allowed in the API but not in the ESMAtlas frontend, which is a bug. For now, please use curl -X POST --data "AAAXAAA" https://api.esmatlas.com/foldSequence/v1 in this case.
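For anyone scripting this rather than using curl, the same request can be sketched in Python with only the standard library. The URL and the raw-sequence request body come from the curl command above; the function names, the timeout value, and the decision to return the response body as plain text are assumptions, since the response format is not described in this thread.

```python
import urllib.request

# Endpoint taken from the curl command in this thread.
API_URL = "https://api.esmatlas.com/foldSequence/v1"


def build_fold_request(sequence):
    """Build a POST request whose body is the raw sequence, like curl --data."""
    return urllib.request.Request(
        API_URL,
        data=sequence.encode("ascii"),
        method="POST",
    )


def fold_sequence(sequence, timeout=60.0):
    """Send the sequence to the API and return the response body as text.

    The timeout is an arbitrary choice; HTTP errors surface as
    urllib.error.HTTPError.
    """
    request = build_fold_request(sequence)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fold_sequence("AAAXAAA")` would mirror the curl command; it needs network access, so it is left out of the block itself.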