facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Bias in choosing amino acids for inverse folding #199

Closed: vuhongai closed this issue 2 years ago

vuhongai commented 2 years ago

Dear authors,

Thank you for sharing your amazing work. I would like to report a phenomenon that may or may not be a bug.

When I run inverse folding on a defined backbone, I fix (seed) about 50-100 amino acids at the beginning. The resulting designed sequence is dominated by leucine (L), regardless of the temperature. (The maximum temperature I used was 1; with temp>1 the bias is much weaker.) Have you observed this bias in your designs before?

Best regards, Ai
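For readers following along, here is a minimal sketch of how temperature scaling typically affects sampling, which may help explain why the leucine bias weakens above temperature 1. It does not call the esm library. The logits are made-up illustrative values, and `softmax_with_temperature` is a hypothetical helper, not part of any API mentioned in this thread.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Assumption: hypothetical per-position scores in which leucine is
# only mildly preferred over two other residues.
residues = ["L", "A", "G"]
logits = [2.0, 1.0, 0.0]

# Probability assigned to each residue at three temperatures.
probs_by_temp = {
    t: dict(zip(residues, softmax_with_temperature(logits, t)))
    for t in (0.5, 1.0, 2.0)
}

for t, probs in probs_by_temp.items():
    summary = ", ".join(f"{aa}={p:.2f}" for aa, p in probs.items())
    print(f"temperature {t}: {summary}")
```

Under these assumed scores, the most likely residue takes a larger share of the probability mass as the temperature drops and a smaller share as it rises. If the model mildly prefers leucine at many positions, low-temperature sampling would amplify that preference across the whole design.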

tomsercu commented 2 years ago

Thanks for sharing this failure mode. This would of course depend on the specifics of the backbone you present to the model, as well as on the fixed part of the sequence. If either of those were out of distribution or malformed (i.e., not corresponding to a real structure/sequence combination), I would not expect the model to produce anything reasonable.
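One way to rule out the "malformed" case before looking at the model is a quick consistency check on the inputs. The sketch below is only an illustration and is independent of the esm library. `check_design_inputs`, the `_` placeholder for free positions, and the nested-list coordinate layout (one N, CA, C triple per residue) are all assumptions made for this example.

```python
import math

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_design_inputs(coords, partial_seq, mask_char="_"):
    """Return a list of human-readable problems; empty means none found.

    coords: one entry per residue, each a list of atoms [x, y, z]
            (layout assumed for this sketch).
    partial_seq: string with fixed residues as letters and free
            positions as mask_char (hypothetical convention).
    """
    problems = []
    if len(coords) != len(partial_seq):
        problems.append(
            f"length mismatch: {len(coords)} backbone residues "
            f"vs {len(partial_seq)} sequence positions"
        )
    for i, ch in enumerate(partial_seq):
        if ch != mask_char and ch not in STANDARD_AMINO_ACIDS:
            problems.append(f"position {i}: unexpected character {ch!r}")
    for i, residue in enumerate(coords):
        # NaN or infinite coordinates usually mean missing atoms.
        if any(not math.isfinite(v) for atom in residue for v in atom):
            problems.append(f"residue {i}: non-finite coordinate")
    return problems


# Toy inputs: three residues, first two fixed, last one left free.
good_coords = [[[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0]]] * 3
bad_coords = good_coords[:2] + [[[float("nan"), 0.0, 0.0]] * 3]

print(check_design_inputs(good_coords, "ML_"))
print(check_design_inputs(bad_coords, "MX_"))
```

A check like this only catches obvious mismatches. It cannot tell whether a well-formed backbone and seed sequence are out of distribution for the model.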