facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins

How to use the embedding of ESM-2 when training ESM-Fold #266

Closed: zhenyuhe00 closed this issue 1 year ago

zhenyuhe00 commented 2 years ago

Hi, congrats on your great series of work! The crop size of ESMFold is 384 during training. However, when running inference with ESM-2 to get embeddings, is the sequence fed to ESM-2 also cropped to 384, or is it the full sequence? The former could degrade performance, since context information is lost by cropping. Also, I'm curious whether you ran ESM-2 inference offline and stored the embeddings, or computed them on the fly while training ESMFold.

Thanks in advance!

WangHuiNEU commented 1 year ago

@ebetica That's a very interesting question. Can you answer it?

ebetica commented 1 year ago

Hey @zhenyuhe00, we crop the sequence during training. Not cropping would be prohibitively expensive in many cases.

For inference, we run it offline and never crop. We also tried using the disordered residues (provided in the FASTA but not the PDB), which gave a small overall improvement (<1 LDDT).
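
For reference, here is a minimal sketch of how one could precompute full-length ESM-2 per-residue embeddings offline and cache them to disk, using this repository's public API. It is not the ESMFold training pipeline itself; the model size, example sequences, and output file names are illustrative assumptions.

```python
import torch
import esm

# Load a pretrained ESM-2 model (650M variant chosen here only for illustration).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

# Hypothetical example sequences; in practice these would come from your FASTA files.
data = [
    ("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("protein2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGF"),
]
batch_labels, batch_strs, batch_tokens = batch_converter(data)

# Extract per-residue representations from the final layer (layer 33 for this model),
# feeding in the full, uncropped sequences.
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[33], return_contacts=False)
token_representations = results["representations"][33]

# Cache embeddings offline, trimming the BOS/EOS tokens so each tensor is (seq_len, dim).
for i, (label, seq) in enumerate(data):
    embedding = token_representations[i, 1 : len(seq) + 1]
    torch.save(embedding.clone(), f"{label}_esm2_embedding.pt")
```

The cached tensors could then be loaded during folding-model training instead of recomputing ESM-2 forward passes online.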

zhenyuhe00 commented 1 year ago

Thanks!