facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

No template block in ESM-Fold? #326

Closed · zhenyuhe00 closed this 1 year ago

zhenyuhe00 commented 1 year ago

Hi, thanks for open-sourcing ESMFold. In your paper, you say that attention maps are used in the template blocks. However, when I read the esmfold code I couldn't find any template blocks. Have I missed something? Thanks in advance!

pengshuang commented 1 year ago

I think the attention maps are meant as the replacement for the template blocks. As the original paper says:

The second change involves the removal of templates. Template information is passed to the model as pairwise distances, input to the residue-pairwise embedding. We simply omit this information, passing instead the attention maps from the language model, as these have been shown to capture structural information well.
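In other words, something along these lines (a minimal sketch, not the actual ESMFold code; the module name and tensor shapes are my assumptions): the per-head attention maps from the language model are flattened over layers and heads and linearly projected into the residue-pairwise embedding, where the template distances would otherwise have gone.

```python
import torch
import torch.nn as nn

class AttnMapsToPair(nn.Module):
    """Sketch: fold LM attention maps into the pairwise embedding."""

    def __init__(self, num_layers: int, num_heads: int, pair_dim: int):
        super().__init__()
        # One feature per (layer, head) attention value -> pair_dim features
        self.proj = nn.Linear(num_layers * num_heads, pair_dim)

    def forward(self, attn: torch.Tensor, pair: torch.Tensor) -> torch.Tensor:
        # attn: [batch, layers, heads, L, L] attention maps from the LM
        # pair: [batch, L, L, pair_dim] residue-pairwise embedding
        b, nl, nh, L, _ = attn.shape
        feats = attn.permute(0, 3, 4, 1, 2).reshape(b, L, L, nl * nh)
        return pair + self.proj(feats)
```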

zhenyuhe00 commented 1 year ago

But there are also no attention maps from the pretrained ESM-2 being passed anywhere in the esmfold code.
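For reference, the pretrained ESM-2 models do expose per-head attention maps through `need_head_weights=True` (if I'm reading the public API correctly), so I would have expected them to be pulled out roughly like this (the sequence below is just an arbitrary example):

```python
import torch
import esm

# Load ESM-2 650M and extract per-head attention maps for one sequence.
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33], need_head_weights=True)

attn = out["attentions"]  # [batch, layers, heads, seq_len, seq_len]
print(attn.shape)
```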

ebetica commented 1 year ago

We used attention maps in the minimal IPA experiments, but we noticed that the large models did not seem to need the attention maps and did equally well without them. The attention maps add a significant memory and speed penalty, so we elected to remove them in the largest models.
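To give a rough sense of the memory cost (back-of-the-envelope, assuming the 650M ESM-2 with 33 layers × 20 heads and fp32 activations): holding the full stack of attention maps for a single 1000-residue sequence is already a couple of GB, before any projection into the pair representation.

```python
# Back-of-the-envelope memory for full LM attention maps (fp32).
num_layers, num_heads = 33, 20   # assumed: ESM-2 650M
seq_len = 1000
bytes_per_float = 4

attn_bytes = num_layers * num_heads * seq_len * seq_len * bytes_per_float
print(f"{attn_bytes / 1e9:.2f} GB")  # ~2.64 GB for one 1000-residue sequence
```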

joshim5 commented 1 year ago

@ebetica how helpful are self-attention maps for the smaller models? For the MSA Transformer, self-attention maps didn't matter either, and there wasn't even a significant folding trunk. [image attached]

Curious to see this table from ESMFold filled in with LM representations ± self-attention maps. [image attached]