Closed: zhenyuhe00 closed this issue 1 year ago
I think the attention maps may be the replacement for the template blocks. As the original paper says:

> The second change involves the removal of templates. Template information is passed to the model as pairwise distances, input to the residue-pairwise embedding. We simply omit this information, passing instead the attention maps from the language model, as these have been shown to capture structural information well.
But I also can't find any attention maps from the pretrained ESM-2 being used in the esmfold code.
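For anyone following along, here is a rough sketch (my own toy numpy code, not ESMFold's implementation; all shapes and the projection are hypothetical) of what "passing the attention maps into the residue-pairwise embedding" could look like in place of template distances:

```python
import numpy as np

# Hedged sketch, not ESMFold's actual code: fold per-layer, per-head
# attention maps from a language model into a residue-pairwise embedding,
# where template pairwise distances would otherwise be injected.
rng = np.random.default_rng(0)

L_seq = 8                  # sequence length (toy)
n_layers, n_heads = 4, 3   # hypothetical LM sizes
d_pair = 16                # pairwise embedding width (hypothetical)

# Attention maps: (layers, heads, L, L), rows softmax-normalized.
logits = rng.normal(size=(n_layers, n_heads, L_seq, L_seq))
attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

# Symmetrize (attention is directed; pair features usually aren't)
# and stack layers/heads as pairwise features: (L, L, layers*heads).
sym = 0.5 * (attn + attn.transpose(0, 1, 3, 2))
feats = sym.transpose(2, 3, 0, 1).reshape(L_seq, L_seq, n_layers * n_heads)

# Linear projection into the pairwise embedding, standing in for the
# template-distance input to the pair representation.
W = rng.normal(size=(n_layers * n_heads, d_pair)) / np.sqrt(n_layers * n_heads)
pair_emb = feats @ W  # (L, L, d_pair) -> (8, 8, 16)
print(pair_emb.shape)
```

The point is just that each (i, j) residue pair gets a feature vector built from every layer's and head's attention weight between i and j, which slots into the same place template distances would.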
We used attention maps in the minimal IPA experiments, but we noticed that the large models seemed not to need the attention maps and did equally well without them. The attention maps add a significant memory and speed penalty, so we elected to remove them in the largest models.
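A quick back-of-envelope calculation (my numbers, not from the paper) makes the memory penalty concrete: storing full attention maps for every layer and head scales quadratically with sequence length, which adds up fast for a large LM.

```python
# Hedged sketch: fp32 bytes needed to keep all (L x L) attention maps.
# Layer/head counts here are illustrative assumptions for a large ESM-2-scale
# model, not figures taken from the ESMFold paper.
def attn_map_bytes(seq_len, n_layers=36, n_heads=40, bytes_per_el=4):
    """Bytes to materialize attention maps of shape (layers, heads, L, L)."""
    return n_layers * n_heads * seq_len * seq_len * bytes_per_el

for L in (256, 512, 1024):
    print(f"L={L}: {attn_map_bytes(L) / 2**30:.2f} GiB")
```

Even before any gradients or activations, that is several GiB at protein-typical lengths, which is consistent with dropping the maps in the largest models.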
@ebetica how helpful are self-attention maps for the smaller models? For MSA Transformer, self-attention maps didn't matter either, and there wasn't even a significant folding trunk.
Curious to see this table from ESMFold filled in with LM representations, with and without self-attention maps.
Hi, thanks for open-sourcing ESMFold. In your paper, you said that attention maps are used in the template blocks. But when I read the esmfold code, I couldn't find any template blocks. I wonder if I missed something. Thanks in advance!