facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Which layer's attention maps are used as template information in ESMFold? #264

Closed zhenyuhe00 closed 1 year ago

zhenyuhe00 commented 2 years ago

Hi, in your ESMFold paper you write: "The second change involves the removal of templates. Template information is passed to the model as pairwise distances, input to the residue-pairwise embedding. We simply omit this information, passing instead the attention maps from the language model, as these have been shown to capture structural information well (38)." So you use attention maps as template information. I wonder whether they come from the last layer or from all the layers?

Thanks in advance!

tomsercu commented 2 years ago

Stacked across all layers
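
For reference, the fair-esm API exposes these maps directly: passing need_head_weights=True returns the per-head attention maps already stacked across all layers. A minimal sketch (the checkpoint and sequence below are arbitrary choices):

import torch
import esm

# Load an ESM-2 checkpoint (any checkpoint works; this one is just an example).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, need_head_weights=True)

# "attentions" has shape (batch, num_layers, num_heads, seq_len, seq_len),
# i.e. the maps are already stacked across all layers and heads.
attn = out["attentions"]
B, n_layers, n_heads, T, _ = attn.shape

# Flatten layers x heads into one feature stack per residue pair.
stacked = attn.permute(0, 3, 4, 1, 2).reshape(B, T, T, n_layers * n_heads)
print(stacked.shape)  # (1, seq_len, seq_len, num_layers * num_heads)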

zhenyuhe00 commented 2 years ago

Thanks!

zhenyuhe00 commented 2 years ago

"Stacked across all layers"

Are they added directly to the pair embeddings initialized from the position embeddings, or are they first passed through an MLP layer?

tomsercu commented 1 year ago

Actually, I stand corrected: we experimented with stacking the attention maps but didn't see a gain. Hence we removed them from the final versions of ESMFold and just initialize the pair representation with a zero tensor. Attention maps are only used for the experiments with the structure projection directly from the LM, described in Lin et al. 2022, Section 2.
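
For intuition, here is a minimal sketch of the two pair-track initializations discussed in this thread: the zero-tensor start used in the released ESMFold, versus a learned linear projection of the stacked attention maps (the variant that was experimented with). The dimensions and the projection module are illustrative assumptions, not the actual ESMFold code.

import torch
import torch.nn as nn

B, T = 1, 33            # batch size, sequence length (illustrative)
c_z = 128               # pair embedding dimension (illustrative)
n_maps = 33 * 20        # num_layers * num_heads of the language model (illustrative)

# Option used in the released ESMFold: start the pair representation from zeros.
pair_repr = torch.zeros(B, T, T, c_z)

# Variant that was tried: project the stacked LM attention maps into the pair track
# with a learned linear layer (hypothetical module, not ESMFold's actual code).
attn_stack = torch.rand(B, T, T, n_maps)          # stacked maps from the LM
attn_to_pair = nn.Linear(n_maps, c_z)
pair_repr_from_attn = attn_to_pair(attn_stack)    # (B, T, T, c_z)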