Closed: zhenyuhe00 closed this issue 1 year ago
Hi, in your ESMFold paper:

> The second change involves the removal of templates. Template information is passed to the model as pairwise distances, input to the residue-pairwise embedding. We simply omit this information, passing instead the attention maps from the language model, as these have been shown to capture structural information well (38).

You use attention maps in place of the template information. I wonder whether they come from the last layer or from all the layers?

Thanks in advance!

Stacked across all layers.

Thanks!

> Stacked across all layers

Are they added directly to the pair embeddings initialized by position embeddings, or passed through an MLP layer?

Actually, I stand corrected: we experimented with stacking attention maps but didn't see a gain, so we removed it from the final version of ESMFold and just initialize the pair representation with a zero tensor. Attention maps are only used for the experiments with the structure projection directly from the LM, described in Lin et al. 2022, Section 2.
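For anyone landing here from search, below is a minimal sketch of the two variants discussed in this thread, assuming the `fair-esm` package. The pair-embedding width `c_z` and the linear projection are hypothetical choices for illustration only; this is not the released ESMFold code, which (per the comment above) simply zero-initializes the pair representation.

```python
# Sketch: stack attention maps from every layer of an ESM-2 LM into a pair
# feature, versus a zero-initialized pair representation. Illustrative only.
import torch
import esm  # fair-esm package

# Small ESM-2 checkpoint so the example runs quickly; any ESM-2 model works.
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

_, _, tokens = batch_converter([("seq", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")])

with torch.no_grad():
    # need_head_weights=True returns per-head attention maps for all layers.
    out = model(tokens, need_head_weights=True)

# Shape: (batch, num_layers, num_heads, seq_len, seq_len), BOS/EOS included.
attn = out["attentions"]

# Drop BOS/EOS so the maps are residue-by-residue, then stack layers x heads
# into a single feature dimension: (L, L, num_layers * num_heads).
attn = attn[0, :, :, 1:-1, 1:-1]
num_layers, num_heads, L, _ = attn.shape
pair_feats = attn.permute(2, 3, 0, 1).reshape(L, L, num_layers * num_heads)

c_z = 128  # hypothetical pair-embedding width, not taken from ESMFold
to_pair = torch.nn.Linear(num_layers * num_heads, c_z)

# Variant explored in early experiments: pair representation seeded from the
# attention maps stacked across all layers (here via a simple linear layer).
z_from_attn = to_pair(pair_feats)      # (L, L, c_z)

# Variant in the released ESMFold: a zero-initialized pair representation.
z_zeros = torch.zeros(L, L, c_z)

print(z_from_attn.shape, z_zeros.shape)
```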