Hi,
In your ESMFold paper:
The second change involves the removal of templates. Template information is passed to the model as pairwise distances, input to the residue-pairwise embedding. We simply omit this information, passing instead the attention maps from the language model, as these have been shown to capture structural information well (38).
You use attention maps as template information. I wonder whether they come from the last layer only, or from all layers?
Thanks in advance!