Hi,
In your ESMFold paper:
The second change involves the removal of templates. Template information is passed to the model as pairwise distances, input to the residue-pairwise embedding. We simply omit this information, passing instead the attention maps from the language model, as these have been shown to capture structural information well (38).
You use attention maps as template information. I wonder whether they come from the last layer only, or from all layers?
Thanks in advance!