BytedProtein / ByProt


Structural adapter query vector questions and decoding #1

Closed simonlevine closed 1 year ago

simonlevine commented 1 year ago

Hi, I have a few questions:

Thank you very much.

zhengzx-nlp commented 1 year ago

Hi Simon,

Sorry, I wasn't notified by GitHub about your comment! I'll answer your questions below:

  1. We designed our framework to be agnostic to the specific structure-encoder parameterization. We therefore used established, open-source SoTA protein structure models such as ProteinMPNN, PiFold, and ESM-IF's GVPTransformerEncoder. In these models, the protein structure/graph representation is built from both node and edge (pair) features, and we simply take the structure encoder's final "node" representations as the structural representation fed to the pLM decoder.
  2. Yes, dimension matching is needed. It is accomplished by the linear projectors for K and V in the adapter's attention module, where the query is the pLM hidden state and the key/value are the structure encodings. The (structural) key/value dimension (e.g., 256) is projected up to match the query dimension of the pLM (e.g., 768); see the sketch after this list.
  3. As mentioned in the answer to Q1, we only use the node representations, which are expected to already encode the protein's structural/spatial hierarchy properly.
  4. Actually, the structural encoding is the key/value, and the pLM hidden state is the query. I will check whether something in the manuscript is confusing on this point. Thank you for bringing this to our attention!
  5. We applied RoPE to the query (from the LM) and the key (from the structural encoding). RoPE is used to explicitly account for the relative positions of residues and thus better capture their spatial relationships; we found this to be helpful.
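
For concreteness, here is a minimal PyTorch sketch of what such a cross-attention adapter layer could look like: pLM hidden states act as queries, the structure encoder's node features (e.g., 256-d) are projected up to the pLM width (e.g., 768-d) as keys/values, and RoPE is applied to queries and keys. The names (`StructuralAdapterAttention`, `apply_rope`), the dimensions, and the RoPE formulation are illustrative assumptions, not the actual ByProt implementation.

```python
import torch
import torch.nn as nn


def apply_rope(x, positions):
    """Apply rotary position embeddings to the last dimension of x.

    x:         (batch, seq, heads, head_dim), head_dim must be even.
    positions: (batch, seq) residue indices.
    """
    half = x.shape[-1] // 2
    freqs = 1.0 / (10000 ** (torch.arange(half, dtype=x.dtype, device=x.device) / half))
    angles = positions[..., None, None].to(x.dtype) * freqs  # (batch, seq, 1, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat(
        [x1 * torch.cos(angles) - x2 * torch.sin(angles),
         x1 * torch.sin(angles) + x2 * torch.cos(angles)],
        dim=-1,
    )


class StructuralAdapterAttention(nn.Module):
    """Cross-attention: pLM hidden states query structure-encoder node features."""

    def __init__(self, plm_dim=768, struct_dim=256, num_heads=12):
        super().__init__()
        assert plm_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = plm_dim // num_heads
        # Query comes from the pLM; the K/V projectors lift the structural
        # node features (e.g., 256-d) up to the pLM width (e.g., 768-d).
        self.q_proj = nn.Linear(plm_dim, plm_dim)
        self.k_proj = nn.Linear(struct_dim, plm_dim)
        self.v_proj = nn.Linear(struct_dim, plm_dim)
        self.out_proj = nn.Linear(plm_dim, plm_dim)

    def forward(self, plm_hidden, struct_nodes, positions):
        # plm_hidden:   (batch, seq, plm_dim)    pLM decoder hidden states (query)
        # struct_nodes: (batch, seq, struct_dim) final node features from the structure encoder (key/value)
        # positions:    (batch, seq)             residue indices shared by both inputs
        b, n, _ = plm_hidden.shape
        q = self.q_proj(plm_hidden).view(b, n, self.num_heads, self.head_dim)
        k = self.k_proj(struct_nodes).view(b, n, self.num_heads, self.head_dim)
        v = self.v_proj(struct_nodes).view(b, n, self.num_heads, self.head_dim)
        # RoPE on query and key so attention scores reflect relative residue positions.
        q, k = apply_rope(q, positions), apply_rope(k, positions)
        attn = torch.softmax(
            torch.einsum("bqhd,bkhd->bhqk", q, k) / self.head_dim ** 0.5, dim=-1
        )
        out = torch.einsum("bhqk,bkhd->bqhd", attn, v).reshape(b, n, -1)
        return self.out_proj(out)
```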

I hope this helps address your questions! Please feel free to comment if you have any further questions.

Best, Zaixiang

simonlevine commented 1 year ago

Hi @zhengzx-nlp, thanks for your response. That all makes sense. Regarding point 4, there is a section on page 15 of the manuscript that reads "...the structural adapter composes a multihead attention (MULTIHEAD ATTN) that queries structure information from the structure encoder...". This would seem to imply Q ~ structure, rather than the pLM.