ant-research / Pyraformer

Apache License 2.0

Difficulties in the understanding of the constant A #7

Open WYL-Projects opened 2 years ago

WYL-Projects commented 2 years ago

Hello, author. Thanks for the high-performance pyramid attention you have proposed. However, while reviewing the paper I came across some difficulties, as follows:

  1. From the appendix of the paper, I understand that the constant A represents the number of adjacent nodes at the same scale that a node can attend to. When A takes the value 3, does it refer to the number of same-scale neighbor nodes of a middle node, that is, any node other than the leftmost and rightmost ones? I ask because I think the leftmost and rightmost nodes at each scale have only 2 neighboring nodes, right?
  2. When the constant A takes the value 3, the model diagram of PAM looks like this: [image] When A takes the value 5, each node has 5 neighboring nodes at the same scale. Does the model diagram of PAM then look like the following? [image] If so, the leftmost two and rightmost two nodes of the S=1 sequence can only have 3 and 4 neighboring nodes, and similarly for the leftmost two and rightmost two nodes of S=2, while the leftmost and rightmost nodes of S=3 can only have 3. But I feel my understanding of the constant A is wrong. I hope the author can give me some pointers.

Zhazhan commented 2 years ago

Hello,

Thank you for your interest in our work. Good questions.

  1. Yes, we use A to denote the number of same-scale neighbor nodes that a middle node in the sequence (that is, most nodes in the sequence) can attend to. Nodes near the leftmost and rightmost ends of the sequence can attend to fewer than A same-scale neighbor nodes. In equations (8) and (12), we account for this by using the upper bound A when computing the complexity.

  2. Yes, the diagram of A=5 is right. Your understanding is right.
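
To make the boundary effect concrete, here is a minimal sketch that counts how many same-scale nodes each position can attend to for a given A. It is not taken from the Pyraformer repository: the function name `same_scale_neighbor_counts` is hypothetical, and it assumes the window is centered on the node, counts the node itself, and is simply clipped at both ends of the sequence, which is consistent with the counts quoted in the question (2 at the edges for A=3; 3 and 4 near the edges for A=5).

```python
from typing import List


def same_scale_neighbor_counts(num_nodes: int, A: int) -> List[int]:
    """Count the same-scale nodes each position can attend to.

    Assumption (illustrative, not the repository's code): the window has
    size A, is centered on the node, includes the node itself, and is
    clipped at the sequence boundaries.
    """
    half = A // 2  # nodes reachable on each side of the center
    counts = []
    for i in range(num_nodes):
        left = max(i - half, 0)               # clip at the left end
        right = min(i + half, num_nodes - 1)  # clip at the right end
        counts.append(right - left + 1)
    return counts


print(same_scale_neighbor_counts(8, 3))  # [2, 3, 3, 3, 3, 3, 3, 2]
print(same_scale_neighbor_counts(8, 5))  # [3, 4, 5, 5, 5, 5, 4, 3]
```

Every count is at most A, which is why using A as an upper bound in the complexity analysis is safe even though the edge nodes attend to fewer neighbors.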