Group attention mechanism in the paper vs. group attention in the code

Hi author, thank you for the great work; I am very interested in this research. I have a question about the attention mechanism. As I understand it, the group attention designed in the paper divides the query, key, and value into multiple groups and computes attention within each group in parallel. Each group is then max-pooled, and attention is computed again between the groups. The code, however, appears to compute only the within-group attention and then concatenate the results directly; the max pooling and the inter-group attention do not seem to be implemented, which confuses me. Looking forward to your reply!
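To make the question concrete, here is a minimal sketch of the two-stage computation as described above. It is written in NumPy for brevity and is not the repository's code. The function names (`attention`, `group_attention`), the single-head scaled dot-product form, and in particular the way the inter-group result is added back onto each group member are assumptions for illustration; the paper's exact formulation may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention over the last two axes.
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v


def group_attention(q, k, v, num_groups):
    """Hypothetical two-stage group attention for inputs of shape (n, d).

    Returns (combined, intra_only), both of shape (n, d).
    """
    n, d = q.shape
    assert n % num_groups == 0, "n must be divisible by num_groups"
    m = n // num_groups

    # Stage 1: split into groups and attend within each group.
    qg = q.reshape(num_groups, m, d)
    kg = k.reshape(num_groups, m, d)
    vg = v.reshape(num_groups, m, d)
    intra = attention(qg, kg, vg)  # (num_groups, m, d)

    # Stage 2: max-pool each group to one vector, then attend between groups.
    pooled = intra.max(axis=1)  # (num_groups, d)
    inter = attention(pooled, pooled, pooled)  # (num_groups, d)

    # Assumption: broadcast each group's inter-group feature to its members.
    combined = intra + inter[:, None, :]

    # intra_only corresponds to "compute per-group attention, then concatenate".
    return combined.reshape(n, d), intra.reshape(n, d)


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
combined, intra_only = group_attention(x, x, x, num_groups=2)
print(combined.shape, intra_only.shape)
```

In this sketch, `intra_only` is what you get when stage 2 is skipped, which is the behaviour the code seems to have; `combined` additionally includes the pooled inter-group term.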

VincLee8188 / GMAN-PyTorch, issue #10