ZikangZhou / QCNet

[CVPR 2023] Query-Centric Trajectory Prediction
https://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_Query-Centric_Trajectory_Prediction_CVPR_2023_paper.pdf
Apache License 2.0

Question about the design of the attention layer #42

Closed JYS997760473 closed 2 months ago

JYS997760473 commented 2 months ago

Hi, Dr. Zhou, I am curious about the design of the attention layer, since these four lines look different from common multi-head attention: https://github.com/ZikangZhou/QCNet/blob/55cacb418cbbce3753119c1f157360e66993d0d0/layers/attention_layer.py#L96C1-L99C40 I would like to ask why you use a dot product here rather than the usual matrix multiplication, like:

sim = torch.matmul(q_i, k_j.transpose(1, 2)) * self.scale  # pairwise attention scores
attn = softmax(sim, index, ptr)
attn = self.attn_drop(attn)
return torch.matmul(attn, v_j)
ares89 commented 2 months ago

It's scaled dot-product attention.
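
For reference, a minimal sketch (the tensor shapes and variable names below are illustrative assumptions, not taken from the repository) of why the element-wise product summed over the last dimension is exactly the scaled dot product, evaluated once per edge and per head; a dense matmul would instead score every query against every key, which the sparse message-passing formulation does not need:

import torch

# Assumed shapes: per-edge query/key tensors as a message-passing
# attention layer would see them ([num_edges, num_heads, head_dim]).
num_edges, num_heads, head_dim = 5, 4, 16
scale = head_dim ** -0.5
q_i = torch.randn(num_edges, num_heads, head_dim)
k_j = torch.randn(num_edges, num_heads, head_dim)

# Element-wise product summed over the feature dimension ...
sim_elementwise = (q_i * k_j).sum(dim=-1) * scale

# ... equals the per-edge, per-head dot product written explicitly.
sim_dot = torch.einsum('ehd,ehd->eh', q_i, k_j) * scale

assert torch.allclose(sim_elementwise, sim_dot)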