ZikangZhou / HiVT

[CVPR 2022] HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf
Apache License 2.0

A question about AAEncoder #29

Closed: Elnath-123 closed this issue 1 year ago

Elnath-123 commented 1 year ago

Thanks for contributing such amazing work! I have one question. When computing the cross-attention between the center agent and its neighbor agents, why is rotate_mat indexed with edge_index[1] for x_j (the neighbor agents) rather than edge_index[0]? As far as I understand, edge_index[0] represents the source, i.e., the center agent, and edge_index[1] represents the target, i.e., the neighbor agents. Here we want to rotate the neighbor agents by the center agent's angle \theta. I would therefore expect rotate_mat[edge_index[0]], the rotation matrix parametrized by the center agent's angle \theta, to be the one used to rotate the neighbor agents.

https://github.com/ZikangZhou/HiVT/blob/6876656ce7671982ebdc29113aaaa028c2931518/models/local_encoder.py#L184

ZikangZhou commented 1 year ago

Hi @lrq1999,

You're mostly right: edge_index[0] represents the source and edge_index[1] represents the target. However, in the default message-passing setting used in PyG, the message is passed from the source node to the target node, so the source nodes are the neighbors (with index j) and the target node is the ego, i.e., the center agent (with index i). Here we're passing messages from the neighbors to the ego, so rotate_mat[edge_index[1]] is the rotation matrix of the center agent, which is the one x_j should be rotated by. Feel free to ask me if you have further questions.
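
To make the indexing convention in the reply above concrete, here is a minimal sketch in plain Python. It only imitates the source-to-target convention and is not PyG or the HiVT code. The helper names (`rotation_matrix`, `rotate`, `messages`), the toy graph, and the rotation-matrix layout are assumptions made for illustration.

```python
import math

def rotation_matrix(theta):
    # Assumed layout: a row vector v times this matrix gives v expressed
    # in a local frame whose heading is theta.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rotate(v, mat):
    # Row vector times 2x2 matrix.
    return [v[0] * mat[0][0] + v[1] * mat[1][0],
            v[0] * mat[0][1] + v[1] * mat[1][1]]

# Toy graph: agents 1 and 2 are neighbors sending messages to center agent 0.
# Row 0 holds the sources (index j), row 1 holds the targets (index i).
edge_index = [[1, 2],
              [0, 0]]

headings = [math.pi / 2, 0.0, math.pi]        # one heading per agent
rotate_mat = [rotation_matrix(t) for t in headings]
x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]      # one 2D feature per agent

def messages(x, edge_index, rotate_mat):
    out = []
    for j, i in zip(edge_index[0], edge_index[1]):
        x_j = x[j]                            # neighbor (source) feature
        center_rotate_mat = rotate_mat[i]     # indexed by the target: the center agent
        out.append((i, j, rotate(x_j, center_rotate_mat)))
    return out

msgs = messages(x, edge_index, rotate_mat)
for i, j, m in msgs:
    print(f"center {i} <- neighbor {j}: {m}")
```

In this toy setup the center agent faces along +y, so the feature of neighbor 1, which points along +y in the global frame, ends up along +x in the center agent's local frame (up to floating-point error). Indexing `rotate_mat` with `edge_index[0]` instead would rotate each neighbor by its own heading rather than by the center agent's.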