ZikangZhou / HiVT

[CVPR 2022] HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf
Apache License 2.0

Number of local regions #6

Closed (cyrushx closed 2 years ago)

cyrushx commented 2 years ago

Is the number of local regions N the same as the number of agents (target + context) in Argoverse? If so, would you mind pointing to the code where the local data in each region is computed?

There is a single agent to predict in Argoverse. Are you trying to predict all the agents, including the context agents, with your model and compute the joint loss over all agents?

Thanks!

ZikangZhou commented 2 years ago

During data preprocessing, all objects (AV, AGENT, and OTHER) are treated the same way, and the model makes predictions for all of them (though the data quality of some context objects in Argoverse 1.1 is not great). This is similar to baselines such as LaneGCN and Scene Transformer, except that we don't normalize the scene by the position and heading of the benchmark-defined "focal agent" when preprocessing data. The loss is still a marginal loss, but you can switch to a joint loss if you want, as Scene Transformer does.
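To make the marginal vs. joint distinction concrete, here is a minimal pure-Python sketch. It is not the loss used in this repo: average displacement error stands in for the real regression loss, and the function names and the nested-list layout `preds[agent][mode][timestep]` are assumptions made for illustration only.

```python
import math

def displacement_error(pred, gt):
    """Average L2 distance between two trajectories given as lists of (x, y)."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def marginal_loss(preds, gts):
    """Pick the best mode independently for every agent, then average over agents.

    preds[a][k] is the k-th predicted trajectory of agent a (assumed layout).
    """
    per_agent = [
        min(displacement_error(mode, gt) for mode in modes)
        for modes, gt in zip(preds, gts)
    ]
    return sum(per_agent) / len(per_agent)

def joint_loss(preds, gts):
    """Pick one mode for the whole scene: mode k is shared by all agents."""
    num_agents = len(gts)
    num_modes = len(preds[0])
    per_mode = [
        sum(displacement_error(preds[a][k], gts[a]) for a in range(num_agents)) / num_agents
        for k in range(num_modes)
    ]
    return min(per_mode)
```

With a marginal loss, each agent can be matched to a different mode. A joint loss forces a single scene-level mode, so it is never smaller than the marginal one on the same predictions.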

The Argoverse benchmark is for single-agent prediction. So although we model all objects symmetrically and make predictions for all of them, at evaluation time we only measure the prediction accuracy of the focal agent, so that the reported numbers are comparable with other work.
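The idea of predicting for every agent but scoring only the focal one can be sketched as below. This is an illustration rather than the evaluation code in this repo: minFDE is used as an example metric, and the scene dictionary keys (`preds`, `gts`, `focal_index`) are hypothetical names.

```python
import math

def min_fde(modes, gt):
    """Minimum final displacement error over the predicted modes of one agent."""
    return min(math.dist(mode[-1], gt[-1]) for mode in modes)

def focal_min_fde(scenes):
    """Average minFDE over scenes, scoring only each scene's focal agent.

    Each scene holds predictions for all agents, but the other agents are
    ignored here so the number stays comparable with single-agent methods.
    """
    errors = [
        min_fde(scene["preds"][scene["focal_index"]], scene["gts"][scene["focal_index"]])
        for scene in scenes
    ]
    return sum(errors) / len(errors)
```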

This repo uses PyG, and the computation of local data is equivalent to building a radius graph by modifying the edge index (here and here). Alternatively, you can use PyTorch Cluster to manipulate the edge index and achieve similar or more sophisticated functionality.
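For intuition, building a radius graph by modifying the edge index amounts to something like the following dependency-free sketch. It is not the code in this repo and does not use PyG or PyTorch Cluster calls. The function names, the list-of-lists edge index (mirroring the `[2, E]` layout), and the inclusive `<=` threshold are all assumptions.

```python
import math

def fully_connected_edge_index(num_nodes):
    """All ordered (source, target) pairs without self-loops, as [sources, targets]."""
    pairs = [(s, d) for s in range(num_nodes) for d in range(num_nodes) if s != d]
    return [[s for s, _ in pairs], [d for _, d in pairs]]

def filter_edges_by_radius(edge_index, positions, radius):
    """Keep only edges whose endpoints are within `radius` of each other.

    Each remaining target node, together with its incoming neighbors,
    forms one local region, so there is one region per agent.
    """
    sources, targets = edge_index
    kept = [
        (s, d)
        for s, d in zip(sources, targets)
        if math.dist(positions[s], positions[d]) <= radius
    ]
    return [[s for s, _ in kept], [d for _, d in kept]]
```

For example, with agents at `(0, 0)`, `(1, 0)`, and `(10, 0)` and a radius of 2, only the two edges between the first two agents survive, and the third agent ends up with an empty neighborhood.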

cyrushx commented 2 years ago

Thank you!