ZikangZhou / HiVT

[CVPR 2022] HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf
Apache License 2.0

Result on Waymo dataset #13

Closed: ZYJ-JMF closed this issue 1 year ago

ZYJ-JMF commented 2 years ago

Thanks for your code. Have you tested your model's performance on the Waymo motion dataset? I got a really bad result: a minADE of more than 4 and an FDE of more than 10. I am looking forward to your reply. Thank you in advance.

ZikangZhou commented 2 years ago

I haven't run a full test on Waymo, but I recently applied the same idea to Argoverse 2 and achieved pretty strong results, so I don't think the model is overfitting to a single dataset. Your results look like random predictions to me. Perhaps you should first check whether you have converted the prediction results back to the original coordinate system. Generally speaking, you can get a minADE lower than 2.0 after training for one epoch.
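To illustrate the coordinate-system check: mapping agent-centric predictions back to a global frame usually amounts to rotating by the agent's heading and then translating by its origin. The following is a minimal sketch under that assumption. The function name, argument names, and tensor shapes are hypothetical and are not taken from the HiVT codebase.

```python
import torch


def local_to_global(pred_local: torch.Tensor,
                    origin: torch.Tensor,
                    heading: torch.Tensor) -> torch.Tensor:
    """Map agent-centric predictions back to the global frame.

    Assumed shapes (hypothetical, for illustration only):
      pred_local: [N, K, T, 2] trajectories in each agent's local frame.
      origin:     [N, 2] global position used as each agent's local origin.
      heading:    [N] global yaw in radians of each agent's local x-axis.
    """
    cos, sin = heading.cos(), heading.sin()
    # One counter-clockwise rotation matrix per agent: [N, 2, 2].
    rot = torch.stack([torch.stack([cos, -sin], dim=-1),
                       torch.stack([sin, cos], dim=-1)], dim=-2)
    # Rotate every predicted point, then shift by the agent's origin.
    pred_global = torch.einsum('nij,nktj->nkti', rot, pred_local)
    return pred_global + origin[:, None, None, :]
```

The direction of the rotation has to mirror whatever normalization was applied during preprocessing. If the data pipeline rotated by the negative heading, or normalized the whole scene first, the inverse must follow that convention instead.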

ZikangZhou commented 2 years ago

To get a strong result on Waymo or Argoverse 2, the model may need a larger receptive field, since the prediction horizon in these newer datasets is quite long. On Argoverse 2 (which requires predicting a 6-second future), I crop a local map with a radius of 150 m for each agent.
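As a rough illustration of cropping a local map, one simple approach is to keep only the map points within a given radius of the agent. This is a sketch under that assumption; `crop_local_map` and its arguments are hypothetical names, and the codebase's actual map handling may differ.

```python
import torch


def crop_local_map(lane_points: torch.Tensor,
                   agent_pos: torch.Tensor,
                   radius: float = 150.0) -> torch.Tensor:
    """Keep the map points that lie within `radius` meters of the agent.

    lane_points: [M, 2] map points in the same frame as agent_pos.
    agent_pos:   [2] position of the agent.
    """
    # Euclidean distance from the agent to every map point: [M].
    dist = torch.norm(lane_points - agent_pos, p=2, dim=-1)
    return lane_points[dist <= radius]
```

A larger radius keeps more map context for long horizons, at the cost of more map elements per agent.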

ZYJ-JMF commented 2 years ago

Thanks for your reply. But in the code, data.y is normalized with respect to each agent, and the prediction is in the same coordinate system as data.y, so the loss can be calculated directly. I therefore don't understand what "convert the prediction results back to the original coordinate system" means. Could you explain the idea in detail? Thanks.

ZikangZhou commented 2 years ago

I'm not sure how you evaluate the model. If you evaluate the model by submitting it to the online evaluation server, you need to pay attention to the coordinate system and make sure that the coordinate system you use is consistent with that adopted by the evaluation server. If you evaluate the model offline by using the metrics I implement in the codebase, it should be fine since I have converted the ground truth into agent-centric local coordinate systems. But one more thing you should note is that some datasets have missing values in the ground truth. You may get abnormal results if you don't mask out those missing values correctly when calculating the metric numbers. The metrics I implement in this codebase assume that the ground-truth trajectories are always complete since Argoverse 1 doesn't have this problem.

For example, the code snippet below is an implementation of ADE that takes missing values into account:

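from typing import Optional

import torch
from torchmetrics import Metric


class ADE(Metric):

    def __init__(self, **kwargs) -> None:
        super(ADE, self).__init__(**kwargs)
        # Running sum of per-sample ADE and the number of samples seen.
        self.add_state('sum', default=torch.tensor(0.0), dist_reduce_fx='sum')
        self.add_state('count', default=torch.tensor(0), dist_reduce_fx='sum')

    def update(self,
               pred: torch.Tensor,
               target: torch.Tensor,
               valid_mask: Optional[torch.Tensor] = None) -> None:
        # valid_mask matches target without its last dim; True where ground truth exists.
        if valid_mask is None:
            valid_mask = target.new_ones(target.size()[:-1], dtype=torch.bool)
        # Average the L2 error over valid steps only. A sample with no valid
        # step makes the division 0 / 0 and yields NaN.
        self.sum += ((torch.norm(pred - target, p=2, dim=-1) * valid_mask).sum(dim=-1) / valid_mask.sum(dim=-1)).sum()
        self.count += pred.size(0)

    def compute(self) -> torch.Tensor:
        return self.sum / self.count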
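The same masking idea can be carried over to multi-modal minADE and FDE. The sketch below is only an illustration, not the codebase's implementation: the function name and shapes are assumptions, the FDE is taken at each agent's last valid step, agents with no valid step are skipped, and the minimum over modes is taken independently for ADE and FDE (some benchmarks select the mode differently).

```python
import torch


def masked_min_ade_fde(pred: torch.Tensor,
                       target: torch.Tensor,
                       valid_mask: torch.Tensor):
    """pred: [N, K, T, 2], target: [N, T, 2], valid_mask: [N, T] (bool)."""
    n, k, t, _ = pred.shape
    mask = valid_mask.to(pred.dtype)                              # [N, T]
    dist = torch.norm(pred - target.unsqueeze(1), p=2, dim=-1)    # [N, K, T]
    num_valid = mask.sum(dim=-1)                                  # [N]
    has_valid = num_valid > 0

    # ADE per mode, averaged over valid steps only.
    ade = (dist * mask.unsqueeze(1)).sum(dim=-1) / num_valid.clamp(min=1).unsqueeze(1)

    # Index of the last valid step: valid step i scores i + 1, invalid steps score 0.
    steps = torch.arange(1, t + 1, device=pred.device, dtype=pred.dtype)
    last_idx = (mask * steps).argmax(dim=-1)                      # [N]
    fde = dist.gather(-1, last_idx.view(n, 1, 1).expand(n, k, 1)).squeeze(-1)  # [N, K]

    # Best mode per agent, averaged over agents that have ground truth.
    min_ade = ade.min(dim=-1).values[has_valid].mean()
    min_fde = fde.min(dim=-1).values[has_valid].mean()
    return min_ade, min_fde
```

If the numbers change noticeably between masked and unmasked evaluation, missing ground-truth steps are a likely contributor to the inflated errors.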

ZYJ-JMF commented 2 years ago

Thanks for your reply. I changed the ADE metric and also used a larger receptive field of 200 m, but the result barely changed: the minADE still converges at 4 and does not decrease. I also tried overfitting the network, but the minADE is still 4. Will you run experiments on the Waymo dataset? I am not sure where the problem is.