argoverse / av2-api

Argoverse 2: Next generation datasets for self-driving perception and forecasting.
https://argoverse.github.io/user-guide/
MIT License

Fail to submit results on tracking test set #195

Closed cc-caner closed 1 year ago

cc-caner commented 1 year ago

[screenshot of the submission error attached]

neeharperi commented 1 year ago

Hi @cc-caner, Thanks for raising this issue. We've downgraded our version of numpy on the evaluation server, hopefully this addresses this issue. Please let me know if you have any further issues.
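(For anyone who hits a similar numpy error when evaluating locally, pinning an older numpy release is the usual workaround. A minimal sketch; the exact version bound below is an assumption, not the version running on the server:)

pip install "numpy<1.24"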

cc-caner commented 1 year ago

Hi, thanks for raising this issue. We've downgraded our numpy version on the evaluation server; hopefully this resolves the problem. Please let me know if you have any further issues.

Thank you very much, this issue has been solved. I also have a question: why is there a big gap between local results and submitted results on the forecasting validation set? For the same .pkl file, the submitted result is almost six points higher than the local result on the mAP_F metric, and my av2-api is already the latest version.

neeharperi commented 1 year ago

We are also using the latest version of the av2-api on our evaluation server. Can you confirm that you are using map-based filtering in your local evaluation?
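For reference, map-based filtering can be approximated with the static map's raster layers. A minimal sketch, assuming predictions are box centers in the city frame; the helper name filter_to_drivable_area and the choice of the DRIVABLE_AREA layer (rather than the dilated ROI layer) are assumptions, not necessarily the server's exact logic:

from pathlib import Path

import numpy as np

from av2.map.map_api import ArgoverseStaticMap, RasterLayerType

def filter_to_drivable_area(centers_city: np.ndarray, log_map_dirpath: Path) -> np.ndarray:
    # Load the per-log static map along with its rasterized layers.
    avm = ArgoverseStaticMap.from_map_dir(log_map_dirpath, build_raster=True)
    # Boolean mask: True where a predicted center falls inside the drivable area.
    is_drivable = avm.get_raster_layer_points_boolean(centers_city, layer_name=RasterLayerType.DRIVABLE_AREA)
    return centers_city[is_drivable]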

cc-caner commented 1 year ago

We are also using the latest version of the av2-api on our evaluation server. Can you confirm that you are using map-based filtering in your local evaluation?

I did use filtering. Since LT3D/forecasting/test_forecaster.py does not contain the code to compute the final averaged mAP_F, I suspect there is something wrong with the way I calculate it. Could you share the code for computing the average metrics?

neeharperi commented 1 year ago

Here is how we compute average metrics:

import numpy as np

# Average each metric over all evaluation categories and trajectory classes,
# ignoring NaN entries for combinations where a metric is undefined.
full_metrics = evaluate_forecasts(prediction, ground_truth, top_k, max_range_m, dataset_dir)
mAP_F = np.nanmean([metrics["mAP_F"] for traj_metrics in full_metrics.values() for metrics in traj_metrics.values()])
ADE = np.nanmean([metrics["ADE"] for traj_metrics in full_metrics.values() for metrics in traj_metrics.values()])
FDE = np.nanmean([metrics["FDE"] for traj_metrics in full_metrics.values() for metrics in traj_metrics.values()])
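
For clarity, the nested comprehensions assume full_metrics maps each evaluation category to per-trajectory-class metric dicts, roughly as below; the category and trajectory-class names are illustrative:

# Assumed (illustrative) shape of full_metrics; np.nanmean skips NaN entries
# for combinations where a metric is undefined.
full_metrics = {
    "REGULAR_VEHICLE": {
        "static": {"mAP_F": 0.61, "ADE": 0.42, "FDE": 0.77},
        "linear": {"mAP_F": 0.55, "ADE": 0.51, "FDE": 0.98},
    },
    # ... one entry per evaluation category
}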

We will push an update to our baselines to perform map-based filtering and output these average metrics by default. Please let me know if you have any additional questions.