autonomousvision / transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
MIT License
1.17k stars · 192 forks

Which metric is the most important one #243

Closed · MCUBE-2023 closed this 2 months ago

MCUBE-2023 commented 2 months ago

Hi, I ran 2 experiments on the same route (short route 05):

* Experiment 1: As you can see in the first attached screenshot, the RouteCompletionTest failed with a value of 40.91%. The OutsideRouteLanesTest, CollisionTest, RunningRedLightTest, and RunningStopTest all succeeded with 0% each.

* Experiment 2: As you can see in the second attached screenshot, the RouteCompletionTest failed with a value of 47.27%. The RunningRedLightTest and RunningStopTest succeeded with 0% each. However, the OutsideRouteLanesTest failed with 3.11%, and the CollisionTest failed with a value of "1 times".

So my questions are the following:

1- In a general context, which of these metrics (RouteCompletionTest, OutsideRouteLanesTest, CollisionTest, RunningRedLightTest, RunningStopTest, InRouteTest, AgentBlockedTest) is the most important one? And if some metrics matter more than others, could you please give their order of priority?

2- In the specific context of comparing the results of the 2 experiments: the RouteCompletionTest value in experiment 2 is higher than in experiment 1, but experiment 2 failed the OutsideRouteLanesTest with 3.11% and the CollisionTest with "1 times". So the question is: which of these 2 experiments reflects better performance for the autonomous vehicle?

Attachments: image_1 (experiment 1, 40.91%) · image_2 (experiment 2, 47.27%)

Kait0 commented 2 months ago

The CARLA leaderboard has a metric called driving score, an aggregate metric that tells you which of these models is better. It is described in Section 4.5 (Metrics) of the paper.
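For your second question, here is a rough sketch of how the driving score combines these metrics. It uses the published CARLA leaderboard penalty coefficients and assumes, for illustration only, that the single collision in experiment 2 was with a vehicle (the screenshot does not say); it is not the official scoring code. Under those assumptions, experiment 1 actually scores higher despite its lower route completion:

```python
# Sketch of the leaderboard driving-score formula:
#   driving score = route completion x infraction penalty
# Penalty coefficients are the published CARLA leaderboard values.

PENALTIES = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}

def driving_score(route_completion_pct, outside_lanes_pct=0.0, infractions=None):
    """route_completion_pct in [0, 100]; infractions maps type -> count."""
    # Distance driven outside the route lanes does not count
    # towards route completion.
    completion = route_completion_pct * (1.0 - outside_lanes_pct / 100.0)
    penalty = 1.0
    for kind, count in (infractions or {}).items():
        penalty *= PENALTIES[kind] ** count
    return completion * penalty

# Experiment 1: 40.91% completion, no infractions -> 40.91
print(driving_score(40.91))
# Experiment 2: 47.27% completion, 3.11% outside lanes,
# 1 collision (assumed with a vehicle) -> ~27.48
print(driving_score(47.27, outside_lanes_pct=3.11,
                    infractions={"collision_vehicle": 1}))
```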

The driving score is not printed on screen but is logged in the results.json file that the leaderboard produces. We also recommend using the result parser to analyse the json file, as it fixes some bugs in the metric calculation.
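If you just want a quick look at the raw file, something like the sketch below works. It assumes the leaderboard 1.0 JSON layout, where per-route records live under `_checkpoint` and each record carries a `scores` dict; the field names may differ between leaderboard versions, so the result parser remains the recommended tool:

```python
import json

# Minimal sketch for pulling the driving score out of results.json.
# Assumes the leaderboard 1.0 layout; field names may vary by version.
with open("results.json") as f:
    results = json.load(f)

for record in results["_checkpoint"]["records"]:
    scores = record["scores"]
    print(
        record["route_id"],
        "driving score:", scores["score_composed"],  # completion x penalty
        "route completion:", scores["score_route"],
        "penalty:", scores["score_penalty"],
    )
```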