As described in #166, there is a discrepancy between how planning metrics (L2 and collision rate) are computed in this repo vs previous works, such as VAD and ST-P3.
In the paper, however, these values are compared in the same table, which causes confusion. This PR adds the option to compute the UniAD metrics using the legacy definition used by VAD/ST-P3, making comparison easier. The default is still the "original" UniAD way of evaluating. Evaluating all methods under the same definition shifts their ranking; see the table below.
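For readers unfamiliar with the discrepancy, here is a minimal sketch of the L2 case. It is not the code in this PR. It assumes the commonly described difference: the UniAD-style metric reports the error at the horizon timestep itself, while the VAD/ST-P3-style metric averages the errors over all timesteps up to and including the horizon. The function names and the example numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def l2_at_horizon(per_step_l2: Sequence[float], horizon_idx: int) -> float:
    """Assumed UniAD-style value: the L2 error at the horizon timestep only."""
    return per_step_l2[horizon_idx]


def l2_avg_up_to_horizon(per_step_l2: Sequence[float], horizon_idx: int) -> float:
    """Assumed VAD/ST-P3-style value: mean L2 error over timesteps 0..horizon."""
    window = per_step_l2[: horizon_idx + 1]
    return sum(window) / len(window)


# Hypothetical per-timestep L2 errors in meters, one value every 0.5 s up to 3 s.
per_step_l2 = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]

# Indices of the 1 s, 2 s and 3 s horizons at a 0.5 s step.
horizons = {"1s": 1, "2s": 3, "3s": 5}

for name, idx in horizons.items():
    at_t = l2_at_horizon(per_step_l2, idx)
    avg_t = l2_avg_up_to_horizon(per_step_l2, idx)
    print(f"{name}: at-horizon={at_t:.2f}  averaged={avg_t:.2f}")
```

Under this assumption, the averaged value is lower than the at-horizon value whenever the error grows with time, so numbers computed with the two definitions are not directly comparable.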