Randomness of Evaluation

OpenDriveLab / DriveAdapter

[ICCV 2023 Oral] A New Paradigm for End-to-end Autonomous Driving to Alleviate Causal Confusion

Apache License 2.0


Randomness of Evaluation #8

Open coderlemon17 opened 9 months ago

coderlemon17 commented 9 months ago

Hi, thanks for providing the code. However, when I evaluate the agent's performance on the town05-long-benchmark, there is some randomness in the evaluation results, even with a fixed seed.

After checking the visualization results, I believe some of the randomness comes from differing behavior of the NPC vehicles, but I'm not sure how this can happen with a fixed random seed. Am I doing something wrong, or is this normal? Any help would be appreciated!

jiaxiaosong1002 commented 9 months ago

@coderlemon17 Yes, exactly. This randomness has been widely discussed in the community. It cannot be controlled.

coderlemon17 commented 9 months ago

@jiaxiaosong1002 Thanks for your reply. If this randomness does exist, how many evaluation runs do you conduct to measure the model's performance under one seed? And does this randomness affect the model's final performance much?

jiaxiaosong1002 commented 8 months ago

@coderlemon17 Hi, 3 runs are generally used. Yes, the randomness can affect the results, which is why we need results on multiple benchmarks.
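
For readers who want to report results over repeated runs, here is a minimal sketch of aggregating one metric across 3 evaluations as mean and sample standard deviation. The function name `summarize_runs` and the score values are hypothetical placeholders, not part of the DriveAdapter code and not real results.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder values for illustration only; replace with the driving score
# obtained from each of your own evaluation runs.
driving_scores = [60.0, 65.0, 70.0]

mean, std = summarize_runs(driving_scores)
print(f"driving score: {mean:.1f} +/- {std:.1f} over {len(driving_scores)} runs")
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two models exceeds the run-to-run noise described above.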