Question about reproducing the results

OpenDriveLab / TCP

[NeurIPS 2022] Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.

Apache License 2.0

Question about reproducing the results #67

Closed zdy1013 closed 3 months ago

zdy1013 commented 3 months ago

Thank you for publishing such great work. Why are my results so different from yours? I trained for 60 epochs on 2 RTX 3090s using the dataset you provided, then evaluated the model. Here are the results I reproduced:

Avg. driving score: 52.005
Avg. route completion: 85.111
Avg. infraction penalty: 0.647
Collisions with pedestrians: 0.000
Collisions with vehicles: 0.270
Collisions with layout: 0.097
Red lights infractions: 0.070
Stop sign infractions: 0.238
Off-road infractions: 0.198
Route deviations: 0.000
Route timeouts: 0.094
Agent blocked: 0.298

Mr-ChenSH commented 3 months ago

Hello, could you send me a copy of the dataset?

Mr-ChenSH commented 3 months ago

The Google Drive link has reached its download limit.

penghao-wu commented 3 months ago

You can download it from https://huggingface.co/datasets/craigwu/tcp_carla_data
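For anyone who prefers to script the download, here is a minimal sketch using the `huggingface_hub` package. The repo ID comes from the link above; the function name `download_tcp_data` and the default target directory are hypothetical, and the package is assumed to be installed (`pip install huggingface_hub`).

```python
from pathlib import Path

# Dataset repo ID taken from the Hugging Face link above.
REPO_ID = "craigwu/tcp_carla_data"


def download_tcp_data(local_dir: str = "tcp_carla_data") -> Path:
    """Download the whole dataset repo into local_dir (hypothetical default path)."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # this is a dataset repo, not a model repo
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_tcp_data()` starts the transfer. The archive is large, so check free disk space first.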

penghao-wu commented 3 months ago

I suppose you are evaluating the model in 48 routes where our model has 57.01 driving score as reported in the paper. I think the difference is reasonable considering the variance in evaluation and training.
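A note for readers comparing these numbers: on the CARLA Leaderboard the driving score is computed per route as route completion times infraction penalty, and then averaged over routes. The average driving score therefore need not equal the product of the two other averages (85.111 × 0.647 ≈ 55.07 for the numbers above, versus the reported 52.005). The sketch below illustrates this with made-up per-route values; the function name `summarize` and the sample data are hypothetical.

```python
from statistics import mean


def summarize(routes):
    """routes: list of (route_completion_percent, infraction_penalty) pairs."""
    # Driving score is computed per route, then averaged over all routes.
    per_route_scores = [rc * penalty for rc, penalty in routes]
    return {
        "avg_driving_score": mean(per_route_scores),
        "avg_route_completion": mean(rc for rc, _ in routes),
        "avg_infraction_penalty": mean(p for _, p in routes),
    }


# Hypothetical per-route results, for illustration only.
routes = [(100.0, 1.0), (100.0, 0.5), (40.0, 0.6)]
summary = summarize(routes)

# Product of the averages, which is NOT how the driving score is defined.
product_of_averages = (
    summary["avg_route_completion"] * summary["avg_infraction_penalty"]
)
print(summary["avg_driving_score"], product_of_averages)
```

With these sample routes the average driving score is about 58 while the product of the averages is about 56, so a gap between the two is expected and is not by itself a sign of a broken evaluation.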

zdy1013 commented 3 months ago

Thank you for your reply. How can I reproduce your score of 75.14? (Screenshot attached.)

penghao-wu commented 3 months ago

We use all 420K data for training, and we use an ensemble of the TCP and TCP-SB models. Some details can be found in the paper.

zdy1013 commented 3 months ago

How do you ensemble the two models? Also, does the dataset you provided cover only 4 towns?

penghao-wu commented 3 months ago

The dataset file should contain all 8 towns' data. For the ensemble strategy, please refer to the last sentence in the image.

zdy1013 commented 3 months ago

Ok, thank you for your patient reply!

HXTYI commented 3 months ago

@penghao-wu Hello, I recently reproduced this project, but my test score is similar to the numbers above, between 50 and 60. What do you mean by 420K data? Is it the 115 GB tcp_carla_data? And to get the results in the figure, do I also need to apply the rule "in the trajectory-specialized case, we set α=0.5; in the control-specialized case, we take the maximum of the brake control instead of the average"? Hope you can reply!

penghao-wu commented 3 months ago

Yes, the 420K data is all 8 towns' data in the data zip. And yes, you also need that ensemble rule.
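For readers trying to follow the quoted rule, here is one possible reading of it as code. This is a sketch under stated assumptions, not the authors' implementation: the `Control` dataclass, the `fuse` function, and the plain averaging of steer and throttle in the control-specialized case are all hypothetical. Only α=0.5 and "maximum brake instead of average" come from the quoted sentence, and which two commands are being combined should be checked against the paper.

```python
from dataclasses import dataclass


@dataclass
class Control:
    steer: float
    throttle: float
    brake: float


def fuse(first: Control, second: Control, trajectory_specialized: bool,
         alpha: float = 0.5) -> Control:
    """Combine two candidate control commands (hypothetical helper)."""
    if trajectory_specialized:
        # Quoted rule: alpha = 0.5 in the trajectory-specialized case.
        # Assumption: alpha weights `first`, (1 - alpha) weights `second`.
        return Control(
            steer=alpha * first.steer + (1 - alpha) * second.steer,
            throttle=alpha * first.throttle + (1 - alpha) * second.throttle,
            brake=alpha * first.brake + (1 - alpha) * second.brake,
        )
    # Control-specialized case: the quoted rule takes the maximum brake
    # instead of the average. Averaging steer and throttle is an assumption.
    return Control(
        steer=0.5 * (first.steer + second.steer),
        throttle=0.5 * (first.throttle + second.throttle),
        brake=max(first.brake, second.brake),
    )


# Made-up commands, for illustration only.
a = Control(steer=0.2, throttle=0.6, brake=0.0)
b = Control(steer=0.4, throttle=0.2, brake=1.0)
traj_case = fuse(a, b, trajectory_specialized=True)
ctrl_case = fuse(a, b, trajectory_specialized=False)
```

Taking the maximum brake in the control-specialized case means that if either command asks for braking, the fused command brakes, which is the more conservative choice.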

Mr-ChenSH commented 3 months ago

@penghao-wu OK, I know it.