MCZhi / DIPP

[TNNLS] Differentiable Integrated Prediction and Planning Framework for Urban Autonomous Driving
https://mczhi.github.io/DIPP/

Open-loop test performance #10

Closed yougeyxt closed 1 year ago

yougeyxt commented 1 year ago

Hi authors, I am trying to train the model without the planner part (only prediction of the ego-vehicle and 10 surrounding vehicles), but the performance seems to be far from your results.

The settings are as follows: after data processing, there are 91,463 training samples and 20,227 testing samples. I trained for 40 epochs with the default training settings (e.g., learning rate, batch size), using the following command: `python train.py --name Exp --train_set ./data/processed/train --valid_set ./data/processed/test --seed 3407 --num_workers 8 --pretrain_epochs 40 --train_epochs 40 --batch_size 32 --learning_rate 2e-4 --device cuda`. Note that I fixed the map_process type issue for the ego-vehicle in both data_process.py and test_utils.py before training and testing the model.

Some open-loop test results are shown below; the plotted prediction points are at 0.5 s resolution. The prediction of the ego-vehicle seems poor, especially within the first 1 second of the prediction horizon.

[Attached: visualization images of ten open-loop test scenarios]

Quantitatively, I used one file, uncompressed_scenario_training_20s_training_20s.tfrecord-00088-of-01000, as the test set to calculate the open-loop evaluation metrics, which yields mean Human_L2_1s = 0.91 m, mean Human_L2_3s = 0.82 m, mean Human_L2_5s = 2.37 m, prediction ADE = 0.71 m, and prediction FDE = 1.77 m. I also tried using four training data files (00000 to 00003) for the open-loop test, and the results are similar: mean Human_L2_1s = 1.48 m, mean Human_L2_3s = 0.96 m, mean Human_L2_5s = 2.43 m, prediction ADE = 0.66 m, prediction FDE = 1.66 m. The ego-vehicle planning error (here I use the initial prediction as the plan) seems far worse than the results in Table 1 of the paper (e.g., Human_L2_1s around 0.15 to 0.2 m). Could you please help with that?
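
For reference, this is roughly how I compute these metrics (a minimal sketch with hypothetical helper names, assuming 10 Hz trajectories over a 5 s horizon; it is not the repo's exact evaluation code):

```python
import numpy as np

def human_l2(plan_xy, ego_gt_xy, horizon_s, dt=0.1):
    """L2 distance between the planned and ground-truth ego position at a given horizon."""
    idx = int(horizon_s / dt) - 1
    return np.linalg.norm(plan_xy[idx] - ego_gt_xy[idx])

def prediction_ade_fde(pred_xy, gt_xy, valid):
    """ADE/FDE over surrounding agents, ignoring padded (invalid) timesteps."""
    err = np.linalg.norm(pred_xy - gt_xy, axis=-1)   # [agents, timesteps]
    ade = err[valid].mean()
    fde = err[:, -1][valid[:, -1]].mean()
    return ade, fde
```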

The training log is attached below in case it helps: train_log.csv

Thank you!

MCZhi commented 1 year ago

The training process looks good. What was the command you used for open-loop testing? Did you disable the planner during the open-loop test?

yougeyxt commented 1 year ago

Yes, I think I disabled the planner during the open-loop test. The command I used is `python open_loop_test.py --name open_loop_test --test_set ./data/raw/train-0-to3 --model_path ./training_log/Exp2/model_34_0.8928.pth --device cuda`.

The testing results log and CSV files are attached below in case they help:

I can attach the trained model checkpoint here or send it to you by email if needed. Thank you for your help!

yougeyxt commented 1 year ago

Hi, I found some potential bugs that may contribute to the bad performance; I summarize them below. However, I still cannot solve the problem.

  1. A bicycle model inconsistency between test_utils.py and train_utils.py: in line 96 of test_utils.py, delta is not used to approximate tan(delta) as is done in train_utils.py (see the sketch after this list).
  2. In the training process, the select_future function does not actually use the scores to select the future plan and prediction. Instead, it uses best_mode, which is determined from the ground-truth trajectories of the ego-vehicle and the surrounding agents. This may make the ADE and FDE in the training log look better than they should, since the selection is based on the ground truth rather than the network's output scores. I am not sure whether this affects the training of the model, though. What is your opinion on that?
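
To make the two points above concrete, here are minimal sketches; the function, tensor, and parameter names (and the wheelbase value) are hypothetical, not the repo's exact code.

The first sketch shows a kinematic bicycle model step with and without the small-angle approximation tan(delta) ≈ delta:

```python
import torch

def bicycle_step(x, y, theta, v, a, delta, dt=0.1, wheelbase=3.0,
                 approximate_tan=True):
    # Point 1: with approximate_tan=True the heading update uses delta directly,
    # otherwise it uses tan(delta); the two diverge for larger steering angles.
    steer = delta if approximate_tan else torch.tan(delta)
    x_next = x + v * torch.cos(theta) * dt
    y_next = y + v * torch.sin(theta) * dt
    theta_next = theta + v * steer / wheelbase * dt
    v_next = v + a * dt
    return x_next, y_next, theta_next, v_next
```

The second sketch shows what I mean by selecting the output mode from the predicted scores rather than from closeness to the ground truth:

```python
def select_by_score(trajs, scores):
    # Point 2: trajs is [batch, modes, timesteps, 2], scores is [batch, modes];
    # pick the highest-scoring mode instead of the ground-truth-closest one.
    best = scores.argmax(dim=-1)                                        # [batch]
    idx = best[:, None, None, None].expand(-1, 1, trajs.shape[2], trajs.shape[3])
    return trajs.gather(1, idx).squeeze(1)                              # [batch, timesteps, 2]
```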

Thank you.

MCZhi commented 1 year ago

Hi, the potential issues you listed would not significantly affect the testing performance. After comparing your training log with mine, I found that your planner ADE is much worse (mine is around 0.7 meters in validation), so I suspect the problem lies in the training process. I am not sure of the exact reason, but I suggest increasing the weight of the imitation learning term in the loss function, adding an FDE loss to the imitation term, or trying another random seed. Alternatively, you could try the physical (unicycle) model.
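
For example, something along these lines (just a sketch; the weights and names are illustrative, not necessarily what ends up in the repo):

```python
import torch.nn.functional as F

def imitation_loss(plan, ego_gt, ade_weight=1.0, fde_weight=0.5):
    # plan, ego_gt: [batch, timesteps, 2] planned vs. ground-truth ego positions.
    ade_term = F.smooth_l1_loss(plan, ego_gt)                 # averaged over the horizon
    fde_term = F.smooth_l1_loss(plan[:, -1], ego_gt[:, -1])   # extra penalty on the final waypoint
    return ade_weight * ade_term + fde_weight * fde_term
```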

yougeyxt commented 1 year ago

Hi, thanks for the reply. Is the planner ADE (0.7 meters in validation) you mentioned measured at 20 epochs with the trained planner? May I also know your planner ADE at the end of the pretraining epochs, without the planner?

MCZhi commented 1 year ago

That's the result after 20 epochs without the trained planner. The ADE should be below 0.7 meters if the model is trained properly. I have also added the FDE loss to the imitation term; hope it helps.

yougeyxt commented 1 year ago

Thanks for the information. I will try them later.