Closed yougeyxt closed 1 year ago
The training process looks good. What was the command you used for open-loop testing? Did you disable the planner during the open-loop test?
Yes, I think I disabled the planner during the open-loop test, and the command I used is python open_loop_test.py --name open_loop_test --test_set ./data/raw/train-0-to3 --model_path ./training_log/Exp2/model_34_0.8928.pth --device cuda
The testing log and CSV files are attached below in case they help:
Using 088 file as the test set: test-088.log testing_log-088.csv
Using 0 to 3 files as the test set: testing_log-0-to-3.csv test-0-to-3.log
I can attach the trained model checkpoint here or send it by email to you if that is needed. Thank you for helping with that!
Hi, I found some potential bugs that possibly contribute to the bad performance and I summarized them below. However, I still cannot solve the problem.
I compared the test_utils.py and train_utils.py files. On line 96 of test_utils.py, delta is not used to approximate tan(delta), as is done in train_utils.py.
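To illustrate the discrepancy, here is a minimal kinematic-bicycle rollout step showing the two variants; this is only a sketch of the idea, not the repo's actual code, and the wheelbase and timestep values are placeholders:

```python
import numpy as np

def bicycle_step(x, y, theta, v, delta, wheelbase=3.0, dt=0.1, approximate=True):
    """One kinematic-bicycle step.

    With approximate=True the small-angle approximation tan(delta) ~= delta
    is used (as reportedly done in train_utils.py); with approximate=False
    the exact tan(delta) is used. For small steering angles the two rollouts
    are close, but they are not identical, so mixing the two conventions
    between training and testing can introduce a mismatch.
    """
    steer = delta if approximate else np.tan(delta)
    x_next = x + v * np.cos(theta) * dt
    y_next = y + v * np.sin(theta) * dt
    theta_next = theta + v * steer / wheelbase * dt
    return x_next, y_next, theta_next
```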
Also, the select_future function does not actually use the scores to select the future plan and prediction; it uses best_mode, which is determined from the ground-truth trajectories of the ego vehicle and the surrounding agents. This may make the ADE and FDE in the training log look better than they should be, since the selection relies on the ground truth rather than on the network's output scores. But I am not sure whether this influences the training of the model. What is your opinion on that? Thank you.
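For comparison, a purely score-based selection would look roughly like the sketch below; the function name and tensor shapes are my assumptions, not the repo's actual signature:

```python
import torch

def select_by_score(plans, predictions, scores):
    """Pick one mode per sample using the network's predicted scores.

    plans:       [B, M, T, 2]    candidate ego plans (M modes)
    predictions: [B, M, N, T, 2] agent predictions per mode
    scores:      [B, M]          network confidence per mode
    Uses argmax over the scores, not the ground truth, so test-time
    metrics reflect what the model would actually choose.
    """
    best_mode = torch.argmax(scores, dim=-1)   # [B], from NN scores only
    batch = torch.arange(plans.shape[0])
    return plans[batch, best_mode], predictions[batch, best_mode]
```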
Hi, the potential reasons you listed would not significantly influence the testing performance. After comparing your training log with mine, I found that your planner ADE is much worse (mine is around 0.7 meters in validation), so I guess the problem is in the training process. I am not sure what the exact reason is, but I suggest increasing the weight of imitation learning in the loss function, adding an FDE loss to the imitation term, or using another random seed. Or maybe you can try the physical (unicycle) model.
Hi, thanks for the reply. Is the planner ADE you mentioned (0.7 meters in validation) from 20 epochs with the trained planner? May I know your planner ADE at the end of the pretraining epochs, without the planner?
That's the result after 20 epochs without a trained planner. The ADE should be below 0.7 meters if the model is trained properly. I have added the FDE loss to the imitation term. Hope it helps.
Thanks for the information. I will try them later.
Hi authors, I am trying to train the model without the planner part (only prediction of the ego-vehicle and 10 surrounding vehicles), but the performance seems to be far from your results.
The settings are as follows: after data processing, there are 91,463 training data points and 20,227 testing data points. I trained for 40 epochs with the default training settings (e.g., learning rate, batch size) using the following command:
python train.py --name Exp --train_set ./data/processed/train --valid_set ./data/processed/test --seed 3407 --num_workers 8 --pretrain_epochs 40 --train_epochs 40 --batch_size 32 --learning_rate 2e-4 --device cuda
Note that I fixed the map_process type issue of the ego-vehicle in both data_process.py and test_utils.py before training and testing the model. Some open-loop test results are shown below; the plotted prediction points are at 0.5 s resolution. The prediction of the ego-vehicle seems poor, especially over the first 1-second horizon.
Quantitatively, I used one file, uncompressed_scenario_training_20s_training_20s.tfrecord-00088-of-01000, as the test set to compute the open-loop evaluation metrics, which yields mean Human_L2_1s=0.91 m, mean Human_L2_3s=0.82 m, mean Human_L2_5s=2.37 m, predictionADE=0.71 m, predictionFDE=1.77 m. I also tried using four training data files (00000 to 00003) for the open-loop test, and the results are similar: mean Human_L2_1s=1.48 m, mean Human_L2_3s=0.96 m, mean Human_L2_5s=2.43 m, predictionADE=0.66 m, predictionFDE=1.66 m. The ego-vehicle planning error (here I use the initial prediction) seems far worse than the results in Table 1 of the paper (e.g., Human_L2_1s around 0.15 to 0.2 m). Could you please help with that? The training log is attached below in case it helps: train_log.csv
Thank you!
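For reference, this is roughly how I compute the horizon metrics above; it is my own helper, not the repo's code, and it assumes a 0.1 s timestep and the point-at-horizon convention for Human_L2 (the repo may use a different convention, e.g. a mean over the horizon):

```python
import numpy as np

def horizon_metrics(plan, gt, dt=0.1, horizons=(1.0, 3.0, 5.0)):
    """Compute per-horizon L2 errors plus ADE/FDE for one trajectory.

    plan, gt: [T, 2] arrays of (x, y) waypoints at dt resolution.
    Human_L2_{h}s is the displacement at the waypoint closest to h seconds;
    ADE averages the displacement over all steps; FDE is the final step.
    """
    errors = np.linalg.norm(plan - gt, axis=-1)  # [T] per-step L2 error
    out = {f"Human_L2_{int(h)}s": errors[int(h / dt) - 1] for h in horizons}
    out["ADE"] = errors.mean()
    out["FDE"] = errors[-1]
    return out
```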