dt_annos and gt_annos params in the `calculate_iou_partly` function are swapped.

sauravshanu commented 3 years ago

Hi Owen,

Great work! Thanks for uploading the code and providing very clear instructions to run it.

I have two questions:

First, I noticed that at the above line the dt_annos and gt_annos params passed to the calculate_iou_partly function are swapped. I am not sure whether it matters, since IoU is symmetric in its two arguments.
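
For illustration, here is a minimal sketch of what swapping the two arguments does to a pairwise IoU matrix. It uses plain axis-aligned 2D boxes rather than the repo's actual calculate_iou_partly, and `box_iou` and `pairwise_iou` are hypothetical helpers written only for this example. Each IoU value is unchanged, but the matrix comes out transposed, so a swap like this would only matter if later code assumes a particular row/column order.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def pairwise_iou(rows, cols):
    """Matrix whose entry [i][j] is the IoU of rows[i] and cols[j]."""
    return [[box_iou(r, c) for c in cols] for r in rows]


# Toy boxes, not taken from KITTI.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dt = [(5, 0, 15, 10), (20, 20, 30, 30), (100, 100, 110, 110)]

m_gt_dt = pairwise_iou(gt, dt)  # shape: len(gt) x len(dt)
m_dt_gt = pairwise_iou(dt, gt)  # shape: len(dt) x len(gt)

# Swapping the arguments yields the transpose of the same values.
transposed = [list(col) for col in zip(*m_gt_dt)]
print(transposed == m_dt_gt)
```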

Second, I ran training for Mono3D with the example config provided in the repo. I am trying to reproduce the results, but my results on the validation set consistently do not match the expected ones.

Here are my results.

Car AP(Average Precision)@0.70, 0.70, 0.70:          
bbox AP:83.29, 70.23, 54.05                          
bev  AP:10.64, 8.42, 6.52  
3d   AP:6.76, 5.10, 3.73   
aos  AP:82.75, 69.28, 53.32                          
Car AP(Average Precision)@0.70, 0.50, 0.50:          
bbox AP:83.29, 70.23, 54.05                          
bev  AP:43.14, 32.86, 25.95                          
3d   AP:37.96, 27.91, 22.64                          
aos  AP:82.75, 69.28, 53.32                          

/**** finish testing after training epoch 19 ******/ 

Car AP(Average Precision)@0.70, 0.70, 0.70:          
bbox AP:82.41, 67.77, 51.65                          
bev  AP:9.56, 6.98, 5.34                             
3d   AP:5.48, 3.93, 3.19                             
aos  AP:81.98, 66.96, 51.00
Car AP(Average Precision)@0.70, 0.50, 0.50:          
bbox AP:82.41, 67.77, 51.65
bev  AP:42.07, 32.54, 25.82                          
3d   AP:36.23, 27.81, 21.56
aos  AP:81.98, 66.96, 51.00                          

/**** finish testing after training epoch 29 ******/

Here are my training steps:

1. Clone the repo and run make.sh.
2. Download image_2, image_3, calib and label_2 from the official KITTI website and unzip them.
3. Follow the exact steps from the Mono3D readme. When copying the config file, I only changed the paths to point to my directories and didn't alter any of the existing hyperparameters.
4. Train on a single 2080 Ti GPU. 30 epochs take about 6-7 hours.

Can you please tell me what I am doing wrong here?

Thanks!

Owen-Liuyuxuan commented 3 years ago

I reran training from a freshly cloned repo, and the results are fine.

It takes about 6 hours on my 1080 Ti with an SSD.

Could you provide the TensorBoard log, the loss curves, the config and anything else relevant?

tensorboard --logdir workdirs/Mono3D/log

sauravshanu commented 3 years ago

Thanks for the prompt reply. Here are the loss screenshot and the TensorBoard log file:

[Screenshot: loss curves]

Tensorboard log

Mono3D_tensorboad.log

Owen-Liuyuxuan commented 3 years ago

What I can identify right away is that both of my losses go down much faster than yours. [Screenshot from 2021-07-22 14-47-37]

According to the TensorBoard log, the recorded model structure is the same, and you have only changed the paths and slightly increased the number of epochs (which is totally fine).

events.out.tfevents.1626880244.yxliu-ramlab.10362.0.log

I cannot diagnose the problem from this.

Here are two things you can try:

1. Evaluate the pretrained model from the release on the chen split. Even though the code has changed slightly since the release, the results should still be rather high (the release model is trained on all the training data, so it has already seen the chen split). This lets you verify whether the evaluation/forward propagation is correct.
2. Modify the config file to train and evaluate on the debug split, which is a small subset of the data that we should be able to overfit.

By the way, calculate_iou_partly is basically borrowed from other repos and I did not modify its details. It may be an inherited, harmless "bug".

sauravshanu commented 3 years ago

OK. Here are some more evaluations.

Pretrained model provided in the release evaluated on the Validation set.

Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:99.96, 99.85, 74.87
bev  AP:76.54, 58.14, 44.80
3d   AP:73.60, 55.31, 42.30
aos  AP:98.95, 98.83, 74.11
Car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:99.96, 99.85, 74.87
bev  AP:93.69, 81.27, 61.95
3d   AP:93.50, 80.96, 61.66
aos  AP:98.95, 98.83, 74.11

Model trained and evaluated on the debug split.

Car AP(Average Precision)@0.70, 0.70, 0.70:         
bbox AP:81.93, 86.51, 71.75                         
bev  AP:12.12, 13.56, 9.99                          
3d   AP:12.03, 12.28, 9.70                          
aos  AP:81.80, 86.25, 71.53                         
Car AP(Average Precision)@0.70, 0.50, 0.50:         
bbox AP:81.93, 86.51, 71.75                         
bev  AP:56.19, 45.85, 35.08                         
3d   AP:51.65, 41.93, 32.83                         
aos  AP:81.80, 86.25, 71.53                         

/**** finish testing after training epoch 19 ******/

Car AP(Average Precision)@0.70, 0.70, 0.70:         
bbox AP:76.49, 81.89, 65.09                         
bev  AP:25.68, 21.57, 16.83                         
3d   AP:20.04, 16.97, 13.27                         
aos  AP:75.48, 81.02, 64.34                         
Car AP(Average Precision)@0.70, 0.50, 0.50:         
bbox AP:76.49, 81.89, 65.09                         
bev  AP:87.21, 76.10, 61.78                         
3d   AP:85.78, 75.06, 58.74                         
aos  AP:75.48, 81.02, 64.34                         

/**** finish testing after training epoch 29 ******/

Owen-Liuyuxuan commented 3 years ago

When using the precompute results downloaded from the release, my result is:

Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:99.96, 99.85, 74.87
bev  AP:96.09, 87.08, 65.24
3d   AP:95.34, 86.26, 64.62
aos  AP:99.14, 98.52, 73.89
Car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:99.96, 99.85, 74.87
bev  AP:99.91, 96.82, 71.97
3d   AP:99.90, 96.73, 71.90

When using the original precompute result on chen's split, the result is exactly the same as yours.

Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:99.96, 99.85, 74.87
bev  AP:76.54, 58.14, 44.80
3d   AP:73.60, 55.31, 42.30
aos  AP:98.95, 98.83, 74.11
Car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:99.96, 99.85, 74.87
bev  AP:93.69, 81.27, 61.95
3d   AP:93.50, 80.96, 61.66
aos  AP:98.95, 98.83, 74.11

The debug split result is rather bad. On my side it reaches 50-60 mAP. So it seems clear that it is the training process that goes wrong.
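
To compare runs like the ones in this thread side by side, a small parser for the printed evaluation text may help. This is only a sketch that assumes the output format shown above, with the three numbers per row read as easy, moderate and hard. `parse_kitti_ap` is a hypothetical helper, not part of visualDet3D. The sample text is the 0.70 block from the two full-training evaluations posted earlier.

```python
import re

# Patterns for the header and metric rows as they appear in this thread.
HEADER = re.compile(
    r"^(\w+) AP\(Average Precision\)@([\d.]+), ([\d.]+), ([\d.]+):"
)
ROW = re.compile(r"^(bbox|bev|3d|aos)\s+AP:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_kitti_ap(text):
    """Map (class, thresholds) -> {metric: (easy, moderate, hard)}."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            thresholds = tuple(float(x) for x in header.groups()[1:])
            current = (header.group(1), thresholds)
            results[current] = {}
            continue
        row = ROW.match(line)
        if row and current is not None:
            values = tuple(float(x) for x in row.groups()[1:])
            results[current][row.group(1)] = values
    return results


epoch_19 = """
Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:83.29, 70.23, 54.05
bev  AP:10.64, 8.42, 6.52
3d   AP:6.76, 5.10, 3.73
aos  AP:82.75, 69.28, 53.32
"""

epoch_29 = """
Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:82.41, 67.77, 51.65
bev  AP:9.56, 6.98, 5.34
3d   AP:5.48, 3.93, 3.19
aos  AP:81.98, 66.96, 51.00
"""

run_a = parse_kitti_ap(epoch_19)
run_b = parse_kitti_ap(epoch_29)
key = ("Car", (0.7, 0.7, 0.7))

# Per-metric change from epoch 19 to epoch 29 (easy, moderate, hard).
deltas = {}
for metric in ("bbox", "bev", "3d", "aos"):
    before, after = run_a[key][metric], run_b[key][metric]
    deltas[metric] = tuple(round(y - x, 2) for x, y in zip(before, after))
    print(metric, deltas[metric])
```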

sauravshanu commented 3 years ago

OK. I'll try to fix it. This helps a lot. Thank you :)

Owen-Liuyuxuan / visualDet3D