Cc-Hy opened this issue 2 years ago
@Cc-Hy @Xianpeng919 Hi, I trained the model on the training set and tested on the validation set. The moderate-class 3D AP is 17.57, but the paper says it is 19.03.
@rockywind Did you use the provided config to train your model?
@Xianpeng919 Yes, I use the default config. https://github.com/Xianpeng919/MonoCon/blob/main/monocon/configs/monocon/monocon_dla34_200e_kitti.py
@Xianpeng919 I trained the model a second time. The result is below.
3D APR40: 23.7064, 17.7595, 14.9525
@rockywind I'll double check and get back to you asap.
@Xianpeng919 Hello, how should I modify the cfg file if I want to train on the trainval set and get test-set results?
@Xianpeng919 I loaded the pretrained model and trained from it. The result is below.
3D APR40: 24.2891, 18.0508, 15.2171
Hello, I trained the model with the command CUDA_VISIBLE_DEVICES=0 python ./tools/train.py configs/monocon/monocon_dla34_200e_kitti.py
without any modification, but the performance is much lower. The performance peaks at epoch 120 and then drops steadily until it reaches 0.
The result at 120 epoch:
Car AP@0.70, 0.70, 0.70:
3d AP:16.5400, 12.2644, 10.5623
The result at 200 epoch:
Car AP@0.70, 0.70, 0.70:
3d AP:0.0000, 0.0000, 0.0000
The training log can be seen here. What should I do to get a normal result? @Xianpeng919
@rockywind We have tested our released checkpoints on multiple GPUs. The result is 26.33 | 19.03 | 16.00, the same as in the readme. Not sure what the problem is here. Could you provide me with your log so that I can help you check the details?
@Cc-Hy You may replace the training split with the trainval split in the config
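For reference, the split swap is usually just a matter of pointing the train dataset at the trainval annotations in the config; the snippet below is a hypothetical sketch (the exact keys and info-file names depend on how your KITTI infos were generated, so match them against monocon_dla34_200e_kitti.py):

```python
# Hypothetical excerpt of an mmdetection-style config; file names are assumptions
# based on the standard KITTI info-generation scripts.
data = dict(
    train=dict(
        ann_file='data/kitti/kitti_infos_trainval.pkl',  # was kitti_infos_train.pkl
        split='training',                                # trainval images live in training/
    ),
    test=dict(
        ann_file='data/kitti/kitti_infos_test.pkl',      # official test split for submission
        split='testing',
    ),
)
```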
@kaixinbear Your dimension branch exploded during training. We did observe this during our experiments. The dimension-aware loss is a little bit unstable. You can restart your training from the un-exploded ckpts.
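(For reference, resuming from the last good checkpoint in an mmdetection-style codebase is a one-line change; the path below is hypothetical:)

```python
# Set in the config, or pass via --resume-from if the train script supports it.
# Pick the last checkpoint saved before the dimension loss exploded.
resume_from = 'work_dirs/monocon_dla34_200e_kitti/epoch_110.pth'  # hypothetical path
```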
Thanks for your kind reply! I will try it later.
Hello author, I resumed my training from the un-exploded ckpts, but it still explodes in the following epochs. Have you met this phenomenon? Should I turn down my lr? Thanks!
@Xianpeng919 I tested the released checkpoint. The result is the same as the readme. When I retrained the model, the result was lower than the readme. 20220302_134704.log
Hi, have you tried multi-GPU training, or are you still using single-GPU training? I retrained with 4 GPUs and got lower results than the readme. https://paste.ubuntu.com/p/CtJH9Hk52F/
@ganyz You can restart the training from scratch.
@rockywind I double-checked your log and the config looks good to me. I'll double-check the code. You can also try training again with another random seed to see the performance.
@Xianpeng919 OK, thanks a lot!
@rockywind @ganyz @kaixinbear @Xianpeng919 I find that during training there are several epochs whose performance is extremely low (close to 0), and the performance can differ from the previous epoch by more than 10 points. Did you meet this situation?
[Screenshots attached: Epoch 112, Epoch 115]
Tried another time, and the best performance is as follows: [screenshot attached]
I conducted 3 experiments with different seeds, and the best performance is 17.80. Besides, results are not reproducible with the same seed and deterministic==True in this codebase.
I retrained twice and got 16.20 on a GTX 1080 Ti and 16.80 on a Titan V. It seems that no one in this issue can retrain above 18.00, which makes me frustrated.... =_=!
It's normal. Mono3D performance is always unstable. Just pay attention to the eval results of the last few checkpoints. 0.0
@excitohe I know that Mono3D performance is always unstable. But results are reproducible with the same seed and deterministic==True in the MonoDLE codebase. I'm just wondering why nondeterministic behavior appears in the mmdet reimplementation.
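For context, fixing the seed alone does not guarantee determinism in PyTorch. A rough sketch of what an mmdet-style set_random_seed(seed, deterministic=True) does is below (paraphrased, not the exact library code); even with these flags, some CUDA kernels stay nondeterministic, which may explain the gap between the two codebases.

```python
import random

import numpy as np
import torch


def set_random_seed(seed: int, deterministic: bool = False) -> None:
    """Seed Python, NumPy and PyTorch; optionally force deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        # Trades speed for reproducibility; note that some CUDA ops
        # (e.g. certain atomics-based kernels) remain nondeterministic anyway.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```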
@djp1235a Unified reply from OpenMMLab
@excitohe @djp1235a @Cc-Hy I'm re-training the model based on the released code using different GPUs. I'll share with you the log in this thread once the result is out.
@Cc-Hy You can refer to mmdet3d's visualization scripts. Their scripts are very helpful.
@Xianpeng919 Hello, I tried to add the "--show" arg in test.py, and I also tried to use mono_det_demo.py directly, but neither works properly. Can you tell me which script you use? And do I need to make any modifications?
@Cc-Hy You can run inference with your model first and then revise the show_results function in mmdet3d.core.visualizer.
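In case it helps others reading this thread, the inference-then-visualize flow with a recent mmdet3d API looks roughly like the sketch below; the function names follow upstream mmdet3d and may differ in this fork, and all paths are placeholders.

```python
from mmdet3d.apis import inference_mono_3d_detector, init_model, show_result_meshlab

# Placeholder paths: point these at your own config, checkpoint, image and
# the mono3d coco-style annotation file for that image.
config_file = 'configs/monocon/monocon_dla34_200e_kitti.py'
checkpoint_file = 'work_dirs/monocon_dla34_200e_kitti/latest.pth'
image_file = 'demo/kitti_000008.png'
ann_file = 'demo/kitti_000008_mono3d.coco.json'

model = init_model(config_file, checkpoint_file, device='cuda:0')
result, data = inference_mono_3d_detector(model, image_file, ann_file)
# This is the step where a customized show_results / visualizer hook would go.
show_result_meshlab(data, result, out_dir='demo_out', score_thr=0.15, task='mono-det')
```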
@Xianpeng919 Have you finished your retraining yet? Looking forward to your training log file. ^_^
Tried again:
Hi, I migrated MonoCon into the latest mmdet3d in the plugin_dir manner and tried again with only_car, using your latest updated config on 4 GPUs.
Car AP40@0.70, 0.70, 0.70:
bbox AP40:96.3800, 90.3432, 80.7128
bev AP40:29.0449, 22.2251, 19.4256
3d AP40:21.4625, 16.1725, 14.3990
aos AP40:95.73, 89.51, 79.49
Attach the training log: https://paste.ubuntu.com/p/HyryFkZspc/
Can you see where the problem is? Thank you so much, and keep in touch. ^_^
I will reconfigure your original environment and test again with a single GPU...
@Cc-Hy Hi, is this your recent result with the only_car config? It looks like we're about the same...
No, these are 3-class results. I'm training with Car only now.
Car only
@Cc-Hy @Xianpeng919 @ganyz
Could you please tell me how to solve this model-collapse problem? By turning down the lr or changing the random seed? I have tried many times, but the AP drops to 0 at around epoch 120.
If you keep hitting this problem, you can replace the dimension loss with an L1 loss only, L = |gt - pred|, and then the dimension loss will never explode. @kaixinbear
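A minimal sketch of that plain-L1 fallback (the function and tensor names here are made up, not the ones in the repo):

```python
import torch


def dimension_l1_loss(pred_dims: torch.Tensor, gt_dims: torch.Tensor) -> torch.Tensor:
    """Plain L1 on the 3D dimension targets, i.e. L = |gt - pred|.

    No extra weighting or learned terms, so unlike the dimension-aware loss
    it cannot blow up during training.
    """
    return torch.abs(gt_dims - pred_dims).mean()
```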
@Xianpeng919 I want to use the model with mono_det_demo.py, but it asks for an annotation file. Where can I find it? To be clear, I've already trained the model.
Hi, I got AP 19.0217 by setting cfg.SEED = 1903919922.
Hello, I tried to train the model, but after 120 epochs the performance is a lot worse than yours. The only modification is that I used a larger learning rate, 0.001, compared to your original 0.000225. So first I want to ask why the learning rate you chose is so small (I usually see learning rates around 0.003 to 0.001 for such networks): do you use pre-training, so that this is effectively fine-tuning? I'd also like to ask for your thoughts on the results I got, since I don't think the learning rate alone would cause such a large gap. I will retrain with your original learning rate later. Thanks a lot.
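(Side note for anyone comparing runs: the learning rate lives in the optimizer block of the config; the snippet below only shows where to set it back to the original value, and the optimizer type is an assumption, so check the released config.)

```python
# Hypothetical excerpt; verify the optimizer type and schedule against
# monocon_dla34_200e_kitti.py before copying.
optimizer = dict(type='AdamW', lr=0.000225)  # original value; 0.001 is ~4.4x larger
```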