Xianpeng919 / MonoCon

Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection (AAAI'22)

About #3

Open Cc-Hy opened 2 years ago

Cc-Hy commented 2 years ago

Hello, I tried to train the model, but after 120 epochs the performance is a lot worse than yours. The only modification is that I used a larger learning rate, 0.001, compared to your original 0.000225. So first I want to ask why the learning rate you chose is so small (the learning rates I usually see for such networks are around 0.003 to 0.001). Do you use pre-training, so that this is effectively fine-tuning? I would also like your thoughts on the results I got; I don't think the learning rate alone should cause such a large gap. I will retrain with your original learning rate later. Thanks a lot.

rockywind commented 2 years ago

@Cc-Hy @Xianpeng919 Hi, I trained the model on the training set and tested on the validation set. The moderate-class 3D AP is 17.57, but the paper says it is 19.03. [screenshot]

Xianpeng919 commented 2 years ago

@rockywind Did you use the provided config to train your model?

rockywind commented 2 years ago

@Xianpeng919 Yes, I used the default config. https://github.com/Xianpeng919/MonoCon/blob/main/monocon/configs/monocon/monocon_dla34_200e_kitti.py

rockywind commented 2 years ago

@Xianpeng919 I trained the model a second time. The result is below.

3D AP R40: 23.7064, 17.7595, 14.9525
Xianpeng919 commented 2 years ago

@rockywind I'll double check and get back to you asap.

Cc-Hy commented 2 years ago

@Xianpeng919 Hello, how should I modify the cfg file if I want to train on the trainval set and get test-set results?

rockywind commented 2 years ago

@Xianpeng919 I loaded the pretrained model and then trained. The result is below.

3D AP R40: 24.2891, 18.0508, 15.2171
kaixinbear commented 2 years ago

Hello, I trained the model with the command CUDA_VISIBLE_DEVICES=0 python ./tools/train.py configs/monocon/monocon_dla34_200e_kitti.py without any modification, but the performance is much lower. The performance peaks at epoch 120 and then drops lower and lower until it reaches 0. The result at epoch 120:

Car AP@0.70, 0.70, 0.70:
3d   AP:16.5400, 12.2644, 10.5623

The result at epoch 200:

Car AP@0.70, 0.70, 0.70:
3d   AP:0.0000, 0.0000, 0.0000

The training log can be seen here. What should I do to get a normal result? @Xianpeng919

Xianpeng919 commented 2 years ago

@rockywind We have tested our released checkpoints on multiple GPUs. The result is 26.33 | 19.03 | 16.00, the same as in the readme. Not sure what the problem is here. Could you provide me with your log so that I can help you check the details?

Xianpeng919 commented 2 years ago

@Cc-Hy You may replace the training split with the trainval split in the config
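
For reference, a change along those lines in an mmdet3d-style config might look like the following sketch; the info filenames here are assumptions and must match whatever trainval/test info files you actually generated with the KITTI data-preparation script.

```python
# Hypothetical excerpt of monocon_dla34_200e_kitti.py -- filenames are examples only.
data_root = 'data/kitti/'
data = dict(
    train=dict(
        # train on the trainval split instead of the train split
        ann_file=data_root + 'kitti_infos_trainval_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_trainval.pkl',
    ),
    test=dict(
        # the official test split has no labels, so only inference/submission is possible
        ann_file=data_root + 'kitti_infos_test_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_test.pkl',
    ),
)
```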

Xianpeng919 commented 2 years ago

@kaixinbear Your dimension branch exploded during training. We did observe this during our experiments. The dimension-aware loss is a little bit unstable. You can restart your training from the un-exploded ckpts.
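
If it helps, resuming in an mmdetection-based codebase is usually a single config line (or the equivalent --resume-from argument of tools/train.py, if your mmdet version exposes it); the path below is only an example.

```python
# Resume weights and optimizer state from the last checkpoint saved before the
# dimension loss exploded (example path -- pick whichever epoch is still healthy).
resume_from = 'work_dirs/monocon_dla34_200e_kitti/epoch_110.pth'
```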

kaixinbear commented 2 years ago

Thanks for your kind reply! I will try it later.

ganyz commented 2 years ago

@kaixinbear Your dimension branch exploded during training. We did observe this during our experiments. The dimension-aware loss is a little bit unstable. You can restart your training from the un-exploded ckpts.

Hello author, I resumed my training from the un-exploded ckpts, but it still explodes in the following epochs. Have you seen this phenomenon? Should I turn down my lr? Thanks!

rockywind commented 2 years ago

@Xianpeng919 I tested the released checkpoint. The result is the same as the readme. When I retrained the model, the result was lower than the readme. 20220302_134704.log

excitohe commented 2 years ago

@Xianpeng919 I tested the released checkpoint. The result is the same as the readme. When I retrained the model, the result was lower than the readme. 20220302_134704.log

Hi, have you tried multi-GPU training, or are you still using single-GPU training? I retrained with 4 GPUs and got lower results than the readme. https://paste.ubuntu.com/p/CtJH9Hk52F/

Xianpeng919 commented 2 years ago

@ganyz You can restart the training from scratch.

Xianpeng919 commented 2 years ago

@rockywind I double-checked your log; the config looks good to me. I'll double-check the code. You could also train again with another random seed to see how the performance changes.

rockywind commented 2 years ago

@Xianpeng919 OK, thanks a lot!

Cc-Hy commented 2 years ago

@rockywind @ganyz @kaixinbear @Xianpeng919 I find that during training there are several epochs whose performance is extremely low (close to 0), and the performance can differ from one epoch to the next by more than 10 points. Did you run into this situation?

Cc-Hy commented 2 years ago

Epoch 112 [screenshot] / Epoch 115 [screenshot]

Cc-Hy commented 2 years ago

Tried another time, and the best performance is as follows: [screenshot]

djp1235a commented 2 years ago

I conducted 3 experiments with different seeds, and the best performance is 17.80. Besides, results are not reproducible even with the same seed and deterministic==True in the codebase.

excitohe commented 2 years ago

I retrained twice and got 16.20 on a GTX 1080 Ti and 16.80 on a Titan V. It seems that no one in this issue can retrain above 18.00, which makes me frustrated... =_=!

excitohe commented 2 years ago

@rockywind @ganyz @kaixinbear @Xianpeng919 I find that during training there are several epochs whose performance is extremely low (close to 0), and the performance can differ from one epoch to the next by more than 10 points. Did you run into this situation?

It's normal. Mono3D performance is always unstable. Just pay attention to the last few checkpoint eval results. 0.0

djp1235a commented 2 years ago

@excitohe I know that Mono3D performance is always unstable. But results are reproducible with the same seed and deterministic==True in the Monodle codebase. I'm just wondering why nondeterministic behavior appears in the mmdet-based reimplementation.

excitohe commented 2 years ago

@djp1235a Unified reply from OpenMMLab
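
Roughly, the seeding that an mmdetection-style train script performs when you pass --seed/--deterministic looks like the sketch below (illustrative, not MonoCon's exact code). Even with cuDNN pinned, CUDA ops that lack deterministic kernels can still vary between runs, which is the gist of that reply.

```python
import random

import numpy as np
import torch


def set_random_seed(seed: int, deterministic: bool = False) -> None:
    # Seed Python, NumPy and PyTorch (CPU and all GPUs).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        # Pins cuDNN, but ops without deterministic CUDA kernels can still
        # introduce run-to-run differences, so exact reproduction may fail.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```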

Xianpeng919 commented 2 years ago

@excitohe @djp1235a @Cc-Hy I'm re-training the model based on the released code using different GPUs. I'll share with you the log in this thread once the result is out.

Xianpeng919 commented 2 years ago

@Cc-Hy You can refer to mmdet3d's visualization scripts. Their scripts are very helpful.

Cc-Hy commented 2 years ago

@Xianpeng919 Hello, I tried adding the "--show" arg in test.py, and I also tried using mono_det_demo.py directly, but neither works properly. Can you tell me which script you use, and whether I need to make any modifications?

Xianpeng919 commented 2 years ago

@Cc-Hy You can run inference with your model first and then revise the show_results function in mmdet3d.core.visualizer.
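
A minimal sketch of that workflow, assuming a recent mmdet3d that ships the mono-3D demo helpers (the function names, config, and file paths below are assumptions; older versions may differ):

```python
# Sketch only: run mono-3D inference, then dump/visualize the projected boxes.
from mmdet3d.apis import inference_mono_3d_detector, init_model, show_result_meshlab

config = 'configs/monocon/monocon_dla34_200e_kitti.py'        # example paths
checkpoint = 'work_dirs/monocon_dla34_200e_kitti/latest.pth'
image = 'demo/data/kitti/000008.png'
ann_file = 'demo/data/kitti/000008_mono3d.coco.json'          # per-image info file

model = init_model(config, checkpoint, device='cuda:0')
result, data = inference_mono_3d_detector(model, image, ann_file)
# show_result_meshlab calls into mmdet3d.core.visualizer, which is where the
# author suggests revising the drawing code if the defaults do not suit you.
show_result_meshlab(data, result, out_dir='demo_out', score_thr=0.3, task='mono-det')
```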

excitohe commented 2 years ago

@Xianpeng919 Have you finished your retraining yet? Looking forward to your training log file. ^_^

Xianpeng919 commented 2 years ago

@excitohe Hi, I was travelling last weekend. Please check this log for more details. I also attached the ckpt here in case you need it. Please run the *_car.py config for inference.

Cc-Hy commented 2 years ago

Tried again: [screenshot]

excitohe commented 2 years ago

Hi, I migrated MonoCon into the latest mmdet3d in the plugin_dir manner and tried again with only_car, using your latest updated config on 4 GPUs.

Car AP40@0.70, 0.70, 0.70:
bbox AP40:96.3800, 90.3432, 80.7128
bev  AP40:29.0449, 22.2251, 19.4256
3d   AP40:21.4625, 16.1725, 14.3990
aos  AP40:95.73, 89.51, 79.49

Attach the training log: https://paste.ubuntu.com/p/HyryFkZspc/

Can you see where the problem is? Thank you so much, and keep in touch. ^_^

I will also reconfigure your original environment and test again with a single GPU...

excitohe commented 2 years ago

@Cc-Hy Hi, is this your recent result with the only_car config? It looks like we're about the same...

Cc-Hy commented 2 years ago

@Cc-Hy Hi, is this your recent result with the only_car config? It looks like we're about the same...

No, these are 3-class results. I'm training with Car only now.

Cc-Hy commented 2 years ago

Car only: [screenshot]

kaixinbear commented 2 years ago

@Cc-Hy @Xianpeng919 @ganyz

Could you please tell me how to solve this model-collapse problem? By turning down the lr or changing the random seed? I have tried many times, but the AP drops to 0 at around epoch 120.

Cc-Hy commented 2 years ago

If you keep running into this problem, you can replace the dimension loss with a plain L1 loss only, L = |gt - pred|. Then the dimension loss will never explode. @kaixinbear
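
For concreteness, a plain-L1 dimension term along those lines could look like the sketch below (a standalone PyTorch module for illustration, not MonoCon's actual loss class):

```python
import torch
import torch.nn as nn


class PlainDimL1Loss(nn.Module):
    """L = |gt - pred|, averaged over matched objects, with no dimension-aware weighting."""

    def __init__(self, loss_weight: float = 1.0):
        super().__init__()
        self.loss_weight = loss_weight

    def forward(self, pred_dims: torch.Tensor, gt_dims: torch.Tensor) -> torch.Tensor:
        # pred_dims / gt_dims: (N, 3) tensors holding (h, w, l) for each object.
        return self.loss_weight * torch.abs(gt_dims - pred_dims).mean()
```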

gervaisi commented 2 years ago

@Xianpeng919 I want to use the model with mono_det_demo.py, but it asks for an annotation file. Where can I find it? Note that I have already trained the model.

FlyingAnt2018 commented 11 months ago

@Cc-Hy @Xianpeng919 Hi, I trained the model on the training set and tested on the validation set. The moderate-class 3D AP is 17.57, but the paper says it is 19.03. [screenshot]

Hi, I got AP 19.0217 by setting "cfg.SEED = 1903919922".