How to reproduce the result on DDAD

csBob123 commented 3 years ago

Hi, thank you for releasing the code. I am trying to train PackNet on DDAD, but I cannot reproduce the result so far. I use 8 V100 GPUs. The training command is:

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 horovodrun -np 8 -H localhost:8 python scripts/train.py ./configs/train_ddad.yaml

The details of my config are as follows:

    model:
        name: 'SelfSupModel'
        optimizer:
            name: 'Adam'
            depth:
                lr: 0.00009
            pose:
                lr: 0.00009
        scheduler:
            name: 'StepLR'
            step_size: 30
            gamma: 0.5
        depth_net:
            name: 'PackNet01'
            version: '1A'
        pose_net:
            name: 'PoseNet'
            version: ''
        params:
            crop: ''
            min_depth: 0.0
            max_depth: 200.0
    datasets:
        augmentation:
            image_shape: (384, 640)
        train:
            batch_size: 8
            num_workers: 8
            dataset: ['DGP']
            path: ['/data/ddad_train_val/ddad.json']
            split: ['train']
            depth_type: ['lidar']
            cameras: [['camera_01']]
            repeat: [5]
        validation:
            num_workers: 8
            dataset: ['DGP']
            path: ['/data/ddad_train_val/ddad.json']
            split: ['val']
            depth_type: ['lidar']
            cameras: [['camera_01']]
        test:
            num_workers: 8
            dataset: ['DGP']
            path: ['/data/ddad_train_val/ddad.json']
            split: ['val']
            depth_type: ['lidar']
            cameras: [['camera_01']]
    checkpoint:
        filepath: './data/experiments'
        monitor: 'abs_rel_pp_gt'
        monitor_index: 0
        mode: 'min'

The validation results at epoch 50 are:
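As a quick sanity check on the scheduler settings in this config, here is a minimal sketch of a step-decay schedule, assuming the usual StepLR rule (multiply the learning rate by gamma once every step_size epochs). The function name `step_lr` is hypothetical and is not part of packnet-sfm. With lr 0.00009, step_size 30 and gamma 0.5 it gives 4.50e-05 at epoch 50, which matches the learning rate reported in the log.

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Learning rate at `epoch` under step decay (assumed StepLR rule)."""
    # The rate is multiplied by gamma once for every full step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # Values taken from the config in this thread.
    for epoch in (0, 29, 30, 50, 60):
        print(epoch, f"{step_lr(9e-5, epoch, 30, 0.5):.2e}")
```

So the schedule itself appears to behave as configured, and the learning rate seen in the log is the expected one for epoch 50.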

    E: 50 BS: 8 - SelfSupModel LR (Adam): Depth 4.50e-05 Pose 4.50e-05

    * /data/ddad_train_val/ddad.json/val (camera_01)

    METRIC      | abs_rel | sqr_rel | rmse   | rmse_log | a1    | a2    | a3
    DEPTH       | 0.853   | 23.485  | 37.371 | 2.022    | 0.002 | 0.005 | 0.008
    DEPTH_PP    | 0.853   | 23.542  | 37.468 | 2.025    | 0.002 | 0.004 | 0.008
    DEPTH_GT    | 0.268   | 12.451  | 19.267 | 0.333    | 0.705 | 0.869 | 0.936
    DEPTH_PP_GT | 0.257   | 11.199  | 18.532 | 0.324    | 0.709 | 0.873 | 0.939
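The large gap between the DEPTH and DEPTH_GT rows is what one would expect if the _GT rows are computed after rescaling each prediction by the ground-truth median. That reading of the row names is an assumption and is not confirmed in this thread. Self-supervised monocular models predict depth only up to an unknown scale, so unscaled metrics can look very poor even when the relative structure is reasonable. The sketch below illustrates the effect on made-up numbers; `abs_rel` and `median_scale` are hypothetical helpers, not the repository's evaluation code.

```python
from statistics import median


def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def median_scale(pred, gt):
    """Rescale predictions so their median matches the ground-truth median."""
    scale = median(gt) / median(pred)
    return [p * scale for p in pred]


# Hypothetical depths in metres: the predictions have roughly the right
# relative structure but are about ten times too small.
gt = [5.0, 10.0, 20.0, 40.0]
pred = [0.6, 1.0, 2.2, 3.8]

if __name__ == "__main__":
    print("abs_rel without scaling:    ", abs_rel(pred, gt))
    print("abs_rel with median scaling:", abs_rel(median_scale(pred, gt), gt))
```

On this toy data the unscaled error is dominated by the scale mismatch, while the median-scaled error reflects only the remaining structural error.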

Are there any problems? Thank you for your attention.

VitorGuizilini-TRI commented 3 years ago

Hmm, can you try a few things:

* Start from a pre-trained model (e.g. a KITTI model) to see if it diverges
* Try another network (DepthResNet or PoseResNet)
* Play around with the learning rate

By the way, once you get some numbers you can try submitting to our EvalAI DDAD challenge! https://eval.ai/web/challenges/challenge-page/902/overview

csBob123 commented 3 years ago

Did you use any pre-trained weights to get the results of 0.173 (abs_rel) on DDAD and 0.111 (abs_rel) on KITTI, or did you train from scratch?

VitorGuizilini-TRI commented 3 years ago

No, those are trained from scratch with PackNet. I just mentioned pre-trained weights as a way to see if there is anything wrong with the training setup that you are using.

a1600012888 commented 3 years ago

Hi, thanks for your work. Were the results on DDAD produced by training from scratch using the config provided here? https://github.com/TRI-ML/packnet-sfm/blob/master/configs/train_ddad.yaml

VitorGuizilini-TRI commented 3 years ago

@a1600012888 Yes, that configuration file should work.

a1600012888 commented 3 years ago

Thanks!

a1600012888 commented 3 years ago

Hi, for the DDAD experiments, did you train the model on 8 GPUs with the train_ddad.yaml config file linked above? If so, does that mean the effective batch size is 8*2=16 and the learning rate is 9e-5?
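The arithmetic behind this question can be sketched as follows, assuming that batch_size in the config is applied per Horovod process (one process per GPU). This is an assumption about how the training script shards data and is not confirmed in this thread. The helper name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(per_process_batch: int, num_processes: int) -> int:
    """Samples per optimizer step in data-parallel training.

    Assumes each process (one per GPU) loads its own batch and gradients
    are averaged across processes before the update.
    """
    return per_process_batch * num_processes


if __name__ == "__main__":
    # batch_size 2 on 8 GPUs, as in the question above.
    print(effective_batch_size(2, 8))
    # batch_size 8 on 8 GPUs, as in the config posted at the top of the thread.
    print(effective_batch_size(8, 8))
```

Under that assumption, the config posted at the top of the thread (batch_size 8 on 8 GPUs) would train with an effective batch four times larger than 8*2=16 at the same learning rate, which may be worth checking when comparing results.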

TRI-ML / packnet-sfm

How to reproduce the result on DDAD #142