TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

How to reproduce the result on DDAD #142

Open · csBob123 opened this issue 3 years ago

csBob123 commented 3 years ago

Hi, thank you for releasing the code. I am trying to train PackNet on DDAD, but I cannot reproduce the results so far. I am using 8 V100 GPUs. The training command is:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 horovodrun -np 8 -H localhost:8 python scripts/train.py ./configs/train_ddad.yaml
```

The details of my config are as follows:

```yaml
model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.00009
        pose:
            lr: 0.00009
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: ''
        min_depth: 0.0
        max_depth: 200.0
datasets:
    augmentation:
        image_shape: (384, 640)
    train:
        batch_size: 8
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['train']
        depth_type: ['lidar']
        cameras: [['camera_01']]
        repeat: [5]
    validation:
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['val']
        depth_type: ['lidar']
        cameras: [['camera_01']]
    test:
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['val']
        depth_type: ['lidar']
        cameras: [['camera_01']]
checkpoint:
    filepath: './data/experiments'
    monitor: 'abs_rel_pp_gt'
    monitor_index: 0
    mode: 'min'
```

The validation log at epoch 50:

```
E: 50 BS: 8 - SelfSupModel LR (Adam): Depth 4.50e-05 Pose 4.50e-05

* /data/ddad_train_val/ddad.json/val (camera_01)

| METRIC      | abs_rel | sqr_rel |  rmse  | rmse_log |  a1   |  a2   |  a3   |
| DEPTH       |  0.853  | 23.485  | 37.371 |  2.022   | 0.002 | 0.005 | 0.008 |
| DEPTH_PP    |  0.853  | 23.542  | 37.468 |  2.025   | 0.002 | 0.004 | 0.008 |
| DEPTH_GT    |  0.268  | 12.451  | 19.267 |  0.333   | 0.705 | 0.869 | 0.936 |
| DEPTH_PP_GT |  0.257  | 11.199  | 18.532 |  0.324   | 0.709 | 0.873 | 0.939 |
```

Is there anything wrong with my setup? Thank you for your attention.
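One aside on that log: the displayed learning rate (4.50e-05) is not itself a problem; with the configured StepLR (step_size 30, gamma 0.5), the initial 9e-5 is halved once by epoch 50. A minimal PyTorch sketch of the schedule in isolation, just to confirm the arithmetic:

```python
import torch

# Dummy parameter so Adam has something to optimize.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=9e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

for epoch in range(50):
    optimizer.step()   # stand-in for one epoch of training
    scheduler.step()   # decay triggers at epochs 30, 60, ...

print(optimizer.param_groups[0]["lr"])  # 4.5e-05 = 9e-5 * 0.5 ** (50 // 30)
```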

VitorGuizilini-TRI commented 3 years ago

Hmm, can you try a few things:

* Start from a pre-trained model (e.g. a KITTI model) to see if it diverges

* Try another network (DepthResNet or PoseResNet)

* Play around with the learning rate

By the way, once you get some numbers you can try submitting to our EvalAI DDAD challenge! https://eval.ai/web/challenges/challenge-page/902/overview
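On the second suggestion, swapping networks is a config change. A sketch of the relevant override, assuming the `DepthResNet`/`PoseResNet` names and the `'18pt'` version string (ImageNet-pretrained ResNet-18) used by other configs in this repo; worth double-checking against the files under `configs/`:

```yaml
model:
    depth_net:
        name: 'DepthResNet'
        version: '18pt'    # ResNet-18 encoder, ImageNet-pretrained
    pose_net:
        name: 'PoseResNet'
        version: '18pt'
```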

csBob123 commented 3 years ago

> Hmm, can you try a few things:
>
> * Start from a pre-trained model (e.g. a KITTI model) to see if it diverges
>
> * Try another network (DepthResNet or PoseResNet)
>
> * Play around with the learning rate
>
> By the way, once you get some numbers you can try submitting to our EvalAI DDAD challenge! https://eval.ai/web/challenges/challenge-page/902/overview

Did you use any pre-trained weights to get the results of 0.173 (abs_rel) on DDAD and 0.111 (abs_rel) on KITTI, or did you just train from scratch?

VitorGuizilini-TRI commented 3 years ago

No, those were trained from scratch with PackNet. I only mentioned pre-trained weights as a way to check whether there is anything wrong with the training setup you are using.

a1600012888 commented 3 years ago

Hi, thanks for your work. Were the DDAD results produced by training from scratch using the config provided here? https://github.com/TRI-ML/packnet-sfm/blob/master/configs/train_ddad.yaml

VitorGuizilini-TRI commented 3 years ago

@a1600012888 Yes, that configuration file should work.
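For completeness, launching with that stock config follows the same pattern as the command in the first comment; a minimal sketch:

```bash
# Single GPU
python scripts/train.py configs/train_ddad.yaml

# 8 GPUs via Horovod, as in the original post
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 horovodrun -np 8 -H localhost:8 \
    python scripts/train.py configs/train_ddad.yaml
```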

a1600012888 commented 3 years ago

> @a1600012888 Yes, that configuration file should work.

Thanks!

a1600012888 commented 3 years ago

> Hi, thanks for your work. Were the DDAD results produced by training from scratch using the config provided here? https://github.com/TRI-ML/packnet-sfm/blob/master/configs/train_ddad.yaml

Hi, for the DDAD experiments, did you train the model on 8 GPUs with this config file? If so, does that mean the effective batch size is 8 * 2 = 16 and the learning rate is 9e-5?
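For reference, the arithmetic behind that question, as a hedged sketch: the per-GPU `batch_size: 2` is taken from the question itself (not verified against the repo config here), and whether the trainer applies the common linear LR-scaling rule across Horovod workers is an assumption to verify in the training code:

```python
# Back-of-envelope numbers for a Horovod data-parallel run.
num_gpus = 8
per_gpu_batch = 2      # per-GPU batch size, as the question assumes
base_lr = 9e-5         # depth/pose lr from the config

effective_batch = num_gpus * per_gpu_batch   # 16 samples per optimizer step
scaled_lr = base_lr * num_gpus               # 7.2e-04, IF the trainer applies the
                                             # linear scaling rule; otherwise the
                                             # learning rate simply stays at 9e-5
print(effective_batch, scaled_lr)
```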