TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

Training VelSupModel on KITTI gives constant output #99

Closed zshn25 closed 3 years ago

zshn25 commented 3 years ago

I have managed to train the SelfSupModel on KITTI and the results look quite good, though it tends to overfit after epoch 16. However, when I train the VelSupModel, inference with the trained model gives a constant, flat output. My config is below:

### Preparing Model
Model: VelSupModel
DepthNet: DepthResNet
PoseNet: PoseNet
### Preparing Datasets
###### Setup train datasets
#########   45074 (x2): /data/datasets/kitti-raw/splits/eigen_train_files.txt
###### Setup validation datasets
#########     873: /data/datasets/kitti-raw/splits/eigen_val_files.txt
#########   22285: /data/datasets/kitti-raw/splits/eigen_train_files.txt
###### Setup test datasets
#########   22285: /data/datasets/kitti-raw/splits/eigen_train_files.txt

########################################################################################################################
### Config: configs.default_config -> configs.train_kitti.yaml
### Name: VelSupModel_resnet18pt
########################################################################################################################
config:
-- name: VelSupModel_resnet18pt
-- debug: False
-- gpu: [0, 1]
-- arch:
---- seed: 42
---- min_epochs: 1
---- max_epochs: 30
-- checkpoint:
---- filepath: logs/VelSupModel_resnet18pt/{epoch:02d}_{kitti-raw-eigen_val_files-groundtruth-abs_rel_pp_gt:.3f}
---- save_top_k: 10
---- monitor: kitti-raw-eigen_val_files-groundtruth-abs_rel_pp_gt
---- monitor_index: 0
---- mode: auto
---- s3_path: 
---- s3_frequency: 1
---- s3_url: 
-- save:
---- folder: 
---- depth:
------ rgb: True
------ viz: True
------ npz: True
------ png: True
---- pretrained: 
-- wandb:
---- dry_run: True
---- name: 
---- project: 
---- entity: 
---- tags: []
---- dir: 
---- url: 
-- model:
---- name: VelSupModel
---- checkpoint_path: 
---- optimizer:
------ name: Adam
------ depth:
-------- lr: 0.0002
-------- weight_decay: 0.0
------ pose:
-------- lr: 0.0002
-------- weight_decay: 0.0
---- scheduler:
------ name: StepLR
------ step_size: 30
------ gamma: 0.5
------ T_max: 20
---- params:
------ crop: garg
------ min_depth: 0.0
------ max_depth: 80.0
---- loss:
------ num_scales: 4
------ progressive_scaling: 0.0
------ flip_lr_prob: 0.5
------ rotation_mode: euler
------ upsample_depth_maps: True
------ ssim_loss_weight: 0.85
------ occ_reg_weight: 0.1
------ smooth_loss_weight: 0.001
------ C1: 0.0001
------ C2: 0.0009
------ photometric_reduce_op: min
------ disp_norm: True
------ clip_loss: 0.0
------ padding_mode: zeros
------ automask_loss: True
------ velocity_loss_weight: 0.1
------ supervised_method: sparse-l1
------ supervised_num_scales: 4
------ supervised_loss_weight: 0.9
---- depth_net:
------ name: DepthResNet
------ checkpoint_path: 
------ version: 18pt
------ dropout: 0.0
---- pose_net:
------ name: PoseNet
------ checkpoint_path: 
------ version: 
------ dropout: 0.0
-- datasets:
---- augmentation:
------ image_shape: (192, 640)
------ jittering: (0.2, 0.2, 0.2, 0.05)
---- train:
------ batch_size: 16
------ num_workers: 16
------ back_context: 1
------ forward_context: 1
------ dataset: ['KITTI']
------ path: ['/data/datasets/kitti-raw']
------ split: ['splits/eigen_train_files.txt']
------ depth_type: ['velodyne']
------ cameras: [[]]
------ repeat: [2]
------ num_logs: 5
---- validation:
------ batch_size: 1
------ num_workers: 8
------ back_context: 0
------ forward_context: 0
------ dataset: ['KITTI', 'KITTI']
------ path: ['/data/datasets/kitti-raw', '/data/datasets/kitti-raw']
------ split: ['splits/eigen_val_files.txt', 'splits/eigen_train_files.txt']
------ depth_type: ['groundtruth', 'groundtruth']
------ cameras: [[], []]
------ num_logs: 5
---- test:
------ batch_size: 1
------ num_workers: 8
------ back_context: 0
------ forward_context: 0
------ dataset: ['KITTI']
------ path: ['/data/datasets/kitti-raw']
------ split: ['splits/eigen_train_files.txt']
------ depth_type: ['groundtruth']
------ cameras: [[]]
------ num_logs: 5
-- config: configs/train_kitti.yaml
-- default: configs/default_config
-- prepared: True
VitorGuizilini-TRI commented 3 years ago

You can try with smaller values for the velocity supervision weight, 0.1 is at the upper limit and sometimes stops proper convergence. Another option is to start from a pre-trained self-supervised model and fine-tune with the velocity supervision, that usually works better and converges faster.
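Applied to the config above, the suggestion amounts to overriding two fields in configs/train_kitti.yaml: point model.checkpoint_path at a pre-trained SelfSupModel checkpoint and lower model.loss.velocity_loss_weight. A minimal sketch (the checkpoint path is a hypothetical placeholder and the weight value is illustrative, not a recommendation from the maintainers):

```yaml
model:
    # Hypothetical path to a pre-trained SelfSupModel checkpoint to fine-tune from
    checkpoint_path: /path/to/SelfSupModel_resnet18pt.ckpt
    loss:
        # Illustrative value below the 0.1 upper limit mentioned above
        velocity_loss_weight: 0.02
```

All other fields can stay as in the original run; only these two overrides change the training setup.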

zshn25 commented 3 years ago

Strangely, reducing the batch size worked.
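For anyone hitting the same symptom, this fix corresponds to lowering datasets.train.batch_size in configs/train_kitti.yaml. The comment does not say which value was used, so the number below is illustrative (the original run used 16):

```yaml
datasets:
    train:
        # Illustrative reduction from the original batch_size of 16
        batch_size: 8
```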