TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

NRS Kitti Training - Not learning, Uniform Depthmaps during validation #115

Closed jdriscoll319 closed 3 years ago

jdriscoll319 commented 3 years ago

I'm trying to train NRS on the KITTI raw Eigen split, but no learning seems to be occurring: the output depth maps are completely uniform, and the validation metrics do not change over multiple epochs. I'm training on an AWS g4 instance with 4 NVIDIA T4 GPUs. Here is the command I'm running and the resulting config dump:

root@ip-172-30-0-230:/workspace/packnet-sfm# horovodrun -np 4 python3 scripts/train.py configs/train_kitti_nrs.yaml 
[1,0]<stdout>:### Preparing Model
[1,0]<stdout>:Model: GenericSelfSupModel
[1,0]<stdout>:DepthNet: RaySurfaceResNet
[1,0]<stdout>:PoseNet: PoseNet
[1,0]<stdout>:### Preparing Datasets
[1,0]<stdout>:###### Setup train datasets
[1,0]<stdout>:#########   39810 (x1): /data/datasets/KITTI_raw/data_splits/eigen_zhou_files.txt
[1,0]<stdout>:###### Setup validation datasets
[1,0]<stdout>:#########     888: /data/datasets/KITTI_raw/data_splits/eigen_val_files.txt
[1,0]<stdout>:#########     697: /data/datasets/KITTI_raw/data_splits/eigen_test_files.txt
[1,0]<stdout>:###### Setup test datasets
[1,0]<stdout>:#########     697: /data/datasets/KITTI_raw/data_splits/eigen_test_files.txt
[1,0]<stdout>:
[1,0]<stdout>:########################################################################################################################
[1,0]<stdout>:### Config: configs.default_config -> configs.train_kitti_nrs.yaml
[1,0]<stdout>:### Name: default_config-train_kitti_nrs-2021.01.28-20h30m51s
[1,0]<stdout>:########################################################################################################################
[1,0]<stdout>:config:
[1,0]<stdout>:-- name: default_config-train_kitti_nrs-2021.01.28-20h30m51s
[1,0]<stdout>:-- debug: True
[1,0]<stdout>:-- arch:
[1,0]<stdout>:---- seed: 42
[1,0]<stdout>:---- min_epochs: 1
[1,0]<stdout>:---- max_epochs: 50
[1,0]<stdout>:-- checkpoint:
[1,0]<stdout>:---- filepath: /data/datasets/KITTI_raw/checkpoints/default_config-train_kitti_nrs-2021.01.28-20h30m51s/{epoch:02d}_{KITTI_raw-eigen_val_files-velodyne-abs_rel_pp_gt:.3f}
[1,0]<stdout>:---- save_top_k: 5
[1,0]<stdout>:---- monitor: KITTI_raw-eigen_val_files-velodyne-abs_rel_pp_gt
[1,0]<stdout>:---- monitor_index: 0
[1,0]<stdout>:---- mode: min
[1,0]<stdout>:---- s3_path: 
[1,0]<stdout>:---- s3_frequency: 1
[1,0]<stdout>:---- s3_url: 
[1,0]<stdout>:-- save:
[1,0]<stdout>:---- folder: 
[1,0]<stdout>:---- depth:
[1,0]<stdout>:------ rgb: True
[1,0]<stdout>:------ viz: True
[1,0]<stdout>:------ npz: True
[1,0]<stdout>:------ png: True
[1,0]<stdout>:---- pretrained: 
[1,0]<stdout>:-- wandb:
[1,0]<stdout>:---- dry_run: True
[1,0]<stdout>:---- name: kitti-raw-test
[1,0]<stdout>:---- project: kitti-nrs-test
[1,0]<stdout>:---- entity: 
[1,0]<stdout>:---- tags: []
[1,0]<stdout>:---- dir: /data/datasets/KITTI_raw/
[1,0]<stdout>:---- url: 
[1,0]<stdout>:-- model:
[1,0]<stdout>:---- name: GenericSelfSupModel
[1,0]<stdout>:---- checkpoint_path: 
[1,0]<stdout>:---- optimizer:
[1,0]<stdout>:------ name: Adam
[1,0]<stdout>:------ depth:
[1,0]<stdout>:-------- lr: 0.0002
[1,0]<stdout>:-------- weight_decay: 0.0
[1,0]<stdout>:------ pose:
[1,0]<stdout>:-------- lr: 0.0002
[1,0]<stdout>:-------- weight_decay: 0.0
[1,0]<stdout>:---- scheduler:
[1,0]<stdout>:------ name: StepLR
[1,0]<stdout>:------ step_size: 30
[1,0]<stdout>:------ gamma: 0.5
[1,0]<stdout>:------ T_max: 20
[1,0]<stdout>:---- params:
[1,0]<stdout>:------ crop: garg
[1,0]<stdout>:------ min_depth: 0.0
[1,0]<stdout>:------ max_depth: 80.0
[1,0]<stdout>:---- loss:
[1,0]<stdout>:------ num_scales: 4
[1,0]<stdout>:------ progressive_scaling: 0.0
[1,0]<stdout>:------ flip_lr_prob: 0.5
[1,0]<stdout>:------ rotation_mode: euler
[1,0]<stdout>:------ upsample_depth_maps: True
[1,0]<stdout>:------ ssim_loss_weight: 0.85
[1,0]<stdout>:------ occ_reg_weight: 0.1
[1,0]<stdout>:------ smooth_loss_weight: 0.001
[1,0]<stdout>:------ C1: 0.0001
[1,0]<stdout>:------ C2: 0.0009
[1,0]<stdout>:------ photometric_reduce_op: min
[1,0]<stdout>:------ disp_norm: True
[1,0]<stdout>:------ clip_loss: 0.0
[1,0]<stdout>:------ padding_mode: zeros
[1,0]<stdout>:------ automask_loss: True
[1,0]<stdout>:------ velocity_loss_weight: 0.1
[1,0]<stdout>:------ supervised_method: sparse-l1
[1,0]<stdout>:------ supervised_num_scales: 4
[1,0]<stdout>:------ supervised_loss_weight: 0.9
[1,0]<stdout>:---- depth_net:
[1,0]<stdout>:------ name: RaySurfaceResNet
[1,0]<stdout>:------ checkpoint_path: 
[1,0]<stdout>:------ version: 18pt
[1,0]<stdout>:------ dropout: 0.0
[1,0]<stdout>:---- pose_net:
[1,0]<stdout>:------ name: PoseNet
[1,0]<stdout>:------ checkpoint_path: 
[1,0]<stdout>:------ version: 18pt
[1,0]<stdout>:------ dropout: 0.0
[1,0]<stdout>:-- datasets:
[1,0]<stdout>:---- augmentation:
[1,0]<stdout>:------ image_shape: (128, 416)
[1,0]<stdout>:------ jittering: (0.2, 0.2, 0.2, 0.05)
[1,0]<stdout>:---- train:
[1,0]<stdout>:------ batch_size: 1
[1,0]<stdout>:------ num_workers: 16
[1,0]<stdout>:------ back_context: 1
[1,0]<stdout>:------ forward_context: 1
[1,0]<stdout>:------ dataset: ['KITTI']
[1,0]<stdout>:------ path: ['/data/datasets/KITTI_raw']
[1,0]<stdout>:------ split: ['data_splits/eigen_zhou_files.txt']
[1,0]<stdout>:------ depth_type: ['velodyne']
[1,0]<stdout>:------ cameras: [[]]
[1,0]<stdout>:------ repeat: [1]
[1,0]<stdout>:------ num_logs: 5
[1,0]<stdout>:---- validation:
[1,0]<stdout>:------ batch_size: 1
[1,0]<stdout>:------ num_workers: 8
[1,0]<stdout>:------ back_context: 0
[1,0]<stdout>:------ forward_context: 0
[1,0]<stdout>:------ dataset: ['KITTI', 'KITTI']
[1,0]<stdout>:------ path: ['/data/datasets/KITTI_raw', '/data/datasets/KITTI_raw']
[1,0]<stdout>:------ split: ['data_splits/eigen_val_files.txt', 'data_splits/eigen_test_files.txt']
[1,0]<stdout>:------ depth_type: ['velodyne', 'velodyne']
[1,0]<stdout>:------ cameras: [[], []]
[1,0]<stdout>:------ num_logs: 5
[1,0]<stdout>:---- test:
[1,0]<stdout>:------ batch_size: 1
[1,0]<stdout>:------ num_workers: 8
[1,0]<stdout>:------ back_context: 0
[1,0]<stdout>:------ forward_context: 0
[1,0]<stdout>:------ dataset: ['KITTI']
[1,0]<stdout>:------ path: ['/data/datasets/KITTI_raw']
[1,0]<stdout>:------ split: ['data_splits/eigen_test_files.txt']
[1,0]<stdout>:------ depth_type: ['velodyne']
[1,0]<stdout>:------ cameras: [[]]
[1,0]<stdout>:------ num_logs: 5
[1,0]<stdout>:-- config: configs/train_kitti_nrs.yaml
[1,0]<stdout>:-- default: configs/default_config
[1,0]<stdout>:-- prepared: True
[1,0]<stdout>:########################################################################################################################
[1,0]<stdout>:### Config: configs.default_config -> configs.train_kitti_nrs.yaml
[1,0]<stdout>:### Name: default_config-train_kitti_nrs-2021.01.28-20h30m51s
[1,0]<stdout>:########################################################################################################################
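
A quick way to confirm the depth maps really are uniform (and not just rendered that way) is to inspect the saved .npz files directly. Here is a minimal sketch of such a check; it assumes depth maps were written via the save.depth.npz option above and that each file stores a 'depth' array (the folder path and key name are assumptions):

import glob
import numpy as np

# a healthy model shows a wide spread of depths per image;
# a collapsed one has (near-)zero standard deviation
for path in sorted(glob.glob('/data/save/depth/*.npz'))[:5]:  # hypothetical save folder
    depth = np.load(path)['depth']
    print(path, depth.min(), depth.max(), depth.std())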
jdriscoll319 commented 3 years ago

This is the script I'm using to generate the ray surface template, which I pieced together from the paper. Perhaps there's something wrong here?

import argparse

import numpy as np
from sklearn.preprocessing import normalize

parser = argparse.ArgumentParser()
parser.add_argument("--h", type=int, help="Template height")
parser.add_argument("--w", type=int, help="Template width")
parser.add_argument("--o", type=str, help="Output file name without extension")
parser.add_argument("--crop", type=int, help="Optional. Crop template after initialization")
args = parser.parse_args()

w = args.w
h = args.h

fx = cx = w/2
fy = cy = h/2

K = np.array([[fx,  0, cx],
              [ 0, fy, cy],
              [ 0,  0,  1]])
Kinv = np.linalg.inv(K)

p = []
for v in range(h):                  #row
    for u in range(w):              #col
        p.append(np.array([u,v,1]))

p = np.stack(p)         #p.shape = (w*h, 3)
Q = Kinv @ p.T          #Q.shape = (3, w*h)

Q_norm = normalize(Q, axis=0)   # l2-normalize each ray; Q_norm.shape = (3, w*h)

Q_ray_surface = np.reshape(Q_norm.T, [1,3,h,w]).astype('float32')

if args.crop:
    Q_ray_surface = Q_ray_surface[:,:,0:args.crop,:]

fname = args.o
with open(fname + '.npy', 'wb') as f:
    np.save(f, Q_ray_surface)
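
For reference, assuming the script is saved as make_ray_template.py (the filename is mine), a template matching the (128, 416) image_shape from the config above can be generated with python3 make_ray_template.py --h 128 --w 416 --o kitti_ray_template and sanity-checked like this:

import numpy as np

template = np.load('kitti_ray_template.npy')
print(template.shape)   # expected: (1, 3, 128, 416)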
jdriscoll319 commented 3 years ago

I'm not sure if it was the only problem, but there was definitely an issue with my template script. I'm running a mini test now, and things are looking okay so far. I'll close this out after a few more epochs to confirm things are working.

For anyone following along: the reshape at the end was not working as expected. That line should be replaced with the following loop:

Q_ray_surface = np.zeros((1,3,h,w))
for i in range(w*h):
    idx = p[i,:]            # [u, v, 1] pixel coordinates for ray i
    Qw = idx[0]             # column (u)
    Qh = idx[1]             # row (v)
    ray = Q_norm[:, i]
    Q_ray_surface[0, :, Qh, Qw] = ray
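
For reference, the loop can also be written as a single reshape, just not the one in the original script: Q_norm has shape (3, w*h) with its columns in row-major (v, u) order, so reshaping the channel-first array directly preserves the pixel grid (the bug was reshaping the transposed (w*h, 3) array instead):

# vectorized equivalent of the loop above
Q_ray_surface = Q_norm.reshape(1, 3, h, w).astype('float32')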
VitorGuizilini-TRI commented 3 years ago

@jdriscoll319 That's interesting, thank you for pointing that out! @ivasiljevic Can you please take a look and see if that fix should be merged into our codebase?

jdriscoll319 commented 3 years ago

Just to be clear, this is a script that I wrote myself, as I didn't see anything provided by you guys. You're welcome to use it if you'd like, though :). I can provide the full version (with imports and the aforementioned fix).

vbelissen commented 3 years ago

Hi @jdriscoll319, @VitorGuizilini-TRI, I just tried your piece of code and I realized the normalization does not exactly match the one that was used to generate omnicam_ray_template.npy. Here is the modification I applied to reproduce exactly the same template file:

import argparse
import sys

import numpy as np
from sklearn.preprocessing import normalize

parser = argparse.ArgumentParser()
parser.add_argument("--h", type=int, help="Template height")
parser.add_argument("--w", type=int, help="Template width")
parser.add_argument("--o", type=str, help="Output file name without extension")
parser.add_argument("--normType", type=str, choices=['packnet', 'jdriscoll319'],
                    help="Whether to normalize like packnet did (z=1) or like "
                         "jdriscoll319 did (l2 norm = 1) (cf. github issue 115)")
parser.add_argument("--crop", type=int, help="Optional. Crop template after initialization")
args = parser.parse_args()

w = args.w
h = args.h

fx = cx = w/2
fy = cy = h/2

K = np.array([[fx,  0, cx],
              [ 0, fy, cy],
              [ 0,  0,  1]])
Kinv = np.linalg.inv(K)

p = []
for v in range(h):                  #row
    for u in range(w):              #col
        p.append(np.array([u,v,1]))

p = np.stack(p)         #p.shape = (w*h, 3)
Q = Kinv @ p.T          #Q.shape = (3, w*h)

if args.normType == 'jdriscoll319':
    Q_norm = normalize(Q, axis=0)   # each ray has unit l2 norm
elif args.normType == 'packnet':
    Q_norm = Q / Q[2, :]            # each ray scaled so that z = 1
else:
    sys.exit('Wrong normalization type')

Q_ray_surface = np.zeros((1,3,h,w))
for i in range(w*h):
    idx = p[i,:]            # [u, v, 1] pixel coordinates for ray i
    Qw = idx[0]             # column (u)
    Qh = idx[1]             # row (v)
    ray = Q_norm[:, i]
    Q_ray_surface[0, :, Qh, Qw] = ray

if args.crop:
    Q_ray_surface = Q_ray_surface[:,:,0:args.crop,:]

Q_ray_surface = Q_ray_surface.astype('float32')

fname = args.o
with open(fname + '.npy', 'wb') as f:
    np.save(f, Q_ray_surface)
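
To make the difference between the two schemes concrete, here is a quick standalone check on a single ray, using the same synthetic K as above with h=128 and w=416:

import numpy as np

h, w = 128, 416
fx = cx = w / 2
fy = cy = h / 2
Kinv = np.linalg.inv(np.array([[fx,  0, cx],
                               [ 0, fy, cy],
                               [ 0,  0,  1]]))

q = Kinv @ np.array([0, 0, 1])   # ray through the top-left pixel (u=0, v=0)
print(q / np.linalg.norm(q))     # l2 norm = 1: [-0.577 -0.577  0.577]
print(q / q[2])                  # z = 1:       [-1. -1.  1.]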

Cheers

jdriscoll319 commented 3 years ago

Oh, good catch, thanks! It would be interesting to see what impact, if any, that change has, since I was able to get good-looking results with my method. Maybe I'll run some more experiments. :)

vbelissen commented 3 years ago

@jdriscoll319 that would be interesting, definitely! Since you have already run experiments with the NRS model, may I ask how much GPU memory it needs? I have been trying to run it with the train_omnicam.yml config file, and it seems to require 27GB of GPU memory for a batch_size of 1, which seems huge to me for a ResNet18-based model...

jdriscoll319 commented 3 years ago

I've been testing on an AWS G4 instance, which uses Tesla T4 GPUs with 16GB of memory. I had to heavily downsize my images to get the network to run, to something like 200x400. Also, it might be worth noting that the effective batch size is the per-GPU batch size multiplied by the number of GPUs, so if you have access to multiple GPUs you can keep the per-GPU batch size small and still train with a reasonable effective batch size, as in the example below.
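
For example, with the config above (batch_size: 1) and the original launch command (horovodrun -np 4), the effective batch size is 1 * 4 = 4; a per-GPU batch size of 4 would give an effective batch size of 16.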