TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

Failed to reproduce the results in the paper on KITTI #230


Tord-Zhang commented 2 years ago

Hi, I trained PackNet with train_kitti.yaml and the dataset split you provided, but the results are far worse than the numbers in the paper: I get abs_rel 0.121, while the paper reports about 0.07.

This is the config I used for training:

```yaml
model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0002
        pose:
            lr: 0.0002
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0
datasets:
    augmentation:
        image_shape: (192, 640)
    train:
        batch_size: 4
        dataset: ['KITTI']
        path: ['datasets/KITTI_raw']
        split: ['data_splits/eigen_zhou_files.txt']
        depth_type: ['velodyne']
        repeat: [2]
    validation:
        dataset: ['KITTI']
        path: ['datasets/KITTI_raw']
        split: ['data_splits/eigen_val_files.txt',
                'data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
    test:
        dataset: ['KITTI']
        path: ['datasets/KITTI_raw']
        split: ['data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
checkpoint:
    filepath: kitti_ckpt
    monitor: 'rmse_pp_gt'
    monitor_index: 0
    mode: 'min'
```

And this is the result I get: [screenshot of evaluation metrics]

VitorGuizilini-TRI commented 2 years ago

Thank you for that, I will take a look to see if there is something wrong from our end. What is your hardware configuration for training?

Tord-Zhang commented 2 years ago

> Thank you for that, I will take a look to see if there is something wrong from our end. What is your hardware configuration for training?

@VitorGuizilini I trained on 6 GPUs, all V100s. Below is the command:

```bash
#!/bin/bash

NGPUS=$1
LOG_FILE=$2
echo $NGPUS

MPI_CMD="mpirun -allow-run-as-root \
    -np ${NGPUS} \
    -H localhost:${NGPUS} \
    -x MASTER_ADDR=127.0.0.1 \
    -x MASTER_PORT=23457 \
    -x HOROVOD_TIMELINE \
    -x OMP_NUM_THREADS=1 \
    -x KMP_AFFINITY='granularity=fine,compact,1,0' \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x NCCL_MIN_NRINGS=4 \
    --report-bindings"

COMMAND="python3 scripts/train.py configs/train_kitti.yaml"

# Strip ANSI color codes before writing the log file
bash -c "${MPI_CMD} ${COMMAND}" 2>&1 | tee >(sed -r 's/\x1b\[[0-9;]*m//g' > ${LOG_FILE})
```
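For reference, I launch it like this (the script name `run_train.sh` is just what I call it locally):

```bash
# 6 GPUs, log written to train_kitti.log
bash run_train.sh 6 train_kitti.log
```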

liortalker commented 2 years ago

The problem is probably that you are training with a resized image (`datasets: augmentation: image_shape: (192, 640)`). Try using the crop from the default YAML instead:

```yaml
datasets:
    augmentation:
        crop_train_borders: (-352, 0, 0.5, 1216)
        crop_eval_borders: (-352, 0, 0.5, 1216)
```
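As a sketch, the augmentation section of the config above would then look like this (assuming the resize is dropped entirely when cropping, which is my reading of the default YAML):

```yaml
datasets:
    augmentation:
        # image_shape: (192, 640)   # assumption: remove the resize when using the crop
        crop_train_borders: (-352, 0, 0.5, 1216)
        crop_eval_borders: (-352, 0, 0.5, 1216)
```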