TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

ImageDataset not reading images. #96

Closed · yashdeep01 closed this issue 3 years ago

yashdeep01 commented 3 years ago

Hi, firstly thanks a lot for providing the implementation of the model!

I am trying to train the self-supervised model on my custom images. I am not using Docker here, and my environment seems to be set up correctly (python3 scripts/train.py overfit_kitti.yaml runs fine). However, when I use my custom dataset by running this command:

python3 scripts/train.py configs/train_images.yaml

I get ZeroDivisionError: division by zero. The full error output below also lists the config params used:

### Preparing Model
Model: SelfSupModel
DepthNet: PackNet01
PoseNet: PoseNet
### Preparing Datasets
###### Setup train datasets
#########       0 (x1): /home/ec2-user/SageMaker/packnet-sfm/data/datasets/blackvue_images/train_set.txt
###### Setup validation datasets
#########     100: /home/ec2-user/SageMaker/packnet-sfm/data/datasets/blackvue_images/val_set.txt
###### Setup test datasets
#########     100: /home/ec2-user/SageMaker/packnet-sfm/data/datasets/blackvue_images/test_set.txt

########################################################################################################################
### Config: configs.default_config -> configs.train_images.yaml
### Name: default_config-train_images-2020.11.23-19h08m14s
########################################################################################################################
config:
-- name: default_config-train_images-2020.11.23-19h08m14s
-- debug: False
-- arch:
---- seed: 42
---- min_epochs: 1
---- max_epochs: 50
-- checkpoint:
---- filepath: 
---- save_top_k: 5
---- monitor: loss
---- monitor_index: 0
---- mode: auto
---- s3_path: 
---- s3_frequency: 1
---- s3_url: 
-- save:
---- folder: 
---- depth:
------ rgb: True
------ viz: True
------ npz: True
------ png: True
---- pretrained: 
-- wandb:
---- dry_run: True
---- name: 
---- project: 
---- entity: 
---- tags: []
---- dir: 
---- url: 
-- model:
---- name: SelfSupModel
---- checkpoint_path: 
---- optimizer:
------ name: Adam
------ depth:
-------- lr: 0.0002
-------- weight_decay: 0.0
------ pose:
-------- lr: 0.0002
-------- weight_decay: 0.0
---- scheduler:
------ name: StepLR
------ step_size: 30
------ gamma: 0.5
------ T_max: 20
---- params:
------ crop: garg
------ min_depth: 0.0
------ max_depth: 80.0
---- loss:
------ num_scales: 4
------ progressive_scaling: 0.0
------ flip_lr_prob: 0.5
------ rotation_mode: euler
------ upsample_depth_maps: True
------ ssim_loss_weight: 0.85
------ occ_reg_weight: 0.1
------ smooth_loss_weight: 0.001
------ C1: 0.0001
------ C2: 0.0009
------ photometric_reduce_op: min
------ disp_norm: True
------ clip_loss: 0.0
------ padding_mode: zeros
------ automask_loss: True
------ velocity_loss_weight: 0.1
------ supervised_method: sparse-l1
------ supervised_num_scales: 4
------ supervised_loss_weight: 0.9
---- depth_net:
------ name: PackNet01
------ checkpoint_path: 
------ version: 1A
------ dropout: 0.0
---- pose_net:
------ name: PoseNet
------ checkpoint_path: 
------ version: 
------ dropout: 0.0
-- datasets:
---- augmentation:
------ image_shape: (1080, 1920)
------ jittering: (0.2, 0.2, 0.2, 0.05)
---- train:
------ batch_size: 4
------ num_workers: 16
------ back_context: 1
------ forward_context: 1
------ dataset: ['Image']
------ path: ['/home/ec2-user/SageMaker/packnet-sfm/data/datasets/blackvue_images/']
------ split: ['train_set.txt']
------ depth_type: ['']
------ cameras: [[]]
------ repeat: [1]
------ num_logs: 5
---- validation:
------ batch_size: 1
------ num_workers: 8
------ back_context: 0
------ forward_context: 0
------ dataset: ['Image']
------ path: ['/home/ec2-user/SageMaker/packnet-sfm/data/datasets/blackvue_images/']
------ split: ['val_set.txt']
------ depth_type: ['']
------ cameras: [[]]
------ num_logs: 5
---- test:
------ batch_size: 1
------ num_workers: 8
------ back_context: 0
------ forward_context: 0
------ dataset: ['Image']
------ path: ['/home/ec2-user/SageMaker/packnet-sfm/data/datasets/blackvue_images/']
------ split: ['test_set.txt']
------ depth_type: ['']
------ cameras: [[]]
------ num_logs: 5
-- config: configs/train_images.yaml
-- default: configs/default_config
-- prepared: True
########################################################################################################################
### Config: configs.default_config -> configs.train_images.yaml
### Name: default_config-train_images-2020.11.23-19h08m14s
########################################################################################################################

0 images [00:00, ? images/s]
Traceback (most recent call last):
  File "scripts/train.py", line 64, in <module>
    train(args.file)
  File "scripts/train.py", line 59, in train
    trainer.fit(model_wrapper)
  File "/home/ec2-user/SageMaker/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 58, in fit
    self.train(train_dataloader, module, optimizer)
  File "/home/ec2-user/SageMaker/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 98, in train
    return module.training_epoch_end(outputs)
  File "/home/ec2-user/SageMaker/packnet-sfm/packnet_sfm/models/model_wrapper.py", line 219, in training_epoch_end
    loss_and_metrics = average_loss_and_metrics(output_batch, 'avg_train')
  File "/home/ec2-user/SageMaker/packnet-sfm/packnet_sfm/utils/reduce.py", line 215, in average_loss_and_metrics
    average_key(batch_list, key)
  File "/home/ec2-user/SageMaker/packnet-sfm/packnet_sfm/utils/reduce.py", line 173, in average_key
    return sum(values) / len(values)
ZeroDivisionError: division by zero

It does not seem to read any images for the training dataset in particular (the setup log above shows 0 for train but 100 for val/test), even though the path is the same for all three splits, i.e. /home/ec2-user/SageMaker/packnet-sfm/data/datasets/blackvue_images/. This directory contains 100 images. Could you please help me resolve this?
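If I am reading the traceback right, the ZeroDivisionError itself is only a symptom: training produces zero batches, so averaging the epoch outputs in packnet_sfm/utils/reduce.py divides by the length of an empty list. A minimal illustration of that failure mode (my own sketch, not the repository's code):

```python
# average_key() reduces per-batch outputs; with 0 training images the
# dataloader yields no batches, so the list of values stays empty.
values = []                      # no batch outputs were collected
avg = sum(values) / len(values)  # ZeroDivisionError: division by zero
```

So the real question is why the Image dataset finds 0 images for the train split.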

Side query: do the split .txt files play any role in image_dataset.py?

VitorGuizilini-TRI commented 3 years ago

Any luck with that? I have been working on an updated ImageDataset that should be easier to use.

yashdeep01 commented 3 years ago

Thanks for the response. I made a few tweaks locally to the code and it works for me now. However, I am looking for a way to make my depth maps scale-aware. While you are updating ImageDataset, could you also provide an implementation of VelSupModel that works with custom images? This would greatly help my work. Thanks.

jdriscoll319 commented 3 years ago

@yashdeep01 I just ran into this exact issue (coincidentally, I'm using Blackvue images too :) ). What tweaks did you end up making to fix it?

jdriscoll319 commented 3 years ago

For anyone else who runs into this issue: the Image dataset does not expect a file listing the training/val/test images. Instead, it directly reads all of the files in the given directory and its subdirectories. The split: ['{:09}'] line in the config is a filename format string used when the data is prepared. You should edit the 09 to match the number of digits in your file names; e.g. my files use 6 digits (000000.jpg), so I changed the line to split: ['{:06}'] (see the sketch below).
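If it helps, here is my rough mental model of what that format string does. This is only a sketch with illustrative names, not the actual packnet-sfm code:

```python
# The split entry acts as a zero-padded filename format: given the
# index of the current frame, it builds the filenames of the backward
# and forward context frames.
fmt = '{:06}'                              # for 6-digit names like 000000.jpg
idx = int('000010')                        # index parsed from the current file
prev_name = fmt.format(idx - 1) + '.jpg'   # -> '000009.jpg'
next_name = fmt.format(idx + 1) + '.jpg'   # -> '000011.jpg'
print(prev_name, next_name)
```

With a mismatched format (e.g. '{:09}' against 6-digit names), I suspect the context lookups fail and no training samples are produced, which would match the "0 (x1)" line in the setup log above; val/test use back_context and forward_context of 0, which would explain why they still find all 100 images.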

luda1013 commented 2 years ago

> Thanks for the response. I made a few tweaks locally on the code and now it works for me. However I was looking for a way to make my depth-maps scale aware. While you are updating ImageDataset, could you provide some implementation for VelSupModel to use it for custom images? This would greatly help me in my work. Thanks.

Hello, I also want to train on my custom data, but I get the same error. Could you explain in more detail how you tweaked the code locally to solve this problem? Thanks :)