Closed: surfii3z closed this issue 3 years ago
This is the result when I trained the network from scratch using the same configuration file (except that I changed the learning rate to 0.000025 instead of 0.000001).
So I think we can say the problem is not caused by fine-tuning the pretrained network on the indoor dataset. Although the result looks quite different, I think that is because of the scale-awareness.
We have trained PackNet using this repository on the EuRoC dataset (https://projects.asl.ethz.ch/datasets/doku.php?id=kmavvisualinertialdatasets), so it should work for similar indoor datasets. One thing you can try is changing the image_shape augmentation parameter; it seems like you are severely deforming the image. It's usually better to maintain the aspect ratio as much as possible.
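A small helper can make this concrete. This is a hypothetical sketch (not part of the packnet-sfm repo): given the original resolution, it picks a training image_shape whose dimensions are multiples of 32 while approximately preserving the original aspect ratio.

```python
# Hypothetical helper: snap a desired training height to a multiple of 32
# and derive a width that keeps the original aspect ratio as close as possible.
def snap_shape(orig_h, orig_w, target_h, multiple=32):
    """Return (h, w) with both multiples of `multiple` and w/h ~= orig_w/orig_h."""
    h = round(target_h / multiple) * multiple
    w = round(h * orig_w / orig_h / multiple) * multiple
    return h, w

# For the (594, 795) images discussed in this thread:
# snap_shape(594, 795, 288) -> (288, 384), ratio 1.3333 vs. original 1.3384
# snap_shape(594, 795, 576) -> (576, 768), if training at near-full resolution
```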
Thank you for your suggestion @VitorGuizilini-TRI. I will try changing the data-augmentation parameters and will update with the result.
I came to update following your suggestion @VitorGuizilini-TRI. I have changed the image_shape to (288, 384), which has an aspect ratio of 1.3333. I chose this size because each dimension must be a multiple of 32 (compared to the original shape (594, 795), which has a ratio of 1.3384).
The following is the validation result after 60 epochs of training
The data used to train is in this link
The overall results look good. But, for example, in the right-most picture, the board's boundary appears to be farther away than the wall.
Here is the config file:
```yaml
arch:
    max_epochs: 60

checkpoint:
    # Folder where .ckpt files will be saved during training
    filepath: /workspace/packnet-sfm/results/chept
    save_top_k: -1

model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0001
        pose:
            lr: 0.0001
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0

datasets:
    augmentation:
        image_shape: (288, 384)
    train:
        batch_size: 4
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['train.txt']
        # depth_type: ['velodyne']
        repeat: [2]
    validation:
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['eval.txt']
        # depth_type: ['velodyne']
    test:
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['eval.txt']
        # depth_type: ['velodyne']

wandb:
    dry_run: False
```
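For reference, with the scheduler settings in this config (StepLR, step_size 30, gamma 0.5) over 60 epochs, the learning rate halves exactly once, at epoch 30. A minimal sketch of that schedule (the function name is illustrative; it mirrors PyTorch's StepLR decay rule):

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.5):
    # Learning rate after `epoch` completed epochs under a StepLR schedule:
    # multiply by `gamma` every `step_size` epochs.
    return base_lr * gamma ** (epoch // step_size)

# With lr 0.0001 as in the config: epochs 0-29 train at 1e-4,
# epochs 30-59 at 5e-5.
```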
So I have tried my custom dataset again. This time the data is RGB instead of 3-channel grayscale. This looks good.
Your work is amazing.
Kudos!
Did you downsample the image?
Hi all
Thank you for sharing this repo; it is amazing work.
I want to use the depth model for monocular depth estimation in indoor environments.
The dataset is prepared from the TUM rolling shutter dataset. The images are rectified to the size (row, col) = (594, 795) such that the distortion coefficients are all 0 and the principal point is exactly at the centre of the image.
Also, since the network only accepts RGB images as input, I replicate the gray channel into three channels so the images can be fed to the neural network.
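The channel replication described above can be sketched with NumPy (a minimal example assuming uint8 images; the array names are illustrative):

```python
import numpy as np

# Replicate a single-channel grayscale image into 3 identical channels
# so it matches the (H, W, 3) RGB input the network expects.
gray = np.random.randint(0, 256, size=(594, 795), dtype=np.uint8)
rgb = np.repeat(gray[:, :, None], 3, axis=2)  # shape (594, 795, 3)
```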
I fine-tune the model "PackNet, Self-Supervised Scale-Aware, 192x640, CS → K" with this dataset and the following configuration.
The result looks like the picture below.
In my opinion, it looks okay-ish; however, I think there are a lot of "texture copy" artifacts, for example at the AprilTag grid and the checkerboard. Is this result to be expected (because I fine-tune the network on an indoor dataset / the input is grayscale / the data-augmentation size is quite different from the network's input size)? Or what approach do you suggest to prevent the artifacts? Or do such artifacts simply not affect the performance of the depth map?