TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

Texture copy artifact problem #94

Closed · surfii3z closed 3 years ago

surfii3z commented 3 years ago

Hi all,

Thank you for sharing this repo; it is amazing work.

I would like to use the depth model for monocular depth estimation in indoor environments.


The dataset is prepared from the TUM rolling shutter dataset. The images are rectified to size (rows, cols) = (594, 795) such that the distortion coefficients are all zero and the principal point is exactly at the centre of the image:

np.array([[371.74580354175663, 0., 397.5],
          [0., 371.74580354175663, 297.0],
          [0., 0., 1.]])
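
For reference, here is a minimal sketch of how such a rectification can be done with OpenCV; the original calibration values (K_orig, dist) below are placeholders, not the actual TUM ones:

import cv2
import numpy as np

# Placeholder original calibration (NOT the actual TUM values)
K_orig = np.array([[380.0, 0.0, 400.0],
                   [0.0, 380.0, 300.0],
                   [0.0, 0.0, 1.0]])
dist = np.array([0.01, -0.02, 0.0, 0.0])  # k1, k2, p1, p2

# Target intrinsics: zero distortion, principal point at the image centre
w, h = 795, 594
K_new = np.array([[371.74580354175663, 0.0, w / 2.0],
                  [0.0, 371.74580354175663, h / 2.0],
                  [0.0, 0.0, 1.0]])

# Precompute the undistortion/rectification maps, then remap each frame
map_x, map_y = cv2.initUndistortRectifyMap(K_orig, dist, None, K_new, (w, h), cv2.CV_32FC1)
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
rectified = cv2.remap(gray, map_x, map_y, interpolation=cv2.INTER_LINEAR)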

Also, since the network only accepts RGB images as input, I replicate the grayscale channel three times so the images can be fed to the network.
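
The channel replication is a one-liner; a sketch (mine, not the poster's actual code):

import numpy as np

gray = np.zeros((594, 795), dtype=np.uint8)         # stand-in for a rectified grayscale frame
rgb = np.repeat(gray[..., np.newaxis], 3, axis=-1)  # (594, 795, 3), all three channels identical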


I fine-tune the model "PackNet, Self-Supervised Scale-Aware, 192x640, CS → K" on this dataset with the following configuration.

arch:
    max_epochs: 40

checkpoint:
    # Folder where .ckpt files will be saved during training
    filepath: /workspace/packnet-sfm/results/chept
    save_top_k: -1

model:
    name: 'SelfSupModel'
    checkpoint_path: /workspace/packnet-sfm/trained_models/PackNet01_MR_velsup_CStoK.ckpt
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.000001
        pose:
            lr: 0.000001
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0
datasets:
    augmentation:
        # should be a multiple of 32 (see the quick check after this config)
        image_shape: (352, 480)
        # image_shape: (32, 32)
    train:
        batch_size: 4
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['train.txt']
        repeat: [2]
    validation:
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['eval.txt']
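
As a quick sanity check on the "should be a multiple of 32" comment above (presumably because the network downsamples by a factor of 32 overall), a candidate image_shape can be verified in a couple of lines. This is my own sketch, not part of the repo:

# Verify that an image_shape is compatible with a network that
# downsamples by a factor of 32 overall.
def check_shape(h, w, factor=32):
    assert h % factor == 0 and w % factor == 0, f"({h}, {w}) is not a multiple of {factor}"

check_shape(352, 480)    # the shape used in this config: passes
# check_shape(594, 795)  # the raw rectified size would fail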

The result looks like the picture below.

Screenshot from 2020-11-17 22-07-34

In my opinion it looks okay-ish; however, I think there are a lot of "texture copy artifacts", for example on the AprilTag grid and the checkerboard. Is this result to be expected (because I fine-tune the network on an indoor dataset / the input is grayscale / the augmentation image size is quite different from the network's input size)? If not, what approach do you suggest to prevent the artifact? Or does such an artifact simply not affect the quality of the depth map?

surfii3z commented 3 years ago

This is the result when I trained the network from scratch using the same configuration file (except that I changed the learning rate to 0.000025 instead of 0.000001).

Screenshot from 2020-11-17 22-07-48

So I think we can say that the artifact is not caused by fine-tuning the pretrained network on an indoor dataset. Although the two results look quite different, I think that is because of the scale-awareness.

VitorGuizilini-TRI commented 3 years ago

We have trained PackNet using this repository on the EuRoC dataset (https://projects.asl.ethz.ch/datasets/doku.php?id=kmavvisualinertialdatasets), so it should work for similar indoor datasets. One thing you can try is changing the image_shape augmentation parameter; it seems like you are severely deforming the image. It's usually better to maintain the aspect ratio as much as possible.
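
To make that concrete, here is a small sketch (mine, not from the repo) that enumerates multiple-of-32 input shapes and ranks them by aspect-ratio error relative to the original (594, 795) frames:

# Rank candidate network input shapes (multiples of 32, no larger than
# the original frames) by how much they distort the aspect ratio.
orig_h, orig_w = 594, 795
orig_ratio = orig_w / orig_h  # ~1.33838

candidates = [(abs(w / h - orig_ratio), (h, w))
              for h in range(96, orig_h + 1, 32)
              for w in range(96, orig_w + 1, 32)]

for err, (h, w) in sorted(candidates)[:5]:
    print(f"({h}, {w}): ratio {w / h:.5f}, error {err:.5f}")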

surfii3z commented 3 years ago

Thank you for your suggestion @VitorGuizilini-TRI. I will try changing the data augmentation parameters and will update with the result.

surfii3z commented 3 years ago

I came back to update following your suggestion @VitorGuizilini-TRI. I have changed the image_shape to (288, 384), which has aspect ratio 1.33333; I chose this size because each dimension must be a multiple of 32 (compared to the original shape (594, 795), which has aspect ratio 1.33838).

The following is the validation result after 60 epochs of training.

The data used for training is in this link.

Screenshot from 2020-11-24 12-40-31

The overall results look good. However, in the rightmost picture, for example, the board's boundary appears to be farther away than the wall.

Here is the config file:

arch:
    max_epochs: 60

checkpoint:
    # Folder where .ckpt files will be saved during training
    filepath: /workspace/packnet-sfm/results/chept
    save_top_k: -1

model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0001
        pose:
            lr: 0.0001
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0
datasets:
    augmentation:
        image_shape: (288, 384)
    train:
        batch_size: 4
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['train.txt']
        # depth_type: ['velodyne']
        repeat: [2]
    validation:
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['eval.txt']
        # depth_type: ['velodyne']
    test:
        dataset: ['KITTI']
        path: ['/data/datasets/rolling_shutter_rect']
        split: ['eval.txt']
        # depth_type: ['velodyne']
wandb:
    dry_run: False

surfii3z commented 3 years ago

Update

So I have tried with my custom dataset again. This time the data is RGB instead of 3-channel grayscale images. This looks good.

Your work is amazing.

Kudos!

Screenshot from 2020-11-26 10-00-11 Screenshot from 2020-11-26 10-00-27

hutingz commented 1 year ago

> I came back to update following your suggestion @VitorGuizilini-TRI. I have changed the image_shape to (288, 384) ...

Did you downsample the images?