alexklwong / mondi-python

PyTorch Implementation of Monitored Distillation for Positive Congruent Depth Completion (ECCV 2022)

Image Resolution Issue #4

Closed KyuminHwang closed 1 year ago

KyuminHwang commented 1 year ago

hello. First of all, thank you for sharing such a wonderful study :)

I have a question about the data loader. According to the code, the RGB image is read from the void_mondi folder, while the sparse depth and ground truth depth are read from the void_release folder.

However, the images in the void_mondi folder have a resolution of 1920x480, while the images in the void_release folder have a resolution of 640x480, so when random_crop is performed as in KBNet and this paper, the width becomes 0. (I understand that augmentation is performed at 640x480 during training.)

Is there an error in my train_list? Additionally, is it correct to do both training and evaluation at 640x480?

Thank you, and I look forward to your reply.


This is the structure of the void dataset folder I downloaded.

```
ROOT
|---- training
|     |---- void_train_image_150.txt
|     |---- void_train_image_1500.txt
|     |---- void_train_image_500.txt
|     |---- void_train_sparse_depth_150.txt
|     |---- ...
|---- testing
|     |---- void_test_image_150.txt
|     |---- void_test_image_1500.txt
|     |---- void_test_image_500.txt
|     |---- void_test_sparse_depth_150.txt
|     |---- ...
|---- data
      |---- void_release
      |     |---- tmp
      |     |---- void_150
      |     |---- void_1500
      |     |---- void_500
      |           |---- data
      |           |     |---- sequence
      |           |           |---- image
      |           |           |---- sparse_depth
      |           |           |---- ...
      |           |---- train_image.txt
      |           |---- train_sparse_depth.txt
      |           |---- train_validity_map.txt
      |           |---- ...
      |---- void_mondi
```


Below are examples of each list file in ROOT/training.

void_train_ground_truth_1500.txt

```
data/void_release/void_1500/data/visionlab1/ground_truth/1552089369.0569.png
data/void_release/void_1500/data/visionlab1/ground_truth/1552089369.0905.png
...
```

void_train_image_1500.txt

```
data/void_mondi/void_1500/data/visionlab1/image/1552089369.0569.png
data/void_mondi/void_1500/data/visionlab1/image/1552089369.0905.png
...
```

void_train_sparse_depth_1500.txt

```
data/void_release/void_1500/data/visionlab1/sparse_depth/1552089369.0569.png
data/void_release/void_1500/data/visionlab1/sparse_depth/1552089369.0905.png
```
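Since the three list files above are index-aligned line by line, a loader can zip them together to form (image, sparse_depth, ground_truth) triplets. A minimal sketch (the `read_paths` helper here is illustrative, not the repo's actual function):

```python
def read_paths(filepath):
    """Read one file path per line from a dataset list file,
    skipping blank lines."""
    with open(filepath) as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical usage: the three lists are index-aligned, so zipping
# them yields one (image, sparse_depth, ground_truth) sample per row.
# image_paths  = read_paths('training/void_train_image_1500.txt')
# sparse_paths = read_paths('training/void_train_sparse_depth_1500.txt')
# gt_paths     = read_paths('training/void_train_ground_truth_1500.txt')
# samples = list(zip(image_paths, sparse_paths, gt_paths))
```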

KyuminHwang commented 1 year ago

I solved this problem by using load_image_triplet.

alexklwong commented 1 year ago

Hi KyuminHwang, it seems you closed the issue, but I want to add some clarifications just in case.

The setup script iterates through the dataset, concatenating 3 temporally consecutive images (with sufficient parallax) along the width. This is done to reduce data fetching time (instead of fetching 3 images, we fetch 1 large one). The large composite image is split back into three images during data loading (datasets.py, with load_image_triplet=True). Once loaded, the data can be processed as a typical training sample.
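The splitting step described above amounts to slicing the wide composite into three equal windows along the width. A minimal sketch (function name and shapes are illustrative, assuming an (H, 3*W, 3) array as described):

```python
import numpy as np

def split_image_triplet(composite):
    """Split a horizontally concatenated triplet of shape (H, 3*W, 3)
    into three (H, W, 3) frames: previous, current, next.
    Illustrative helper mirroring the load_image_triplet behavior
    described above; not the repo's exact implementation."""
    width = composite.shape[1] // 3
    image_prev = composite[:, 0:width]
    image_curr = composite[:, width:2 * width]
    image_next = composite[:, 2 * width:3 * width]
    return image_prev, image_curr, image_next

# A 1920x480 composite (as in void_mondi) yields three 640x480 frames
composite = np.zeros((480, 1920, 3), dtype=np.uint8)
frames = split_image_triplet(composite)
```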

The images in VOID are VGA size (480 x 640). The flags you asked about,

--n_height 448 --n_width 576

in mondi-python/bash/train/train_mondi_void.sh

are the random crop sizes. Since the network is fully convolutional, it can handle arbitrarily sized images, so this step is just an augmentation. At test time, the images fed in should be 480 by 640.
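The random crop augmentation can be sketched as follows: pick one random window and apply it to the image and the sparse depth together so they stay aligned. This is an illustrative sketch assuming (H, W, 3) images and (H, W) depth maps, not the repo's exact code:

```python
import numpy as np

def random_crop(image, sparse_depth, n_height=448, n_width=576):
    """Crop image (H, W, 3) and sparse depth (H, W) to the same
    randomly placed (n_height, n_width) window.
    Illustrative augmentation sketch; crop sizes match the
    --n_height 448 --n_width 576 flags above."""
    height, width = image.shape[0], image.shape[1]
    y = np.random.randint(0, height - n_height + 1)
    x = np.random.randint(0, width - n_width + 1)
    return (image[y:y + n_height, x:x + n_width],
            sparse_depth[y:y + n_height, x:x + n_width])

# A 640x480 VOID sample crops to 576x448 during training
image = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.zeros((480, 640), dtype=np.float32)
image_crop, depth_crop = random_crop(image, depth)
```

Note that because 448 < 480 and 576 < 640, the crop always fits inside a VGA frame, which is why the width should never collapse to 0 when the lists point at the correct folders.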

KyuminHwang commented 1 year ago

Thank you for your quick and kind reply.