Uncertain about what the PyTorch load_image_triplet function does

SimonHanYANG commented 1 year ago

Dear Authors,

Glad to see the PyTorch version has been released! I've been waiting a long time for it.

I am not sure what the load_image_triplet function does:

# Split along width
image1, image0, image2 = np.split(images, indices_or_sections=3, axis=1)

The code comment says to split along the width, but the width I defined is 640, which is not divisible by three, so the following error is raised:

ValueError: array split does not result in an equal division
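A minimal NumPy reproduction of this error, assuming H x W x C arrays (so axis=1 is the width): a single 480 x 640 frame cannot be split into three equal parts, while three 640-wide frames placed side by side (1920 pixels wide) can. The shapes are illustrative and try_split is a hypothetical helper.

```python
import numpy as np

# A single frame with the configured size: 480 x 640 x 3 (H x W x C).
single = np.zeros((480, 640, 3), dtype=np.uint8)

# Three frames side by side along the width: 480 x 1920 x 3.
triplet = np.zeros((480, 3 * 640, 3), dtype=np.uint8)


def try_split(images):
    """Split along axis=1 (width) into three equal parts, or return the error message."""
    try:
        parts = np.split(images, indices_or_sections=3, axis=1)
        return [part.shape for part in parts]
    except ValueError as error:
        return str(error)


print(try_split(single))   # array split does not result in an equal division
print(try_split(triplet))  # [(480, 640, 3), (480, 640, 3), (480, 640, 3)]
```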

I don't really understand why the images are split into image0, image1 and image2. Can you explain?

Finally, have you tested the PyTorch code with VOID150? I am testing with VOID150 data and get the above error.

SimonHanYANG commented 1 year ago

Sorry to write again,

I forgot to mention another error I hit when running the PyTorch code.

In the __getitem__ function of the VOICEDTrainingDataset class, the code

# Load camera intrinsics
intrinsics = np.load(self.intrinsics_paths[index]).astype(np.float32)

raises ValueError: Cannot load a file containing pickled data when allow_pickle=False.

How can I fix it?

Best, Simon

alexklwong commented 1 year ago

Hi Simon,

The load_image_triplet function is intended to load three images concatenated together along the width. These image triplets are created by the setup script for training images only. At test time, we do not ingest image triplets, only the original (single) image.
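A minimal sketch of the idea with in-memory arrays, assuming H x W x C images concatenated side by side along the width (which is what the "Split along width" comment implies). The left-to-right order image1, image0, image2 mirrors the np.split call quoted above; the middle frame, image0, is presumably the one reconstructed from its two neighbours during unsupervised training, but that reading is an assumption. make_triplet and split_triplet are hypothetical helpers, not functions from the repository, and the real load_image_triplet also reads the file from disk.

```python
import numpy as np


def make_triplet(image1, image0, image2):
    """Hypothetical helper: place three H x W x C frames side by side (width 3W)."""
    # All three frames must share height, width and channel count.
    assert image1.shape == image0.shape == image2.shape
    return np.concatenate([image1, image0, image2], axis=1)


def split_triplet(images):
    """Hypothetical helper: undo make_triplet, mirroring the np.split call above."""
    if images.shape[1] % 3 != 0:
        raise ValueError(
            f"width {images.shape[1]} is not a multiple of 3; "
            "this looks like a single image, not a training triplet")
    # Split along width, in the same order the triplet was assembled.
    image1, image0, image2 = np.split(images, indices_or_sections=3, axis=1)
    return image1, image0, image2


# Three constant frames (values 1, 2, 3) so each part is easy to identify.
frames = [np.full((480, 640, 3), value, dtype=np.uint8) for value in (1, 2, 3)]
triplet = make_triplet(*frames)
print(triplet.shape)  # (480, 1920, 3)

image1, image0, image2 = split_triplet(triplet)
print(image0[0, 0, 0])  # 2
```

Passing a single 640-wide test image to split_triplet raises a ValueError that names the likely cause, instead of NumPy's generic message.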

load_image_triplet should only ever be used during training, on training samples created by setup_dataset_void.py. The paths passed in should also point to those samples, i.e., inside the training directory that the setup script creates.

For the second error: the intrinsics are stored as text files in the raw data and are converted to NumPy arrays by setup_dataset_void.py. The error you encountered means that the file at the intrinsics path could not be opened as a NumPy (.npy) file.
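The message comes from np.load: any file that does not start with the .npy (or .npz) header is treated as a pickle, which is refused when allow_pickle=False. A minimal sketch of checking and converting an intrinsics file, assuming the raw file is a whitespace-separated 3 x 3 matrix; the file name K.txt, the matrix values, and convert_intrinsics_txt_to_npy are hypothetical, and setup_dataset_void.py remains the proper way to produce these files.

```python
import os
import tempfile

import numpy as np

NPY_MAGIC = b"\x93NUMPY"  # every .npy file starts with these six bytes


def is_npy_file(path):
    """Return True if the file at path begins with the .npy magic string."""
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC


def convert_intrinsics_txt_to_npy(txt_path, npy_path):
    """Hypothetical helper: read a whitespace-separated matrix, save it as float32 .npy."""
    intrinsics = np.loadtxt(txt_path).astype(np.float32)
    np.save(npy_path, intrinsics)
    return npy_path


with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "K.txt")  # hypothetical file name
    npy_path = os.path.join(tmp, "K.npy")
    # Made-up 3 x 3 camera matrix, for illustration only.
    np.savetxt(txt_path, [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])

    try:
        np.load(txt_path)  # text is not .npy/.npz, so NumPy treats it as a pickle
        load_error = None
    except ValueError as error:
        load_error = str(error)
        print(load_error)

    convert_intrinsics_txt_to_npy(txt_path, npy_path)
    flags = (is_npy_file(txt_path), is_npy_file(npy_path))
    print(flags)  # (False, True)
    loaded_shape = np.load(npy_path).shape
    print(loaded_shape)  # (3, 3)
```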

If you have the full stack trace, that would help explain it. For now, it seems that both errors are data problems: the images and intrinsics were not set up properly.

alexklwong commented 1 year ago

Also, yes, training on VOID150 works:

python src/train_voiced.py \
--train_images_path training/void/unsupervised/void_train_image_150.txt \
--train_sparse_depth_path training/void/unsupervised/void_train_sparse_depth_150.txt \
--train_intrinsics_path training/void/unsupervised/void_train_intrinsics_150.txt \
--val_image_path testing/void/void_test_image_150.txt \
--val_sparse_depth_path testing/void/void_test_sparse_depth_150.txt \
--val_ground_truth_path testing/void/void_test_ground_truth_150.txt \
--n_batch 12 \
--n_height 480 \
--n_width 640 \
--input_channels_image 3 \
--input_channels_depth 2 \
--outlier_removal_kernel_size 7 \
--outlier_removal_threshold 1.5 \
--encoder_type vggnet11 \
--n_filters_encoder_image 48 96 192 384 384 \
--n_filters_encoder_depth 16 32 64 128 128 \
--n_filters_decoder 256 128 128 64 0 \
--min_predict_depth 0.1 \
--max_predict_depth 8.0 \
--weight_initializer xavier_normal \
--activation_func leaky_relu \
--learning_rates 1e-4 5e-5 \
--learning_schedule 10 20 \
--rotation_parameterization exponential \
--augmentation_random_crop_type none \
--w_color 0.20 \
--w_structure 0.80 \
--w_sparse_depth 0.50 \
--w_smoothness 2.00 \
--w_pose 0.10 \
--w_weight_decay_depth 0.00 \
--w_weight_decay_pose 0.00 \
--min_evaluate_depth 0.2 \
--max_evaluate_depth 5.0 \
--checkpoint_path trained_voiced/void150/voiced_vgg11 \
--n_step_per_checkpoint 1000 \
--n_step_per_summary 1000 \
--n_image_per_summary 4 \
--start_step_validation 1000 \
--device gpu \
--n_thread 8

which gives

Training input paths:
training/void/unsupervised/void_train_image_150.txt
training/void/unsupervised/void_train_sparse_depth_150.txt
training/void/unsupervised/void_train_intrinsics_150.txt

Validation input paths:
testing/void/void_test_image_150.txt
testing/void/void_test_sparse_depth_150.txt
testing/void/void_test_ground_truth_150.txt

Input settings:
n_batch=12  n_height=480  n_width=640
input_channels_image=3  input_channels_depth=2
outlier_removal_kernel_size=7  outlier_removal_threshold=1.50

Depth network settings:
encoder_type=['vggnet11']
n_filters_encoder_image=[48, 96, 192, 384, 384]
n_filters_encoder_depth=[16, 32, 64, 128, 128]
n_filters_decoder=[256, 128, 128, 64, 0]
min_predict_depth=0.10  max_predict_depth=8.00

Weight settings:
n_parameter=11628976  n_parameter_depth=10041200  n_parameter_pose=1587776
weight_initializer=xavier_normal  activation_func=leaky_relu

Training settings:
n_sample=46417  n_epoch=20  n_step=77360
learning_schedule=[0-38680 : 0.0001, 38680-77360 : 5e-05]

augmentation_random_crop_type=['none']

Loss function settings:
w_color=2.0e-01  w_structure=8.0e-01  w_sparse_depth=5.0e-01
w_smoothness=2.0e+00  w_pose=1.0e-01
w_weight_decay_depth=0.0e+00  w_weight_decay_pose=0.0e+00

Evaluation settings:
min_evaluate_depth=0.20  max_evaluate_depth=5.00

Checkpoint settings:
checkpoint_path=trained_voiced/void150/voiced_vgg11
n_step_per_checkpoint=1000
start_step_validation=1000

Tensorboard settings:
event_path=trained_voiced/void150/voiced_vgg11/events
n_step_per_summary=1000  n_image_per_summary=4

Hardware settings:
device=cuda
n_thread=8

Begin training...

Step=  1000/77360  Loss=0.76155  Time Elapsed=0.16h  Time Remaining=12.56h
Validation results:
    Step       MAE      RMSE      iMAE     iRMSE
    1000   548.779   670.325   576.047   804.508
Best results:
    Step       MAE      RMSE      iMAE     iRMSE
    1000   548.779   670.325   576.047   804.508
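The step counts in this log follow from the inputs: 46417 // 12 = 3868 steps per epoch, times 20 epochs gives 77360, and the epoch-10 boundary falls at step 38680. A minimal sketch of that arithmetic, assuming the last partial batch is dropped and that --learning_schedule gives epoch boundaries; both assumptions are inferred from the printed numbers rather than from the training code, and training_steps is a hypothetical name.

```python
def training_steps(n_sample, n_batch, learning_schedule, learning_rates):
    """Reproduce n_epoch, n_step and the learning-rate step ranges printed in the log."""
    # Assumption: each epoch runs floor(n_sample / n_batch) steps (last partial batch dropped).
    steps_per_epoch = n_sample // n_batch
    # Assumption: the last entry of --learning_schedule is the final epoch.
    n_epoch = learning_schedule[-1]
    n_step = n_epoch * steps_per_epoch
    schedule, start = [], 0
    for end_epoch, rate in zip(learning_schedule, learning_rates):
        end = end_epoch * steps_per_epoch
        schedule.append((start, end, rate))
        start = end
    return n_epoch, n_step, schedule


n_epoch, n_step, schedule = training_steps(46417, 12, [10, 20], [1e-4, 5e-5])
print(n_epoch, n_step)  # 20 77360
print(schedule)  # [(0, 38680, 0.0001), (38680, 77360, 5e-05)]
```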

SimonHanYANG commented 1 year ago

Hi Alex,

Thanks so much!

I finally ran the source code successfully.

I'm so glad you and your team have made so many awesome projects!

Thanks again!

Best Wishes! Simon

SimonHanYANG commented 1 year ago

Hi Alex,

After training on VOID150 with your script above, I found that the MAE, RMSE, iMAE, and iRMSE values were going up rather than down.

The best values are at the first validation step, step 1000.

[Image: void150_epoch20000]

How can I fix this?

When I trained on VOID1500, this did not happen.

Best Wishes, Simon
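For reference, all four metrics are errors, so lower is better and rising values mean the model is getting worse. A minimal sketch of how they are commonly computed in depth completion benchmarks, assuming depths in meters, MAE/RMSE reported in millimeters, iMAE/iRMSE in 1/km, and scoring only pixels whose ground truth lies within the --min_evaluate_depth/--max_evaluate_depth range (0.2 to 5.0 in the command above). This is not the repository's evaluation code; depth_metrics is a hypothetical name and it assumes strictly positive predictions.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=0.2, max_depth=5.0):
    """MAE/RMSE in mm and iMAE/iRMSE in 1/km over valid ground-truth pixels (inputs in meters)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Only score pixels whose ground truth lies inside the evaluation range.
    mask = (gt > min_depth) & (gt < max_depth)
    p, g = pred[mask], gt[mask]
    # Depth error in millimeters.
    err_mm = 1000.0 * (p - g)
    # Inverse depth in 1/km: 1 / (d / 1000) = 1000 / d for d in meters.
    inv_err = 1000.0 / p - 1000.0 / g
    return {
        "MAE": float(np.mean(np.abs(err_mm))),
        "RMSE": float(np.sqrt(np.mean(err_mm ** 2))),
        "iMAE": float(np.mean(np.abs(inv_err))),
        "iRMSE": float(np.sqrt(np.mean(inv_err ** 2))),
    }


# The third pixel (10 m) falls outside the evaluation range and is ignored.
metrics = depth_metrics([1.1, 1.8, 3.0], [1.0, 2.0, 10.0])
print(metrics)
```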

alexklwong commented 1 year ago

VOID150 has at most 150 sparse depth points per frame. The bash script I posted above just swaps the paths from VOID1500 to VOID150 to show that it trains; its hyperparameters were chosen for VOID1500, not VOID150, so you will need to tune them. It is difficult to say without seeing the Tensorboard images, but my guess is that the smoothness weight (--w_smoothness) needs to be increased.
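One way to act on this is a small sweep over --w_smoothness. The sketch below only builds command lines and runs nothing. It assumes you pass the full flag list from the VOID150 command above as base_args (the usage shows only two flags for brevity); the candidate values, the per-run checkpoint directories, and the helper names are hypothetical, not tuned recommendations.

```python
import shlex


def set_flag(args, flag, value):
    """Return a copy of args with flag's single value replaced, or appended if missing."""
    args = list(args)
    if flag in args:
        args[args.index(flag) + 1] = value
    else:
        args += [flag, value]
    return args


def sweep_commands(base_args, values, checkpoint_root):
    """Build one train_voiced.py command per --w_smoothness value (hypothetical sweep)."""
    commands = []
    for value in values:
        args = set_flag(base_args, "--w_smoothness", f"{value:.2f}")
        # Give each run its own checkpoint directory so runs do not overwrite each other.
        args = set_flag(args, "--checkpoint_path", f"{checkpoint_root}/w_smoothness_{value:.2f}")
        commands.append(shlex.join(["python", "src/train_voiced.py", *args]))
    return commands


# Only two flags shown for brevity; in practice pass every flag from the command above.
base_args = ["--w_smoothness", "2.00", "--checkpoint_path", "trained_voiced/void150/voiced_vgg11"]
for command in sweep_commands(base_args, [2.0, 4.0, 8.0], "trained_voiced/void150/sweep"):
    print(command)
```

Comparing the validation curves and Tensorboard images of these runs against the original run shows whether a larger smoothness weight helps on VOID150.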

alexklwong / unsupervised-depth-completion-visual-inertial-odometry

Uncertain about what the PyTorch load_image_triplet function does #6