Closed SimonHanYANG closed 1 year ago
Sorry to write again,
I forgot to mention another error that occurs when running the PyTorch code.
In the __getitem__ function of the VOICEDTrainingDataset class, the code
# Load camera intrinsics
intrinsics = np.load(self.intrinsics_paths[index]).astype(np.float32)
raises ValueError: Cannot load a file containing pickled data when allow_pickle=False.
How can I fix it?
Best, Simon
Hi Simon,
The load_image_triplet function is intended to load three images concatenated together. These image triplets are created by the setup script, for training images only. At test time, we do not ingest image triplets, only the original (single) image.
The load_image_triplet function should only ever be used in training, on training samples created by setup_dataset_void.py. The paths fed in should also correspond to those samples, i.e. they should point inside the training directory that the setup script creates.
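To make the width issue concrete, here is a minimal sketch of splitting a triplet along its width. It is not the repository's load_image_triplet; the helper name split_triplet and the assumption that a triplet is three 480x640 frames placed side by side (total width 1920) are illustrative only.

```python
import numpy as np


def split_triplet(triplet):
    """Split an H x 3W x C array into three H x W x C frames along the width.

    Hypothetical helper for illustration; the frame order inside a real
    triplet is whatever setup_dataset_void.py wrote, which is not assumed here.
    """
    width = triplet.shape[1]
    if width % 3 != 0:
        # A single 640-wide image ends up here: 640 is not divisible by 3.
        raise ValueError(
            "width {} is not divisible by 3; is this a single image "
            "rather than a triplet?".format(width))
    left, middle, right = np.split(triplet, 3, axis=1)
    return left, middle, right


# A triplet-sized array splits cleanly into three 480 x 640 frames.
frames = split_triplet(np.zeros((480, 1920, 3), dtype=np.uint8))
print([frame.shape for frame in frames])
```

Passing a single 480x640 image to the same function raises the ValueError, which is consistent with the explanation that a non-triplet path was fed to the training loader.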
For the second error, intrinsics were stored as text files in the raw data but processed as numpy arrays using setup_dataset_void.py. The error that you encountered is saying that when opening the path to the intrinsics file, it can't open it as a numpy file.
If you have the full stack trace that would help in terms of explaining it. But at the moment, it seems that in both cases, it is a data problem where the images and intrinsics are not set up properly.
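A quick way to check whether an intrinsics path points at a raw text file rather than a processed NumPy file is sketched below. The helper load_intrinsics is hypothetical and not part of the repository, and the text layout (nine whitespace-separated values forming a 3x3 matrix) is an assumption. Re-running setup_dataset_void.py remains the intended fix; this only helps diagnose the data.

```python
import numpy as np


def load_intrinsics(path):
    """Load a 3x3 intrinsics matrix from a .npy file, or from plain text.

    Hypothetical diagnostic helper. np.load raises ValueError when the file
    is not in .npy/.npz format and allow_pickle is False, which is what
    happens when it is pointed at a text file.
    """
    try:
        intrinsics = np.load(path)
        source = "npy"
    except ValueError:
        # Assumed raw layout: nine whitespace-separated numbers.
        intrinsics = np.loadtxt(path)
        source = "text"
    intrinsics = np.asarray(intrinsics, dtype=np.float32)
    if intrinsics.size != 9:
        raise ValueError("expected 9 values, got {}".format(intrinsics.size))
    return intrinsics.reshape(3, 3), source
```

If source comes back as "text" for paths listed in the training intrinsics file, the list is pointing at raw data instead of the output of the setup script.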
And yes, training on VOID150 works as well:
python src/train_voiced.py \
--train_images_path training/void/unsupervised/void_train_image_150.txt \
--train_sparse_depth_path training/void/unsupervised/void_train_sparse_depth_150.txt \
--train_intrinsics_path training/void/unsupervised/void_train_intrinsics_150.txt \
--val_image_path testing/void/void_test_image_150.txt \
--val_sparse_depth_path testing/void/void_test_sparse_depth_150.txt \
--val_ground_truth_path testing/void/void_test_ground_truth_150.txt \
--n_batch 12 \
--n_height 480 \
--n_width 640 \
--input_channels_image 3 \
--input_channels_depth 2 \
--outlier_removal_kernel_size 7 \
--outlier_removal_threshold 1.5 \
--encoder_type vggnet11 \
--n_filters_encoder_image 48 96 192 384 384 \
--n_filters_encoder_depth 16 32 64 128 128 \
--n_filters_decoder 256 128 128 64 0 \
--min_predict_depth 0.1 \
--max_predict_depth 8.0 \
--weight_initializer xavier_normal \
--activation_func leaky_relu \
--learning_rates 1e-4 5e-5 \
--learning_schedule 10 20 \
--rotation_parameterization exponential \
--augmentation_random_crop_type none \
--w_color 0.20 \
--w_structure 0.80 \
--w_sparse_depth 0.50 \
--w_smoothness 2.00 \
--w_pose 0.10 \
--w_weight_decay_depth 0.00 \
--w_weight_decay_pose 0.00 \
--min_evaluate_depth 0.2 \
--max_evaluate_depth 5.0 \
--checkpoint_path trained_voiced/void150/voiced_vgg11 \
--n_step_per_checkpoint 1000 \
--n_step_per_summary 1000 \
--n_image_per_summary 4 \
--start_step_validation 1000 \
--device gpu \
--n_thread 8
which gives
Training input paths:
training/void/unsupervised/void_train_image_150.txt
training/void/unsupervised/void_train_sparse_depth_150.txt
training/void/unsupervised/void_train_intrinsics_150.txt
Validation input paths:
testing/void/void_test_image_150.txt
testing/void/void_test_sparse_depth_150.txt
testing/void/void_test_ground_truth_150.txt
Input settings:
n_batch=12 n_height=480 n_width=640
input_channels_image=3 input_channels_depth=2
outlier_removal_kernel_size=7 outlier_removal_threshold=1.50
Depth network settings:
encoder_type=['vggnet11']
n_filters_encoder_image=[48, 96, 192, 384, 384]
n_filters_encoder_depth=[16, 32, 64, 128, 128]
n_filters_decoder=[256, 128, 128, 64, 0]
min_predict_depth=0.10 max_predict_depth=8.00
Weight settings:
n_parameter=11628976 n_parameter_depth=10041200 n_parameter_pose=1587776
weight_initializer=xavier_normal activation_func=leaky_relu
Training settings:
n_sample=46417 n_epoch=20 n_step=77360
learning_schedule=[0-38680 : 0.0001, 38680-77360 : 5e-05]
augmentation_random_crop_type=['none']
Loss function settings:
w_color=2.0e-01 w_structure=8.0e-01 w_sparse_depth=5.0e-01
w_smoothness=2.0e+00 w_pose=1.0e-01
w_weight_decay_depth=0.0e+00 w_weight_decay_pose=0.0e+00
Evaluation settings:
min_evaluate_depth=0.20 max_evaluate_depth=5.00
Checkpoint settings:
checkpoint_path=trained_voiced/void150/voiced_vgg11
n_step_per_checkpoint=1000
start_step_validation=1000
Tensorboard settings:
event_path=trained_voiced/void150/voiced_vgg11/events
n_step_per_summary=1000 n_image_per_summary=4
Hardware settings:
device=cuda
n_thread=8
Begin training...
Step= 1000/77360 Loss=0.76155 Time Elapsed=0.16h Time Remaining=12.56h
Validation results:
Step MAE RMSE iMAE iRMSE
1000 548.779 670.325 576.047 804.508
Best results:
Step MAE RMSE iMAE iRMSE
1000 548.779 670.325 576.047 804.508
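The step counts in the log can be reproduced from the command-line values. The sketch below assumes that the number of steps per epoch is n_sample divided by n_batch, rounded down; that rounding is inferred from the logged numbers (46417 // 12 = 3868, and 3868 x 20 = 77360), not taken from the training code. The function name is hypothetical.

```python
def schedule_in_steps(n_sample, n_batch, learning_rates, learning_schedule):
    """Convert an epoch-based learning schedule into step ranges.

    Assumes floor division for steps per epoch, which matches the log above.
    Returns a list of (start_step, end_step, learning_rate) tuples.
    """
    steps_per_epoch = n_sample // n_batch
    boundaries = [epoch * steps_per_epoch for epoch in learning_schedule]
    starts = [0] + boundaries[:-1]
    return list(zip(starts, boundaries, learning_rates))


print(schedule_in_steps(46417, 12, [1e-4, 5e-5], [10, 20]))
# [(0, 38680, 0.0001), (38680, 77360, 5e-05)]
```

So --learning_schedule 10 20 means the learning rate drops from 1e-4 to 5e-5 after epoch 10 (step 38680), and training stops after epoch 20 (step 77360).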
Hi Alex,
Thanks so much!
I finally ran the source code successfully.
I'm so happy you and your team made so many awesome projects!
Thanks again!
Best Wishes! Simon
Hi Alex,
After training on the VOID150 dataset using your script above, I found that the MAE, RMSE, iMAE, and iRMSE values were not going down, but up.
Among the best values are those at the first printed step, step 1000.
How can I solve this?
When I trained on the VOID1500 dataset, this did not happen.
Best Wishes, Simon
VOID150 has at most 150 points. The bash script I posted above just swaps the paths from VOID1500 to VOID150 to show that it trains. The hyperparameters there were chosen for VOID1500, not VOID150, so you will need to tune the hyperparameters for it. It is difficult to say without seeing the Tensorboard images, but my guess is that smoothness needs to be increased.
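One way to try larger smoothness weights is to generate one training command per candidate value. The sketch below is only an illustration: the candidate values (2.0, 4.0, 8.0) and the checkpoint directory suffixes are arbitrary placeholders, not recommended settings, and base_args is abbreviated, so the remaining flags from the full command above would need to be included.

```python
def override_flag(args, flag, value):
    """Return a copy of args with the single value following `flag` replaced.

    Only suitable for flags that take exactly one value.
    Raises ValueError if the flag is not present.
    """
    args = list(args)
    index = args.index(flag)
    args[index + 1] = str(value)
    return args


# Abbreviated argument list; add the other flags from the full command.
base_args = [
    "python", "src/train_voiced.py",
    "--w_smoothness", "2.00",
    "--checkpoint_path", "trained_voiced/void150/voiced_vgg11",
]

# Placeholder values for illustration only.
for w_smoothness in (2.0, 4.0, 8.0):
    args = override_flag(base_args, "--w_smoothness", w_smoothness)
    args = override_flag(
        args, "--checkpoint_path",
        "trained_voiced/void150/voiced_vgg11_smooth{}".format(w_smoothness))
    print(" ".join(args))
```

Writing each run to its own checkpoint directory keeps the Tensorboard events separate, so the validation curves can be compared side by side.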
Dear Authors,
Glad to see the PyTorch version has been released! I have been waiting for it for a long time.
I am not sure about the meaning of the load_image_triplet function. Its code comment says to split along width, but the defined width is 640, which is not divisible by three, so an error is reported. I don't really understand why images are split into image0, image1 and image2. Can you explain?
Finally, have you tested the PyTorch code with VOID150 data? I'm running a test with VOID150 data and get the above error.