ewrfcas / MVSFormer

Code for MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR 2023)
Apache License 2.0

Custom Dataset #32

Open vietpho opened 6 months ago

vietpho commented 6 months ago

Hello,

I'm really interested in running your code, but I would like to use my custom dataset. My dataset consists of images from a small room, captured using an iPad. I've already converted the video frames into individual images.

I'm quite new to the field of MVS and find some aspects challenging to grasp. Could you please provide detailed steps on how to run your code with my custom dataset? Any guidance would be greatly appreciated.

Thank you!

ewrfcas commented 6 months ago

Hi, if you want to process sequential images, please refer to https://github.com/jzhangbs/Vis-MVSNet and search for "Quick test on your own data". After the SfM processing, you should obtain data in a standard format (the mvsnet format), and can then run our MVSFormer on it in the same way as the DTU dataset.
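For reference, the mvsnet format is typically laid out as follows (a rough sketch; exact file naming depends on the conversion script you use):

```
scan_folder/
├── images/     # 00000000.jpg, 00000001.jpg, ...
├── cams/       # 00000000_cam.txt, ... (one per image), each containing:
│               #   a 4x4 world-to-camera extrinsic matrix,
│               #   a 3x3 intrinsic matrix,
│               #   and depth_min / depth_interval values
└── pair.txt    # for each reference view, a ranked list of source views
                # with their matching scores
```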

vietpho commented 6 months ago

Hello, thank you for the prompt response.

I've run COLMAP via the link you provided and converted the data to the mvsnet format. However, I'm now running into a problem with the model.

When I run it, ./pretrained_weights/alt_gvt_small.pth loads, and then an error occurs in mvsformer_model.py.

The error is exactly as follows:

```
(base) $ python ./MVSFormer/test.py
argv: []
################################ args ################################
model                  mvsnet                <class 'str'>
device                 3                     <class 'str'>
config                 ./MVSFormer/configs/config_mvsformer_blendmvs.json <class 'str'>
dataset                custom                <class 'str'>
testpath               ./room                <class 'str'>
testlist               ./MVSFormer/lists/custom_dataset/dataset.txt <class 'str'>
exp_name               None                  <class 'NoneType'>
batch_size             1                     <class 'int'>
numdepth               256                   <class 'int'>
resume                 ./MVSFormer/pretrained_weights/MVSFormer-Blended/MVSFormer-Blended/best.pth <class 'str'>
outdir                 ./test_results_mvs/mvsformer_results/room_results <class 'str'>
display                False                 <class 'bool'>
share_cr               False                 <class 'bool'>
ndepths                None                  <class 'NoneType'>
depth_interals_ratio   None                  <class 'NoneType'>
cr_base_chs            8,8,8                 <class 'str'>
grad_method            detach                <class 'str'>
no_refinement          False                 <class 'bool'>
full_res               False                 <class 'bool'>
interval_scale         1.0                   <class 'float'>
num_view               20                    <class 'int'>
max_h                  1080                  <class 'int'>
max_w                  1920                  <class 'int'>
fix_res                False                 <class 'bool'>
depth_scale            1.0                   <class 'float'>
temperature            0.01                  <class 'float'>
num_worker             4                     <class 'int'>
save_freq              20                    <class 'int'>
filter_method          dpcd                  <class 'str'>
prob_threshold         0.5,0.5,0.5,0.5       <class 'str'>
thres_view             3                     <class 'int'>
thres_disp             1.0                   <class 'float'>
downsample             None                  <class 'NoneType'>
dist_base              4.0                   <class 'float'>
rel_diff_base          1300.0                <class 'float'>
fusibile_exe_path      ./fusibile/fusibile   <class 'str'>
disp_threshold         0.2                   <class 'float'>
num_consistent         3.0                   <class 'float'>
use_short_range        True                  <class 'bool'>
combine_conf           True                  <class 'bool'>
tmp                    1.0                   <class 'float'>
tmps                   5.0,5.0,5.0,1.0       <class 'str'>
save_all_confs         False                 <class 'bool'>
########################################################################
*** Interval_Scale ** 1.0
dataset test metas: 546 interval_scale:{'room': 1.0}
drop_path_rate: --- 0.2
missing keys:['norm_list.0.weight', 'norm_list.0.bias', 'norm_list.1.weight', 'norm_list.1.bias', 'norm_list.2.weight', 'norm_list.2.bias', 'norm_list.3.weight', 'norm_list.3.bias']
unexpected keys:[]
error msgs:[]
Loading checkpoint: ./MVSFormer/pretrained_weights/MVSFormer-Blended/MVSFormer-Blended/best.pth ...
Traceback (most recent call last):
  File "./MVSFormer/test.py", line 588, in <module>
    save_depth(testlist, config)
  File "./MVSFormer/test.py", line 248, in save_depth
    outputs = model.forward(imgs, cam_params, sample_cuda['depth_values'], tmp=tmp)
  File "./MVSFormer/models/mvsformer_model.py", line 399, in forward
    vit_out = self.decoder_vit.forward(vit1, vit2, vit3, vit4)
  File "./MVSFormer/models/module.py", line 409, in forward
    x = self.smooth1(self.upsampler0(x4) + self.inner1(x3))  # 1/64->1/32
RuntimeError: The size of tensor a (32) must match the size of tensor b (33) at non-singleton dimension 2
```

In gvt.py, in class ALTGVT, one of the forward_features sizes comes out as 33 instead of 32.

The input image size is 1920x1080; with the patch size set to 4, it becomes 480x270. This is then halved repeatedly, making vit1, vit2, vit3, vit4 come out as 240x135, 120x67, 60x33, and 30x16 respectively.

This eventually leads to the error here, in class TwinDecoderStage4(nn.Module):

```python
def forward(self, x1, x2, x3, x4):  # in:[1/8 ~ 1/64] out:[1/2,1/4,1/8]
    x = self.smooth1(self.upsampler0(x4) + self.inner1(x3))  # 1/64->1/32
    x = self.smooth2(F.upsample(x, scale_factor=2, mode='bilinear', align_corners=False) + self.inner2(x2))  # 1/32->1/16
    x = self.smooth3(F.upsample(x, scale_factor=2, mode='bilinear', align_corners=False) + self.inner3(x1))  # 1/16->1/8

    return x
```

The tensors do not match. I've been stuck on this all day and can't make any progress. Could I get some help in resolving this error?
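For reference, here is the size arithmetic as I understand it (a quick check assuming floor division at each stride; the exact rounding in gvt.py may differ):

```python
# Stage heights at strides 8/16/32/64 for input height 1080
# (floor division, matching the sizes listed above).
H = 1080
h8, h16, h32, h64 = (H // s for s in (8, 16, 32, 64))
print(h8, h16, h32, h64)   # 135 67 33 16
# TwinDecoderStage4 upsamples the 1/64 map by exactly 2x before adding
# the 1/32 map, hence the mismatch:
print(h64 * 2, "vs", h32)  # 32 vs 33 -> the RuntimeError above
```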

Also, I have another question: is it normal to see missing keys:['norm_list.0.weight', 'norm_list.0.bias', 'norm_list.1.weight', 'norm_list.1.bias', 'norm_list.2.weight', 'norm_list.2.bias', 'norm_list.3.weight', 'norm_list.3.bias']? I assume it appears because norm_list is absent from the checkpoint, but I'm worried it might indicate a problem. Should I ignore it, or is it a concern?

ewrfcas commented 6 months ago

For the first question, you should pad the image to 1920x1088 rather than 1920x1080, because the image's width and height must be divisible by 32. See https://github.com/ewrfcas/MVSFormer/blob/72bbd0b6a697e023feefbf76ae3aec34b340e575/datasets/general_eval.py#L108 and https://github.com/ewrfcas/MVSFormer/blob/72bbd0b6a697e023feefbf76ae3aec34b340e575/datasets/general_eval.py#L88 for the details. Note that if you instead resize the image to 1920x1088, you also have to change the focal length in the camera intrinsic matrix, so I do not recommend resizing.
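A minimal sketch of the padding idea (the linked general_eval.py implements the actual preprocessing, so refer to it for the exact behavior):

```python
import numpy as np

def pad_to_multiple_of_32(img):
    # Zero-pad on the bottom/right so that H and W are divisible by 32.
    # Bottom/right padding leaves the intrinsic matrix untouched, since
    # the principal point is measured from the top-left corner.
    h, w = img.shape[:2]
    pad_h, pad_w = -h % 32, -w % 32
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))

img = np.zeros((1080, 1920, 3), dtype=np.uint8)
print(pad_to_multiple_of_32(img).shape)  # (1088, 1920, 3)
```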

For the second question, I think all norm_list parameters should already be contained in the provided checkpoint.

vietpho commented 6 months ago

Thank you for the kind and prompt response!

I'm encountering missing keys in the "alt_gvt_small" model. The keys are:

missing keys:['norm_list.0.weight', 'norm_list.0.bias', 'norm_list.1.weight', 'norm_list.1.bias', 'norm_list.2.weight', 'norm_list.2.bias', 'norm_list.3.weight', 'norm_list.3.bias']

These are not coming from "MVSFormer/pretrained_weights/MVSFormer-Blended/MVSFormer-Blended/best.pth". I checked the keys in the alt_gvt_small checkpoint, and these keys are indeed missing there, which I believe is why I am getting this warning. However, I am unsure whether it is safe to ignore or whether it needs to be addressed.

ewrfcas commented 6 months ago

> Thank you for the kind and prompt response!
>
> I'm encountering missing keys in the "alt_gvt_small" model. The keys are:
>
> missing keys:['norm_list.0.weight', 'norm_list.0.bias', 'norm_list.1.weight', 'norm_list.1.bias', 'norm_list.2.weight', 'norm_list.2.bias', 'norm_list.3.weight', 'norm_list.3.bias']
>
> These are not coming from "MVSFormer/pretrained_weights/MVSFormer-Blended/MVSFormer-Blended/best.pth". I checked the keys in the alt_gvt_small checkpoint, and these keys are indeed missing there, which I believe is why I am getting this warning. However, I am unsure whether it is safe to ignore or whether it needs to be addressed.

This missing-keys issue is normal and does not influence performance, because all norm_list weights are newly added on top of the pre-trained ViT. The warning is caused by loading the pre-trained "alt_gvt_small.pth", not by loading "MVSFormer/pretrained_weights/MVSFormer-Blended/MVSFormer-Blended/best.pth". Moreover, "best.pth" already contains all the weights in "alt_gvt_small.pth", so you can simply remove the loading of "alt_gvt_small.pth" for inference. https://github.com/ewrfcas/MVSFormer/blob/72bbd0b6a697e023feefbf76ae3aec34b340e575/models/mvsformer_model.py#L335
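For example, something like the following (a sketch of the idea only; load_vit_pretrain and vit are placeholder names, not the repo's actual identifiers around models/mvsformer_model.py#L335):

```python
import torch

# best.pth already contains the ViT weights, so the ImageNet-pretrained
# checkpoint can simply be skipped at inference time.
load_vit_pretrain = False  # hypothetical switch; the real code differs
if load_vit_pretrain:
    ckpt = torch.load('./pretrained_weights/alt_gvt_small.pth', map_location='cpu')
    vit.load_state_dict(ckpt, strict=False)  # 'vit' is a placeholder module
```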