RaymondWang987 / NVDS

ICCV 2023 "Neural Video Depth Stabilizer" (NVDS) & TPAMI 2024 "NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation" (NVDS+)
MIT License
491 stars, 24 forks

Questions about testing my own videos of different resolutions #9

Closed yavon818 closed 1 year ago

yavon818 commented 1 year ago

I wonder if the input image size is fixed, as I run into some problems when I use images of different resolutions (e.g., 688*384):

    CUDA_VISIBLE_DEVICES=0 python infer_NVDS_dpt_bi.py --base_dir ./demo_outputs/dpt_init/kid_running/ --vnum kid_running --infer_w 688 --infer_h 384

    let us begin test NVDS(DPT) demo
    Load checkpoint: ./gmflow/checkpoints/gmflow_sintel-0c07dcb3.pth
    **self.shift_size: 0 here mask none
    **self.shift_size: 0 here mask none
    **self.shift_size: 0 here mask none
    **self.shift_size: 0 here mask none
    /opt/conda/envs/NVDS/lib/python3.8/site-packages/torch/nn/functional.py:3609: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
      warnings.warn(
    Traceback (most recent call last):
      File "infer_NVDS_dpt_bi.py", line 396, in <module>
        outputs = dpt.forward(rgb)
      File "/data_ssd/home/z00647125/NVDS/dpt/models.py", line 115, in forward
        inv_depth = super().forward(x).squeeze(dim=1)
      File "/data_ssd/home/z00647125/NVDS/dpt/models.py", line 80, in forward
        path_3 = self.scratch.refinenet3(path_4, layer_3_rn)
      File "/opt/conda/envs/NVDS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/data_ssd/home/z00647125/NVDS/dpt/blocks.py", line 372, in forward
        output = self.skip_add.add(output, res)
      File "/opt/conda/envs/NVDS/lib/python3.8/site-packages/torch/nn/quantized/modules/functional_modules.py", line 43, in add
        r = torch.add(x, y)
    RuntimeError: The size of tensor a (44) must match the size of tensor b (43) at non-singleton dimension 3

RaymondWang987 commented 1 year ago

The input image size is not fixed and can be changed. However, --infer_w and --infer_h should both be set to integer multiples of 32. Your --infer_h 384 already satisfies this (384 = 12 × 32), but 688 does not, so you can use --infer_w 672 or --infer_w 704 in your case.
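As a minimal sketch of how to pick valid values, the helper below returns the closest multiples of 32 below and above a given size. The function name `nearest_multiples` is hypothetical and is not part of the NVDS repository; it only illustrates the arithmetic behind the 672 and 704 suggestions.

```python
from typing import Tuple


def nearest_multiples(size: int, base: int = 32) -> Tuple[int, int]:
    """Return the closest multiples of `base` at or below and at or above `size`.

    Hypothetical helper for choosing --infer_w / --infer_h values;
    it is not part of the NVDS code base.
    """
    lower = (size // base) * base
    # If size is already a multiple of base, both bounds are size itself.
    upper = lower if size % base == 0 else lower + base
    return lower, upper


print(nearest_multiples(688))  # (672, 704)
print(nearest_multiples(384))  # (384, 384)
```

Whether to round down or up is a judgment call: rounding down keeps memory use lower, while rounding up keeps slightly more image detail.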

For the initial depth predictors (DPT in your case) and our NVDS, the smallest feature maps produced by the backbone are 1/32 of the input width and height. Since 688/32 = 21.5 is not an integer, the feature-map resolutions become misaligned (the 44 and 43 in your error message) during the down-sampling and up-sampling processes. This applies to DPT, MiDaS, and our NVDS alike.
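The arithmetic behind the 44 versus 43 mismatch can be sketched with a toy model. This is not the actual DPT or NVDS code: the function `skip_widths` is hypothetical, and the rounding rules in it are assumptions chosen because they reproduce the numbers in the error message. It assumes that the 1/16-scale skip feature has `width // 16` columns, that the 1/32-scale feature comes from a stride-2 step that rounds up, and that the decoder upsamples the coarsest feature by exactly 2x before adding the skip feature.

```python
import math
from typing import Tuple


def skip_widths(width: int) -> Tuple[int, int]:
    """Toy model of one decoder fusion step (assumed rounding, not DPT code).

    Returns (upsampled_width, skip_width); the addition in the decoder
    only works when the two values are equal.
    """
    skip = width // 16              # assumed width of the 1/16-scale skip feature
    coarsest = math.ceil(skip / 2)  # assumed stride-2 step that rounds up
    upsampled = coarsest * 2        # decoder upsamples the coarsest map by 2x
    return upsampled, skip


for w in (672, 688, 704):
    up, skip = skip_widths(w)
    status = "ok" if up == skip else "mismatch"
    print(w, up, skip, status)
# 672 42 42 ok
# 688 44 43 mismatch
# 704 44 44 ok
```

Under these assumptions, a width that is a multiple of 32 keeps every halving exact, so the upsampled map and the skip feature always line up.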