RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 756 but got size 755 for tensor number 1 in the list.

ryoukawanamixm commented 5 months ago

I got RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 756 but got size 755 for tensor number 1 in the list. Does anyone know how to solve this?

Environment

packages

nerfstudio==v1.0.3
gsplat (pip install git+https://github.com/LingzheZhao/gsplat)

Setup

ns-process-data images \
    --data ../data/badgs/custom_dataset/data/images \
    --output-dir ../data/badgs/custom_dataset/my_data

Script

ns-train bad-gaussians \
    --data ../data/badgs/custom_dataset/my_data \
    --vis viewer+tensorboard \
    nerfstudio-data --eval_mode "all"

Error

Step (% Done)       Train Iter (time)    ETA (time)                                                  
--------------------------------------------------------------                                       
0 (0.00%)           836.023 ms           6 h, 58 m, 1 s                                              
Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec                       
Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec     Vis Rays / Sec        
--------------------------------------------------------------------------------------------------------  Invalid by calling <built-in method cat of type object at 0x7fc87a
400 (1.33%)         76.783 ms            37 m, 52 s           8.45 M               17.72 M               
410 (1.37%)         77.335 ms            38 m, 8 s            8.40 M               17.82 M               
420 (1.40%)         75.163 ms            37 m, 3 s            8.60 M               18.05 M               
430 (1.43%)         75.578 ms            37 m, 14 s           8.56 M               17.54 M               
440 (1.47%)         76.835 ms            37 m, 51 s           8.42 M               17.14 M               
450 (1.50%)         76.837 ms            37 m, 50 s           8.39 M               16.72 M               
460 (1.53%)         77.155 ms            37 m, 59 s           8.40 M               16.45 M               
470 (1.57%)         76.871 ms            37 m, 50 s           8.42 M               16.78 M               
480 (1.60%)         74.521 ms            36 m, 39 s           8.64 M               17.51 M               
490 (1.63%)         72.788 ms            35 m, 48 s           8.84 M               17.81 M               
----------------------------------------------------------------------------------------------------     
Viewer running locally at: http://localhost:7007 (listening on 0.0.0.0)                                  
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0751              
VanillaPipeline.get_train_loss_dict: 0.0466              
VanillaPipeline.get_eval_loss_dict: 0.0401              
VanillaPipeline.get_eval_image_metrics_and_images: 0.0244              
ImageRestorationTrainer.eval_iteration: 0.0001              
Traceback (most recent call last):
  File "/root/miniconda3/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/workspaces/nerfstudio/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/workspaces/nerfstudio/nerfstudio/scripts/train.py", line 247, in main
    launch(
  File "/workspaces/nerfstudio/nerfstudio/scripts/train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/workspaces/nerfstudio/nerfstudio/scripts/train.py", line 100, in train_loop
    trainer.train()
  File "/workspaces/nerfstudio/nerfstudio/engine/trainer.py", line 287, in train
    self.eval_iteration(step)
  File "/workspaces/nerfstudio/nerfstudio/utils/decorators.py", line 70, in wrapper
    ret = func(self, *args, **kwargs)
  File "/workspaces/nerfstudio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/workspaces/si-s-rat-badgs/bad_gaussians/image_restoration_trainer.py", line 123, in eval_iteration
    metrics_dict, images_dict = self.pipeline.get_eval_image_metrics_and_images(step=step)
  File "/workspaces/nerfstudio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/workspaces/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 340, in get_eval_image_metrics_and_images
    metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
  File "/workspaces/nerfstudio/nerfstudio/models/splatfacto.py", line 913, in get_image_metrics_and_images
    combined_rgb = torch.cat([gt_rgb, predicted_rgb], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 756 but got size 755 for tensor number 1 in the list.

ryoukawanamixm commented 5 months ago

This error occurs only with my custom data. Training works fine with the following dataset:

ns-train bad-gaussians \
    --data data/real_camera_motion_blur/blurdecoration \
    --pipeline.model.camera-optimizer.mode "cubic" \
    --vis viewer+tensorboard \
    deblur-nerf-data

LingzheZhao commented 5 months ago

Hi, this problem is most likely related to a bug in OpenCV's image undistortion, which has been discussed in https://github.com/nerfstudio-project/nerfstudio/pull/2683.

If you don't want to touch nerfstudio's code, here's another workaround: DeblurNerfDataParser has an option to disable the undistortion step, so you can use it instead of NerfstudioDataParser. You will need a few additional parameters to revert some settings that are specific to the deblur-nerf dataset:

ns-train bad-gaussians \
    --data ../data/badgs/custom_dataset/my_data \
    --vis viewer+tensorboard \
    deblur-nerf-data \
    --drop-distortion True \
    --colmap_path colmap/sparse/0 \
    --scale_factor 1.0
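
For readers following along, here is a minimal sketch of why the traceback ends in torch.cat: placing the ground-truth and predicted images side by side (dim=1) requires every other dimension, including height, to match. The sketch uses NumPy rather than PyTorch so it runs without a GPU stack. The width of 1008 and the helper name crop_to_common_size are illustrative assumptions, not nerfstudio code, and cropping both images is only a defensive guard, not the fix from the linked PR.

```python
import numpy as np


def crop_to_common_size(gt_rgb, predicted_rgb):
    """Crop two H x W x C images to their shared top-left region.

    Hypothetical helper for illustration; not part of nerfstudio.
    """
    h = min(gt_rgb.shape[0], predicted_rgb.shape[0])
    w = min(gt_rgb.shape[1], predicted_rgb.shape[1])
    return gt_rgb[:h, :w], predicted_rgb[:h, :w]


# Heights taken from the error message (756 vs 755); the width is assumed.
gt_rgb = np.zeros((756, 1008, 3), dtype=np.float32)
predicted_rgb = np.zeros((755, 1008, 3), dtype=np.float32)

# Concatenating along the width axis fails because the heights differ.
mismatch_message = None
try:
    np.concatenate([gt_rgb, predicted_rgb], axis=1)
except ValueError as err:
    mismatch_message = str(err)

# After cropping to the common size, the side-by-side image can be built.
gt_cropped, predicted_cropped = crop_to_common_size(gt_rgb, predicted_rgb)
combined_rgb = np.concatenate([gt_cropped, predicted_cropped], axis=1)
print(combined_rgb.shape)
```

A guard like this only hides the symptom during evaluation. The one-pixel difference itself comes from how the image is cropped after undistortion.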

ryoukawanamixm commented 5 months ago

Thanks for the quick reply! However, I got a similar error.

  File "/workspaces/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 340, in get_eval_image_metrics_and_images
    metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
  File "/workspaces/nerfstudio/nerfstudio/models/splatfacto.py", line 913, in get_image_metrics_and_images
    combined_rgb = torch.cat([gt_rgb, predicted_rgb], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 3024 but got size 3023 for tensor number 1 in the list.

LingzheZhao commented 5 months ago

Hi, I will try to reproduce this problem, but it may take some time. In the meantime, you can finish training by disabling evaluation during training: simply replace --vis viewer+tensorboard with --vis viewer.

(FYI, I noticed that the resolution of your custom dataset is quite high, so you may need to add either downsampling (--downscale_factor 2) or coarse-to-fine training (--pipeline.model.num_downscales 2 --pipeline.model.resolution_schedule 3000) to the CLI command.)

ryoukawanamixm commented 5 months ago

I looked at https://github.com/nerfstudio-project/nerfstudio/pull/2683 and changed

image = image[y : y + h, x : x + w]

to

image = image[y : y + h + 1, x : x + w + 1]

along with the other relevant parts. It works!
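
The effect of that one-line change can be checked with plain slicing arithmetic. This is a small sketch under an assumption: the ROI values (x, y, w, h) below are hypothetical, chosen so that the reported width and height are one pixel short of the full image, which is the situation the workaround implies. No OpenCV call is made here.

```python
import numpy as np

# Stand-in for an undistorted image; 756 matches the error message, 1008 is assumed.
image = np.zeros((756, 1008, 3), dtype=np.uint8)

# Hypothetical ROI in (x, y, w, h) order, one pixel short in each direction.
x, y, w, h = 0, 0, 1007, 755

# Original crop: yields exactly h rows and w columns.
cropped_before = image[y : y + h, x : x + w]

# Modified crop: the extra +1 restores the missing row and column.
cropped_after = image[y : y + h + 1, x : x + w + 1]

print(cropped_before.shape, cropped_after.shape)
```

Whichever crop is used, the camera's stored width and height need to agree with the cropped image, which is presumably what the "other relevant parts" cover.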

WU-CVGL / BAD-Gaussians