RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

shivam-spyne commented 2 months ago

> _bg_points: Requires grad: True
Parameter containing:
Traceback (most recent call last):
  File "/home/aiwork-8/members/shivamj/exp/Frosting/train.py", line 236, in <module>
    frosting_path = refined_training(frosting_args)
  File "/home/aiwork-8/members/shivamj/exp/Frosting/frosting_trainers/refine.py", line 536, in refined_training
    f"     > Min:{param.min().item()}",
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

Basically, param is turning out to be an empty tensor: tensor([], device='cuda:0', size=(0, 3), requires_grad=True)
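
For anyone reading along: reductions such as min() called without a dim argument raise this error when the tensor has zero elements. Below is a minimal sketch of a guarded version of that logging line. The helper name describe_param is hypothetical (it is not a function from the Frosting code), and it only assumes an object exposing the usual PyTorch tensor methods numel(), min(), max(), mean(), std() and item().

```python
def describe_param(name, param):
    """Return a one-line summary of a parameter.

    Hypothetical helper for illustration only. It skips the reductions
    when the tensor is empty, because min()/max() without a `dim`
    argument raise a RuntimeError on zero-element tensors.
    """
    if param.numel() == 0:
        # Nothing to reduce: report the shape instead of statistics.
        return f"> {name}: empty tensor, shape={tuple(param.shape)}"
    return (
        f"> {name}: Min:{param.min().item()} Max:{param.max().item()} "
        f"Mean:{param.mean().item()} Std:{param.std().item()}"
    )
```

With the tensor from the trace above, this would report the empty shape (0, 3) for _bg_points instead of crashing.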

Anttwo commented 2 months ago

Hi @shivam-spyne,

Indeed, the parameter _bg_points is empty for your scene. Frosting allows for using "Background Gaussians" around the mesh to better reconstruct far-away background regions (like the sky, for example).

You can deactivate the background Gaussians by adding --use_background_gaussians False to the arguments of the train_full_pipeline.py script, which should avoid the error. Please also note that, now that you've run the code once, you can skip the vanilla 3DGS optimization by adding --gs_output_dir ./output/vanilla_gs/<your scene dir name> to the arguments of the script.

I hope this will help you! I'm gonna update the code so that it just automatically sets use_background_gaussians to False if no background point is detected during the optimization.

Can I ask what your scene looks like? I suppose it is a small indoor scene, or maybe a synthetic scene?

Anttwo commented 2 months ago

I just updated the code. If you pull the new changes, the code will automatically set use_background_gaussians to False if no background Gaussian is detected/needed for the scene.

shivam-spyne commented 2 months ago

@Anttwo thanks, but now I am getting:

Traceback (most recent call last):
  File ".../exp/Frosting/train.py", line 236, in <module>
    frosting_path = refined_training(frosting_args)
  File "../exp/Frosting/frosting_trainers/refine.py", line 511, in refined_training
    loss = loss_fn(pred_rgb, gt_rgb)
  File "../exp/Frosting/frosting_trainers/refine.py", line 409, in loss_fn
    return (1.0 - dssim_factor) * l1_loss(pred_rgb, gt_rgb) + dssim_factor * (1.0 - ssim(pred_rgb, gt_rgb))
  File "../exp/Frosting/frosting_utils/loss_utils.py", line 38, in ssim
    window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

My scene is a centralised scene of a car. I suppose there are torch memory leaks in the current code.

Anttwo commented 2 months ago

Alright, let's investigate this!

May I ask for more details about the arguments you used to run the script? Also, at which training iteration do you hit this error? Is your dataset a COLMAP dataset? What GPU are you using?

Thanks a lot for your time!

I've just run the code on my side on both a real capture and a synthetic scene from Shelly, and everything seems to work as intended. I've also tested the code on a selection of common datasets as well as on custom scenes before, so there shouldn't be an obvious memory leak in the code.

Anttwo commented 2 months ago

I just thought about something: Do your images share the same image-size and camera intrinsics, especially focal length? In the current code we make the same assumption as many common datasets: We assume that all images in the scene are captured with the same intrinsics and image size.

If you have some variations in the intrinsics, that might be the reason for your problem. I can try to change the code in the near future to make it work with variable intrinsics during training though, as in practice we don't actually need the intrinsics to be the same.
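
A quick way to test this assumption on a dataset is to compare every camera against the first one. The sketch below is only illustrative: the function name find_intrinsics_mismatches and the dictionary keys width, height, fx and fy are assumptions made for the example, not the data structures used in the Frosting code.

```python
def find_intrinsics_mismatches(cameras, tol=1e-6):
    """Return indices of cameras whose image size or focal length
    differs from the first camera in the list.

    `cameras` is a list of dicts with hypothetical keys
    "width", "height", "fx" and "fy".
    """
    if not cameras:
        return []
    ref = cameras[0]
    mismatched = []
    for i, cam in enumerate(cameras[1:], start=1):
        same_size = (cam["width"], cam["height"]) == (ref["width"], ref["height"])
        same_focal = (
            abs(cam["fx"] - ref["fx"]) <= tol
            and abs(cam["fy"] - ref["fy"]) <= tol
        )
        if not (same_size and same_focal):
            mismatched.append(i)
    return mismatched


# Toy data: the third camera has a different size and focal length.
cameras = [
    {"width": 800, "height": 800, "fx": 1111.1, "fy": 1111.1},
    {"width": 800, "height": 800, "fx": 1111.1, "fy": 1111.1},
    {"width": 640, "height": 480, "fx": 900.0, "fy": 900.0},
]
print(find_intrinsics_mismatches(cameras))  # [2]
```

An empty result means all cameras agree with the first one within the tolerance.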

shivam-spyne commented 2 months ago

(I really appreciate your active replies.) GPU: RTX 4090. The dataset is in Blender dataset format (basically transforms_train.json and so on).

EXACT ERROR TRACE:
Iteration: 3800
loss: 0.125970  [ 3800/15000] computed in 0.16685567696889242 minutes.
> _shell_base_verts: Requires grad: False
> _shell_base_faces: Requires grad: False
> _outer_dist: Requires grad: False
> _inner_dist: Requires grad: False
> _point_cell_indices: Requires grad: False
> _bary_coords: Requires grad: True
     > Min:-18.573694229125977      > Max:2.718003988265991      > Mean:-2.2910563945770264      > Std:1.2386972904205322
> _scales: Requires grad: True
     > Min:-9.222304344177246      > Max:2.041729688644409      > Mean:-4.911504745483398      > Std:0.6106228828430176
> _quaternions: Requires grad: True
     > Min:-0.7977717518806458      > Max:1.561862826347351      > Mean:0.2320365607738495      > Std:0.411573201417923
> _opacities: Requires grad: True
     > Min:-7.313634872436523      > Max:16.54620933532715      > Mean:-4.249987602233887      > Std:2.366835355758667
> _sh_coordinates_dc: Requires grad: True
     > Min:-2.4510891437530518      > Max:4.27876091003418      > Mean:-1.1747887134552002      > Std:0.8531623482704163
> _sh_coordinates_rest: Requires grad: True
     > Min:-0.5850973129272461      > Max:0.5158538222312927      > Mean:0.0011372555745765567      > Std:0.03744157403707504
Traceback (most recent call last):
File ".../exp/Frosting/train.py", line 236, in <module>
frosting_path = refined_training(frosting_args)
File "../exp/Frosting/frosting_trainers/refine.py", line 511, in refined_training
loss = loss_fn(pred_rgb, gt_rgb)
File "../exp/Frosting/frosting_trainers/refine.py", line 409, in loss_fn
return (1.0 - dssim_factor) * l1_loss(pred_rgb, gt_rgb) + dssim_factor * (1.0 - ssim(pred_rgb, gt_rgb))
File "../exp/Frosting/frosting_utils/loss_utils.py", line 38, in ssim
window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I am also trying to debug it on my end, and testing with another dataset to check if the problem persists.
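
As the error message itself hints, CUDA errors are reported asynchronously, so the line shown in the trace (here the ssim window transfer) is not necessarily the one that caused the illegal access. Setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the trace points closer to the faulty call. A minimal sketch, assuming the variable is set at the very top of the entry script, before CUDA is initialized:

```python
import os

# Must be set before CUDA is initialized, so in practice before
# importing torch (or exported in the shell before launching Python).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import only after the variable is set
# ... then run the training entry point as usual

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Training runs slower in this mode, so it is meant for debugging only.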

Anttwo commented 2 months ago

I see, this might indeed be related to the dataset. I did experiment with the Blender dataset when writing the paper, but I have made some changes to the code since then, and I focused on COLMAP real scenes and Shelly as they better illustrate the strengths of the model.

I'm also running the script on a Blender dataset scene in parallel, and I'll tell you as soon as I have some news.

Thanks a lot for your time!

shivam-spyne commented 2 months ago

yep it worked, memory leak issue was with the dataset itself, thanks for fast responses

Anttwo commented 2 months ago

Great! Thanks, no problem!

Concerning the Blender dataset, I think I have found the problem. Indeed, the extracted mesh is almost empty, which causes the few Gaussians in the mesh to get really big to fit the training images as much as possible, and produces the error.

The almost empty mesh is due to the cleaning of the mesh, which seems to be too aggressive for the Blender synthetic scenes. Indeed, after extracting the mesh, the code cleans it by removing vertices with low confidence as well as vertices that might be located inside the geometry or outside the camera range.

The two parameters that control this sanitization of the mesh are cleaning_quantile and connected_components_vis_th. For synthetic scenes like the Shelly dataset, I recommended using higher cleaning parameters: since the dataset is synthetic and perfectly calibrated, you can usually be pretty harsh in the cleaning and it is still OK. It works very well on Shelly.

However, for the Blender dataset, Gaussian Splatting likes to reconstruct the white background as a solid, white surface. This disrupts the cleaning and makes it too harsh, especially with the new regularization method dn_consistency. Therefore, I think you can just use the default hyperparameters (or even 0.) for cleaning the mesh even if it's a synthetic scene.

Were you using the parameters recommended for synthetic datasets, which are --cleaning_quantile 0. and --connected_components_vis_th 0.5? You can retry on the Blender dataset and just use the default parameters recommended for real scenes, for example, or even use --cleaning_quantile 0. and --connected_components_vis_th 0. to be sure. You also need to use --white_background True. I just tried it, and it outputs a mesh like this:

[Image: colored mesh]

The mesh is pretty good, but we can see that the background is reconstructed as a white surface. If you remove it by hand, you can see that the geometry is pretty nice:

[Image: mesh]

This is the problem with optimizing 3DGS on a background-segmented scene, hehe. I should add an option to just remove Gaussians located inside the background mask, so that they don't disturb mesh extraction.
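
To make the suggested retry concrete, here is a small sketch that assembles the command line with the most permissive cleaning settings. Every flag is taken from this thread; the scene path is a placeholder, and including -r dn_consistency is an assumption about the run being repeated rather than a requirement.

```python
import shlex

# Placeholder: replace with the path to your own Blender-format scene.
scene_dir = "<path to your Blender scene>"

cmd = [
    "python", "train_full_pipeline.py",
    "-s", scene_dir,
    "-r", "dn_consistency",                  # regularization method discussed above
    "--cleaning_quantile", "0.",             # do not drop low-confidence vertices
    "--connected_components_vis_th", "0.",   # most permissive visibility threshold
    "--white_background", "True",            # needed for the Blender dataset
]

# Print a shell-safe version of the command for copy/paste.
print(shlex.join(cmd))
```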

shivam-spyne commented 2 months ago

My splat file is correct but my mesh file is bad. I'm getting these as the outputs ⬆️⬆️

command: python train_full_pipeline.py -s ../3d_shoot_grey --gaussians_in_frosting 2_000_000 -r "dn_consistency" --use_occlusion_culling False --export_obj True --white_background True

Using --cleaning_quantile 0. and --connected_components_vis_th 0.5 removes too much of the mesh, so my mesh turns out almost empty.

shivam-spyne commented 2 months ago

Update: I am now able to get the result I need. I had to replace my input images with 4-channel transparent (background-removed) images. I guess we can safely close the thread. @Anttwo thanks for the great work!
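
For anyone wanting to verify that their inputs really are 4-channel images, the color type stored in a PNG header can be read with the standard library alone. This is a minimal sketch: the function name png_has_alpha_channel is hypothetical, and it only looks at the IHDR color type, so palette images that carry transparency in a separate tRNS chunk are not detected.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha_channel(header: bytes) -> bool:
    """Inspect the first 26 bytes of a PNG file and report whether the
    IHDR color type includes an alpha channel.

    Color type 4 is grayscale+alpha and 6 is RGBA.
    """
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    color_type = header[25]
    return color_type in (4, 6)


def make_header(width, height, bit_depth, color_type):
    """Build a synthetic 26-byte PNG header for demonstration purposes."""
    return PNG_SIGNATURE + struct.pack(
        ">I4sIIBB", 13, b"IHDR", width, height, bit_depth, color_type
    )


print(png_has_alpha_channel(make_header(800, 800, 8, 6)))  # True (RGBA)
print(png_has_alpha_channel(make_header(800, 800, 8, 2)))  # False (RGB)
```

For a real file, pass the result of reading the first 26 bytes opened in binary mode.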

Anttwo / Frosting

RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument. #1