RuntimeError: numel: integer multiplication overflow

fkcptlst commented 8 months ago

I ran into the following exception after making some modifications to the learning rates.

Training progress:  21%|▍ | 1030/5000 [15:14<1:43:53,  1.57s/it, Loss=0.9026756]
Error executing job with overrides: ['+wandb_key=xxx']
Traceback (most recent call last):
  File "/LucidDreamer/train.py", line 622, in main
    training(lp, op, pp, gcp, gp, hg_params, cfg.test_iterations, cfg.save_iterations, cfg.checkpoint_iterations,
  File "/LucidDreamer/train.py", line 349, in training
    render_pkg = render(viewpoint_cam, gaussians, pipe, background,
  File "/LucidDreamer/gaussian_renderer/__init__.py", line 146, in render
    rendered_image, radii, depth_alpha = rasterizer(
  File "/LucidDreamer/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/LucidDreamer/venv/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 186, in forward
    return rasterize_gaussians(
  File "/LucidDreamer/venv/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 28, in rasterize_gaussians
    return _RasterizeGaussians.apply(
  File "/LucidDreamer/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/LucidDreamer/venv/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 78, in forward
    num_rendered, color, depth, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: numel: integer multiplication overflow

Specifically, these are the optimization parameters I used:

as_latent_ratio: 0.2
densification_interval: 100
densify_from_iter: 100
densify_grad_threshold: 0.00075
densify_until_iter: 3000
feature_lr: 0.01
feature_lr_final: 0.0005
fovy_scale_up_factor:
- 0.75
- 1.1
geo_iter: 0
iterations: 5000
lambda_scale: 0.0
lambda_tv: 0.0
opacity_lr: 0.01
opacity_reset_interval: 300
percent_dense: 0.003
phi_scale_up_factor: 1.5
position_lr_delay_mult: 0.01
position_lr_final: 1.6e-06
position_lr_init: 0.00016
position_lr_max_steps: 30000
pro_frames_num: 600
pro_render_45: false
progressive_view_init_ratio: 0.2
progressive_view_iter: 500
rotation_lr: 0.01
rotation_lr_final: 0.0005
save_process: true
scale_up_cameras_iter: 500
scale_up_factor: 0.95
scaling_lr: 0.01
scaling_lr_final: 0.0005
use_control_net_iter: 10000000
use_progressive: false
warmup_iter: 1500
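Since the crash followed changes to the learning rates, it can help to compute which position learning rate is actually in effect at a given iteration. The sketch below re-implements the log-linear decay of `get_expon_lr_func` from the original gaussian-splatting code (`utils/general_utils.py`). It assumes, without having checked this repository, that LucidDreamer schedules the `position_lr_*` values the same way, and it ignores any per-scene scale factor the trainer may multiply in. The function name `expon_lr` is hypothetical.

```python
import math


def expon_lr(step, lr_init, lr_final, max_steps, lr_delay_steps=0, lr_delay_mult=1.0):
    """Log-linear learning-rate decay modelled on gaussian-splatting's get_expon_lr_func."""
    if step < 0 or (lr_init == 0.0 and lr_final == 0.0):
        return 0.0
    if lr_delay_steps > 0:
        # Optional warm-up: scale the rate from lr_delay_mult up to 1 over lr_delay_steps.
        progress = min(max(step / lr_delay_steps, 0.0), 1.0)
        delay_rate = lr_delay_mult + (1 - lr_delay_mult) * math.sin(0.5 * math.pi * progress)
    else:
        delay_rate = 1.0
    # Fraction of the schedule completed, clamped to [0, 1].
    t = min(max(step / max_steps, 0.0), 1.0)
    # Interpolate linearly in log space between lr_init and lr_final.
    return delay_rate * math.exp(math.log(lr_init) * (1 - t) + math.log(lr_final) * t)


# Position learning rate at a few iterations, using the values from the config above.
for step in (0, 1030, 5000, 30000):
    lr = expon_lr(step, lr_init=0.00016, lr_final=1.6e-06, max_steps=30000)
    print(f"step {step:>5}: position lr = {lr:.3e}")
```

With these values, `position_lr_max_steps` (30000) is six times `iterations` (5000), so by the last iteration the position learning rate has only decayed to about 7.4e-5, roughly half of `position_lr_init`, rather than reaching `position_lr_final`.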

I also checked this issue in the original gaussian-splatting repo, but it didn't help much: https://github.com/graphdeco-inria/gaussian-splatting/issues/24

Has anyone encountered a similar issue before, and what are possible ways to mitigate it?
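One way to check whether the crash is a genuine count overflow is to bound how many Gaussian-tile overlaps a frame produces. This sketch rests on three assumptions: (a) the overflowing value is the rasterizer's total of Gaussian-tile overlaps (returned as `num_rendered` in the traceback) held in a signed 32-bit int, (b) tiles are 16×16 pixels, and (c) the per-Gaussian pixel radii that the rasterizer returns as `radii` are available, for example from the previous iteration's render. All function names are hypothetical, and the result is an upper bound, not the rasterizer's exact count.

```python
import math
from typing import Iterable, Tuple

INT32_MAX = 2**31 - 1
TILE = 16  # assumed tile size in pixels (16x16 blocks)


def grid_size(width: int, height: int) -> Tuple[int, int]:
    """Number of tiles along x and y, rounding partial tiles up."""
    return math.ceil(width / TILE), math.ceil(height / TILE)


def tiles_upper_bound(radius_px: int, width: int, height: int) -> int:
    """Upper bound on the tiles a Gaussian with the given pixel radius can overlap."""
    if radius_px <= 0:
        return 0  # a zero radius means the Gaussian was culled for this view
    grid_x, grid_y = grid_size(width, height)
    # The tile rectangle around a radius-r footprint spans at most
    # ceil((2r + TILE - 1) / TILE) tiles per axis, and never more than the grid has.
    span = math.ceil((2 * radius_px + TILE - 1) / TILE)
    return min(span, grid_x) * min(span, grid_y)


def check_overlap_budget(radii: Iterable[int], width: int, height: int) -> Tuple[int, bool]:
    """Return (bound on total overlaps, whether that bound fits in a signed 32-bit int)."""
    total = sum(tiles_upper_bound(r, width, height) for r in radii)
    return total, total <= INT32_MAX


def max_fullscreen_gaussians(width: int, height: int) -> int:
    """How many Gaussians could each cover the whole image before the total overflows."""
    grid_x, grid_y = grid_size(width, height)
    return INT32_MAX // (grid_x * grid_y)
```

With a PyTorch tensor, `radii.tolist()` gives the plain list this sketch expects. For a 512×512 render, for instance, `max_fullscreen_gaussians(512, 512)` is 2,097,151, so the bound only exceeds the limit when there are millions of Gaussians or many very large ones; if the bound stays far below `INT32_MAX`, the overflow probably has a different source.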

fkcptlst commented 8 months ago

I noticed that the gaussian-splatting authors made a fix for this overflow bug: https://github.com/graphdeco-inria/diff-gaussian-rasterization/commit/f6f13c689327d0ad7fe716f98f5d81f313e11ff6

However, this implementation appears to have reverted it in this commit. What was the rationale behind this?
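The general failure mode behind bugs like this is fixed-width integer wrap-around. This sketch does not reproduce the linked commit; it only shows, assuming a count is stored in a 32-bit two's-complement integer, how the stored value turns negative once the true count passes 2**31 - 1. The helper name `as_int32` and the Gaussian and tile numbers are illustrative, not measurements from this run.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def as_int32(value: int) -> int:
    """Reinterpret an arbitrary Python int as a two's-complement signed 32-bit int."""
    return (value + 2**31) % 2**32 - 2**31


# Illustrative numbers: 3 million Gaussians that each touch 1024 tiles.
exact = 3_000_000 * 1024
wrapped = as_int32(exact)
print(exact, wrapped)
```

Here the true count is 3,072,000,000 but the 32-bit value reads as -1,222,967,296; any buffer size later derived from such a value is meaningless.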

YixunLiang commented 8 months ago

Thanks for pointing this out. This project began shortly after GS was open-sourced, so I built the submodule from an earlier version, before the GS developers had fixed this bug. Fortunately, I never ran into the bug myself, so I did not apply the fix. It only looks as though I reverted the fix because, in November, I forked the latest version, pasted in the submodule we had been using, and pushed it. We plan to replace it with a more robust version when we find the time. That said, we welcome contributions and would greatly appreciate a pull request that fixes this bug or otherwise improves the project.

fkcptlst commented 8 months ago

Thanks for the clarification. I'll try applying the fix and see whether it works.

EnVision-Research / LucidDreamer, issue #26