Anttwo / SuGaR

[CVPR 2024] Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
https://anttwo.github.io/sugar/

CUDA error: out of memory #58

Open wen-yuan-zhang opened 10 months ago

wen-yuan-zhang commented 10 months ago

Thanks for your excellent work! I ran into an out-of-memory (OOM) error when running

python train.py -s xxx -c xxx

Training runs normally, but mesh extraction fails with the OOM error below. I am using a 24GB 3090 Ti, so the GPU itself should not be the problem. I tried setting image_resolution=4 in gs_model.py, but it didn't help. Could you please give some advice on this? Thank you!

_quaternions
torch.Size([920843, 4])
True
_sh_coordinates_dc
torch.Size([920843, 1, 3])
True
_sh_coordinates_rest
torch.Size([920843, 15, 3])
True
Number of gaussians: 920843
Opacities min/max/mean: tensor(7.0727e-05, device='cuda:0') tensor(1., device='cuda:0') tensor(0.6953, device='cuda:0')
Quantile 0.0: 7.072696462273598e-05
Quantile 0.1: 0.027714231982827187
Quantile 0.2: 0.24735727906227112
Quantile 0.3: 0.564430832862854
Quantile 0.4: 0.7604535818099976
Quantile 0.5: 0.8879363536834717
Quantile 0.6: 0.9701117277145386
Quantile 0.7: 0.9962756633758545
Quantile 0.8: 0.999401330947876
Quantile 0.9: 0.9998761415481567

Starting pruning low opacity gaussians...
WARNING! During optimization, you should use a densifier to prune low opacity points.
This function does not preserve the state of an optimizer, and sets requires_grad=False to all parameters.
Number of gaussians left: 666801
Opacities min/max/mean: tensor(0.5000, device='cuda:0') tensor(1., device='cuda:0') tensor(0.9040, device='cuda:0')
Quantile 0.0: 0.5000002980232239
Quantile 0.1: 0.6711668968200684
Quantile 0.2: 0.7912304401397705
Quantile 0.3: 0.8807359933853149
Quantile 0.4: 0.9478757977485657
Quantile 0.5: 0.9858567714691162
Quantile 0.6: 0.9970008730888367
Quantile 0.7: 0.9992061257362366
Quantile 0.8: 0.9997462630271912
Quantile 0.9: 0.9999245405197144
Processing frame 0/161...
Current point cloud for level 0.3 has 0 points.
Traceback (most recent call last):
  File "train.py", line 143, in <module>
    coarse_mesh_path = extract_mesh_from_coarse_sugar(coarse_mesh_args)[0]
  File "/data/zhangwenyuan/nerfgs/SuGaR/sugar_extractors/coarse_mesh.py", line 270, in extract_mesh_from_coarse_sugar
    frame_surface_level_outputs = sugar.compute_level_surface_points_from_camera_fast(
  File "/data/zhangwenyuan/nerfgs/SuGaR/sugar_scene/sugar_model.py", line 1778, in compute_level_surface_points_from_camera_fast
    fragments = rasterizer(mesh, cameras=p3d_cameras)
  File "/home/zhangwenyuan/anaconda3/envs/nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangwenyuan/anaconda3/envs/nerf/lib/python3.8/site-packages/pytorch3d/renderer/mesh/rasterizer.py", line 247, in forward
    pix_to_face, zbuf, bary_coords, dists = rasterize_meshes(
  File "/home/zhangwenyuan/anaconda3/envs/nerf/lib/python3.8/site-packages/pytorch3d/renderer/mesh/rasterize_meshes.py", line 243, in rasterize_meshes
    outputs = convert_clipped_rasterization_to_original_faces(
  File "/home/zhangwenyuan/anaconda3/envs/nerf/lib/python3.8/site-packages/pytorch3d/renderer/mesh/clip.py", line 659, in convert_clipped_rasterization_to_original_faces
    empty = torch.full(pix_to_face_clipped.shape, -1, device=device, dtype=torch.int64)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
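
In case it helps with debugging: the free VRAM right before the failing rasterizer call can be printed with standard torch.cuda calls. This is a generic PyTorch snippet, not part of SuGaR; it would just be dropped temporarily near the rasterizer call in sugar_model.py.

import torch

# Report how much GPU memory is actually available at this point.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"Free VRAM: {free_bytes / 1024**3:.2f} GiB of {total_bytes / 1024**3:.2f} GiB")
print(f"Allocated by PyTorch: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"Reserved by PyTorch:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
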
AtlasRedux commented 10 months ago

Easy fix: the problem is simply that 24GB is not enough when the source image data is kept on the GPU. I have an RTX 4090 myself. Just keep the source images on the CPU instead; it barely affects speed and offloads almost everything into normal RAM, costing maybe 5% performance on the optimizer part. With that, you can also skip lowering the resolution. I use full-resolution 4K source images, no resizing, with the CPU data device, and it works like a charm. Just add --data_device cpu.
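
For reference, what this does conceptually is keep the ground-truth images in host RAM and upload one frame at a time during training. The snippet below is only a generic illustration of that pattern, not SuGaR's actual data loader:

import torch

# Generic illustration (not SuGaR's loader): store source images on the CPU and
# move a single frame to the GPU only when it is needed.
images = [torch.rand(3, 2160, 3840, device="cpu") for _ in range(3)]  # e.g. 4K frames

for image in images:
    gt = image.to("cuda", non_blocking=True)  # upload one frame
    # ... render the corresponding view and compute the loss against gt ...
    del gt  # the allocator can reuse this VRAM for the next frame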

wen-yuan-zhang commented 10 months ago

I'll try it. Thank you!

yuedajiong commented 10 months ago

I used a Titan Xp with only 12GB of GPU RAM. With a simple scene/object (a resin toy), it works fine.

Maybe try a simple object first?

Iliceth commented 10 months ago

@AtlasRedux Where should this --data_device cpu be added? It isn't an argument for train.py.

AtlasRedux commented 10 months ago

@AtlasRedux Where should this --data_device cpu be added? It isn't an argument for train.py.

As with all SuGaR settings, they mirror the original 3DGS arguments, but they go into gs_model.py rather than the command line. Change self.data_device = "cuda" to self.data_device = "cpu".
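
Sketched out, the edit in sugar_scene/gs_model.py is just the one attribute below (the surrounding class is abbreviated, and its exact name may differ in your checkout):

# sugar_scene/gs_model.py -- sketch of the one-line change, surrounding code omitted.
class ModelParams:  # hypothetical name; use whichever class holds data_device in your copy
    def __init__(self):
        # self.data_device = "cuda"  # original: ground-truth images live in VRAM
        self.data_device = "cpu"     # keep ground-truth images in host RAM instead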

Iliceth commented 10 months ago

@AtlasRedux Where should this --data_device cpu be added? It isn't an argument for train.py.

As with all SuGaR settings, they mirror the original 3DGS arguments, but they go into gs_model.py rather than the command line. Change self.data_device = "cuda" to self.data_device = "cpu".

Thanks!

Iliceth commented 10 months ago

@AtlasRedux I found gs_model.py in the sugar_scene directory and changed "cuda" to "cpu", but I get the exact same behaviour. Am I missing something, or are there other things that need to be adjusted?

AndreCorreaSantos commented 10 months ago

I get the same problem as @Iliceth, even after adopting the suggestions above. I also tried reducing the number of refinement iterations and the number of initial iterations to load, but to no avail. Any more suggestions would be greatly appreciated.

diegobc11 commented 9 months ago

Same here. I have a 16GB GPU and usually get a CUDA out-of-memory error; changing data_device from 'cuda' to 'cpu' in gs_model.py does not solve the issue.

wen-yuan-zhang commented 5 months ago

Update: I didn't find a definitive solution for this, but it seems the problem is not reliably reproducible; I haven't encountered it since. It is possibly affected by the Python environment, the state of the Linux system, unexpected code bugs, or other factors.