Anttwo / SuGaR

[CVPR 2024] Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
https://anttwo.github.io/sugar/

OOM #45

Open cubantonystark opened 9 months ago

cubantonystark commented 9 months ago

I get an OOM in that particular section (texturing) when trying to run with --high_poly. I disabled PyTorch CUDA memory caching to see if it would mitigate the problem, but it doesn't solve it. I am running Ubuntu 22.04 with an RTX 4090 (16GB VRAM) and 32GB of RAM. Is that not enough? Any pointers would be greatly appreciated. Here are the parameters I run the processing with:

Training the Gaussian:

```shell
gaussian_splatting$ python train.py -s /home/xxx/SuGaR/xxx/xxxxxxxxxxxx --iterations 7000 -m /home/xxx/SuGaR/xxxxxx/xxxxxxxxxx
```

Rest of the steps:

```shell
python train.py -s /home/xxx/SuGaR/xxxx/xxxxxxxxxxxx -c /home/reyxxxSuGaR/xxxxxx/xxxxxxxxxx -r "sdf" --high_poly True --refinement_time "medium"
```

Here is the output:

```
Number of gaussians per surface triangle: 1
Square size: 10
Postprocess mesh: False

Source path: /home/xxx/SuGaR/xx/xxxxxxxxxx/
Gaussian splatting checkpoint path: /home/xx/SuGaR/xxxxxx/xxxxxxxxxx/

Loading Vanilla 3DGS model config /home/xxx/SuGaR/xxxxxx/xxxxxxxxxx/...
Found image extension .png
Vanilla 3DGS Loaded.
22 training images detected.
The model has been trained for 7000 steps.
0.870508 M gaussians detected.
Binding radiance cloud to surface mesh...
Traceback (most recent call last):
  File "/home/xx/SuGaR/train.py", line 180, in <module>
    refined_mesh_path = extract_mesh_and_texture_from_refined_sugar(refined_mesh_args)
  File "/home/xx/SuGaR/sugar_extractors/refined_mesh.py", line 193, in extract_mesh_and_texture_from_refined_sugar
    verts_uv, faces_uv, texture_img = extract_texture_image_and_uv_from_gaussians(
  File "/home/xxx/SuGaR/sugar_scene/sugar_model.py", line 2420, in extract_texture_image_and_uv_from_gaussians
    texture_img = SH2RGB(texture_img.flip(0))
  File "/home/xxx/SuGaR/sugar_utils/spherical_harmonics.py", line 178, in SH2RGB
    return sh * C0 + 0.5
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
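(Editor's note: the traceback shows the crash happens in `SH2RGB`, which is just `sh * C0 + 0.5`; the out-of-place multiply allocates a second full-size texture tensor. A minimal sketch of a workaround is below: an in-place, row-chunked version of the same formula. `sh2rgb_chunked` is a hypothetical helper, not part of SuGaR, and the value of `C0` is assumed to be the standard zeroth-order spherical-harmonics constant.)

```python
import torch

# Standard zeroth-order SH coefficient (assumed to match SuGaR's C0).
C0 = 0.28209479177387814

def sh2rgb_chunked(sh: torch.Tensor, rows_per_chunk: int = 4096) -> torch.Tensor:
    """In-place, chunked equivalent of `sh * C0 + 0.5`.

    Mutates `sh` row-slice by row-slice, so no second full-size
    tensor is ever allocated on the GPU. Clone the input first if
    the original SH values must be preserved.
    """
    for i in range(0, sh.shape[0], rows_per_chunk):
        sh[i : i + rows_per_chunk].mul_(C0).add_(0.5)
    return sh
```

Another low-effort option in the same spirit would be moving `texture_img` to the CPU before the conversion, since the result is only an image.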

Anttwo commented 9 months ago

Hello @cubantonystark,

Indeed, such GPU memory should be enough, but I think sometimes it just depends on the scene... May I ask how many vertices your coarse mesh has?

You can try to reduce the number of vertices in the coarse mesh. Using --low_poly will drop the number to 200k, but you can adjust the number by yourself (and use, for example, 800k vertices) with the argument --n_vertices_in_mesh. Please refer to the README.md file for more details about the arguments.
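(Editor's note: combining the suggestion above with the command from earlier in the thread, the invocation might look like the following; the paths are placeholders and the exact flag set is taken from this thread, not verified against the current README.)

```shell
python train.py -s <source_path> -c <gs_checkpoint_path> -r "sdf" \
    --n_vertices_in_mesh 800000 --refinement_time "medium"
```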

I'm looking forward to your reply!

cubantonystark commented 9 months ago

Salut @Anttwo,

Indeed. I hardcoded the vertex count for high_poly to 500k and the OOM went away... I'm testing the limits on the poly count to see what triggers the OOM; I have seen the same error with Instant NGP, so there might be something we can do about it. Also, is there a way to retarget the decimation to a higher value? The textures I get show vertex seams, which leads me to suspect the texture extraction happens after the decimation process, not before. May I suggest extracting the texture before decimation and then applying the decimation?

See below for a visual of what I'm seeing.

[Screenshot 2023-12-28 034316] [Screenshot 2023-12-28 034533]

As always, thanks for this amazing work.

Anttwo commented 8 months ago

Hello @cubantonystark,

Thank you for your reply, and for your images! When you say you hardcoded the values for high_poly, do you mean you changed them in the Python script? I don't know if you've seen it, but we provide additional arguments for train.py, detailed in a drop-down list in the README.md file, that let you change the number of vertices; they could be useful for you: mesh_extraction_args

Mmh indeed, looking at your images, it seems that your scene is a large-scale one (so an object as big as a car appears as a very small object in the scene). This kind of scene is not the typical use case for Gaussian Splatting, but it can still be processed... as long as you have a lot of memory 😢 I'm afraid you won't be able to get a detailed reconstruction without increasing the number of vertices (but this will produce an OOM issue).

Still, I may have a solution: you could split the scene (just before computing the coarse mesh) into several areas and refine them one by one, with a medium-poly budget (like 500k vertices) for each of them. You can actually provide a foreground bounding box to the mesh extraction script (see the image just above, --bboxmin and --bboxmax), so it is possible to split the Gaussian Splatting into several point clouds (delimited by bounding boxes that tile the volume) and apply both coarse mesh extraction and SuGaR refinement to each of them.
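(Editor's note: the tiling idea above can be sketched in a few lines. `tile_bboxes` is a hypothetical helper, not part of SuGaR; it just splits a point cloud's bounding volume into overlapping axis-aligned boxes whose corners could then be fed to the extraction script as --bboxmin / --bboxmax.)

```python
import numpy as np

def tile_bboxes(points: np.ndarray, tiles=(2, 2, 1), overlap=0.05):
    """Split the bounding volume of `points` (N, 3) into overlapping tiles.

    Returns a list of (bbox_min, bbox_max) pairs, one per tile.
    `overlap` pads each tile by a fraction of the scene extent so
    neighbouring meshes can be stitched without gaps.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    extent = hi - lo
    step = extent / np.asarray(tiles, dtype=float)
    pad = overlap * extent
    boxes = []
    for ix in range(tiles[0]):
        for iy in range(tiles[1]):
            for iz in range(tiles[2]):
                idx = np.array([ix, iy, iz])
                bmin = lo + idx * step - pad
                bmax = lo + (idx + 1) * step + pad
                boxes.append((bmin, bmax))
    return boxes
```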

I may try to implement such space-tiling in the future, as it could be useful for large-scale scenes. I'm not working much right now as we're in the middle of the Christmas holidays, but I'll try to do that later!

Tao-11-chen commented 7 months ago

Hi, same issue here. Could the training images be kept on the CPU until they are needed, as the original Gaussian Splatting code does, to reduce GPU memory usage?
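(Editor's note: the strategy asked about above — keeping ground-truth images in host RAM and transferring each one to the GPU only for the iteration that uses it — can be sketched as below. `LazyCamera` is a hypothetical wrapper, not SuGaR or 3DGS code; the original gaussian-splatting repository exposes a similar option via its `data_device` setting.)

```python
import torch

class LazyCamera:
    """Holds a ground-truth image in host RAM; moves it to the GPU on demand.

    The returned GPU copy can go out of scope right after the loss is
    computed, so at most one image resides in VRAM at a time.
    """
    def __init__(self, image: torch.Tensor):
        # Pinned memory makes the later host-to-device copy faster,
        # but pinning requires a CUDA-enabled build, so guard it.
        self.image_cpu = image.pin_memory() if torch.cuda.is_available() else image

    def gt_image(self, device: str) -> torch.Tensor:
        # Transferred per use; non_blocking overlaps the copy with compute
        # when the source is pinned and the target is a CUDA device.
        return self.image_cpu.to(device, non_blocking=True)
```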