Anttwo / SuGaR

[CVPR 2024] Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
https://anttwo.github.io/sugar/

how to solve out of memory #127

Closed hanjoonwon closed 8 months ago

hanjoonwon commented 9 months ago


  1. python gaussian_splatting/train.py -s gaussian_splatting/owl2 --iterations 30000 -m output/owl --test_iterations -1 --densify_grad_threshold 0.0003 --densify_until_iter 10000 --densify_from_iter 1200

  2. python train.py -s gaussian_splatting/owl2 -c owl_van/ -r density

I trained for almost 3 hours, but an out-of-memory error occurred at the end of the 15000 iterations in step 2 :( I am using an ASUS laptop with an RTX 2080 Super (8GB VRAM).

my image data here https://drive.google.com/drive/folders/1lHsiW8MGQcVTrQT9LtJjH8Z6ChMiiVlf?usp=sharing

Anttwo commented 9 months ago

Hello @hanjoonwon,

The code ran into the OOM issue during texture extraction, after the SuGaR optimization. Sorry about that; I think this step could be done on the CPU, which would avoid OOM issues on the GPU. I should definitely add that to the code...
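(For illustration only, here is a minimal sketch of what such a CPU fallback could look like; `extract_texture` is a hypothetical stand-in for the texture-extraction routine, not an actual function from the repository.)

```python
import torch

def extract_texture_with_fallback(extract_texture, *tensors):
    """Hypothetical sketch: try texture extraction on the GPU, fall back to CPU on OOM.

    `extract_texture` is a placeholder for the actual texture-extraction routine.
    """
    try:
        return extract_texture(*(t.cuda() for t in tensors))
    except torch.cuda.OutOfMemoryError:  # available in recent PyTorch versions
        torch.cuda.empty_cache()
        # Much slower, but not limited by the GPU's VRAM.
        return extract_texture(*(t.cpu() for t in tensors))
```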

8GB is actually pretty low for SuGaR (SuGaR is better tailored for desktop GPUs with more VRAM), but I'm happy to see that you managed to run the whole SuGaR pipeline (except for the texture extraction) without getting an OOM issue. That's pretty nice!

Here are a few tips to help you:

  1. The training time is long because the default config for the refinement phase is set to "long": this takes approximately 2 hours but returns a super sharp texture. As you can see in the paper, a short refinement phase is actually more than enough to get a good-looking mesh. So you can use the argument --refinement_time short or --refinement_time medium when running train.py to get a much shorter training (with short, refinement takes a few minutes instead of 2 hours).
  2. The simplest solution is to reduce the size of the extracted texture. To do this, you can use the argument --square_size 5, for example. The default value is 10. This is related to the number of pixels used for mapping each triangle into the texture PNG (see the rough estimate after this list).
  3. If (2) does not work, you can try to run the code with fewer vertices to get something that does not produce an OOM. Your resulting mesh will have fewer triangles, but it can still produce very cool results (for example, the red and white knight in the presentation video is reconstructed with only 200k vertices rather than the default 1M). To do this, use the argument --n_vertices_in_mesh 500_000, for example. If you still get an OOM, you can try to reduce it to --n_vertices_in_mesh 250_000.
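(To see why --square_size helps so much, here is a rough back-of-envelope sketch. It assumes, purely for illustration, that each pair of triangles shares one square_size x square_size patch in the texture atlas and that pixels are held as float32 RGB during extraction; these are assumptions, not a description of SuGaR's exact packing, but the key point that memory grows quadratically with square_size holds regardless.)

```python
# Rough, hedged estimate of texture-atlas size as a function of --square_size.
# Assumptions (for illustration only): ~2 triangles per vertex on a closed mesh,
# two triangles packed per square patch, float32 RGB pixels during extraction.

def estimate_texture_memory(n_vertices, square_size, bytes_per_pixel=3 * 4):
    n_triangles = 2 * n_vertices
    n_patches = n_triangles // 2                 # assumed: 2 triangles per patch
    n_pixels = n_patches * square_size ** 2
    return n_pixels, n_pixels * bytes_per_pixel / 1024 ** 3  # pixels, GiB

for s in (10, 5):
    pixels, gib = estimate_texture_memory(1_000_000, s)
    print(f"--square_size {s}: ~{pixels / 1e6:.0f} MPix, ~{gib:.2f} GiB")
# Halving square_size cuts the texture footprint by roughly 4x.
```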

To wrap it up, you should try the following command to get a shorter training and possibly avoid the OOM issue:

python train.py -s gaussian_splatting/owl2 -c owl_van/ -r density --square_size 5

If this does not work, you can also try the following:

python train.py -s gaussian_splatting/owl2 -c owl_van/ -r density --square_size 5 --n_vertices_in_mesh 500_000

Decreasing the value of --n_vertices_in_mesh will reduce the resolution of the mesh, but could solve your OOM issue. You can try to further reduce it to --n_vertices_in_mesh 250_000 if you still get an OOM issue.

Looking forward to your answer!

hanjoonwon commented 9 months ago

@Anttwo Nice work, and you're a nice guy :) I really appreciate your detailed and kind response. I'll try it and share the results.

hanjoonwon commented 9 months ago


@Anttwo I tried running it with the first command you suggested, and I no longer get the out-of-memory problem, but the results are very strange, both the textured mesh OBJ and the mesh PLY. What could be the problem? The image set works well with Nerfstudio (nerfacto), but I got a similarly bad result with SDFStudio. [image attached]

Anttwo commented 9 months ago

Hello @hanjoonwon,

Sure, I have two questions:

  1. Could you show me one or two of your input images? Just to know what your scene looks like.
  2. Could you try to zoom in? Concerning the quality of the reconstruction, it might just be due to a camera placed too far away from the center of the scene. Indeed, the background geometry is sometimes very far away from the main foreground object, and the scene may look like a mess until you zoom in and see your object in the foreground.

Here is an example illustrating point (2) with the bicycle scene: when you load the mesh, the camera is very far away and it looks messy [screenshot]. But when you zoom in, the mesh is actually pretty good: [screenshot of the bicycle mesh]

hanjoonwon commented 9 months ago

hi @Anttwo
[two images attached]

Thank you! I zoomed in and out and found it!! Thank you so much! I think it's because my background is too messy, or because of the density option. I want my mesh to be as close as possible to the dimensions of the actual object when it is printed on a 3D printer. If I want to get a more accurate and smooth mesh, can I post-process it with a tool like MeshLab?

I'm afraid I'm bothering you with a lot of questions. Also: I was able to use the default number of vertices for this image set, but a new image set runs into OOM issues even with 500000. Is that a difference between datasets?
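(On the MeshLab question: this kind of cleanup can also be scripted. Below is a minimal sketch using Open3D, which is an assumption here since neither MeshLab scripting nor Open3D is part of the SuGaR repository; note that the decimation step drops UVs/texture, so it is mainly useful when only the geometry matters, e.g. for 3D printing.)

```python
# Hedged sketch: post-process a SuGaR mesh for 3D printing with Open3D.
# "sugar_mesh.obj" is a placeholder path for the mesh exported by SuGaR.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("sugar_mesh.obj")

# Basic cleanup of artifacts that can upset slicers.
mesh.remove_duplicated_vertices()
mesh.remove_degenerate_triangles()
mesh.remove_non_manifold_edges()

# Taubin smoothing preserves the overall shape better than plain Laplacian smoothing.
mesh = mesh.filter_smooth_taubin(number_of_iterations=10)

# Reduce the triangle count for faster slicing (this drops UVs/texture).
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=200_000)

mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("sugar_mesh_print.obj", mesh)
```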

lolidrk commented 1 week ago


Hi @hanjoonwon,

I saw that you were able to get everything running in the end. I'm working with an RTX 2080 Super (8 GB VRAM) as well, so I thought our setups might be quite similar.

I wanted to ask whether you used nvdiffrast in your setup. If so, could you please share how you went about installing it? Specifically, did you use Docker, or did you follow the installation process described in the paper?

Additionally, could you also provide the details of your conda environment—like the Python version and the output you get when you do conda list—and whether you created it using the environment.yml file or installed the packages manually?

I’m really struggling with similar issues, and any help would be greatly appreciated.

Thanks a lot!
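(For gathering that kind of environment information, here is a quick, generic way to print the relevant versions from inside the conda environment; nothing SuGaR-specific is assumed beyond having PyTorch installed.)

```python
# Generic snippet (not SuGaR-specific) to report the versions usually asked for
# when debugging environment issues like this one.
import sys
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```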

hanjoonwon commented 1 week ago


@lolidrk I can't remember exactly because it was a long time ago, but I installed it the way described in the README. I used an Anaconda environment on Ubuntu, not Docker.

I used Python 3.9 and referred to the yml file.

If you run into a small error, go to the original 3D Gaussian Splatting (gs3d) paper/repo and refer to it.