Anttwo / SuGaR

[CVPR 2024] Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
https://anttwo.github.io/sugar/

Question about coarse and refined mesh #89

Closed. Ryan-ZL-Lin closed this issue 6 months ago.

Ryan-ZL-Lin commented 8 months ago

Hi, first of all, thanks for your amazing work here. I'm really excited to see what quality I could achieve with my RTX 4080 12GB machine.

I used my own dataset and ran the following command to generate the PLY and OBJ files: `python train.py -s /mnt/c/Users/lzlal/gaussian_splatting/data/BLK3_Room_1S_8_R2/ -c /mnt/c/Users/lzlal/gaussian_splatting/output/BLK3_Room_1S_8_R2/ -r sdf -t True -i 30000 --high_poly True --refinement_time long --export_ply True --postprocess_mesh True`

However, I got an error during the post-processing phase; here is the log. Question 1: Is this why I don't have any OBJ file in the refined_mesh folder, i.e. because the OBJ creation process was terminated by the OOM issue?

Training finished after 15000 iterations with loss=0.05230744183063507.
Saving final model...
Final model saved.

Exporting ply file with refined Gaussians...
Ply file exported. This file is needed for using the dedicated viewer.
==================================================
Starting extracting texture from refined SuGaR model:
Scene path: /mnt/c/Users/lzlal/gaussian_splatting/data/BLK3_Room_1S_8_R2/
Iteration to load: 30000
Vanilla 3DGS checkpoint path: /mnt/c/Users/lzlal/gaussian_splatting/output/BLK3_Room_1S_8_R2/
Refined model path:
./output/refined/BLK3_Room_1S_8_R2/sugarfine_3Dgs30000_sdfestim02_sdfnorm02_level03_decim1000000_normalconsistency01_gaussperface1/15000.pt
Coarse mesh path:
./output/coarse_mesh/BLK3_Room_1S_8_R2/sugarmesh_3Dgs30000_sdfestim02_sdfnorm02_level03_decim1000000.ply
Mesh output directory: ./output/refined_mesh/BLK3_Room_1S_8_R2
Mesh save path:
./output/refined_mesh/BLK3_Room_1S_8_R2/sugarfine_3Dgs30000_sdfestim02_sdfnorm02_level03_decim1000000_normalconsistency01_gaussperface1_postprocessed.obj
Number of gaussians per surface triangle: 1
Square size: 10
Postprocess mesh: True
==================================================
Source path: /mnt/c/Users/lzlal/gaussian_splatting/data/BLK3_Room_1S_8_R2/
Gaussian splatting checkpoint path: /mnt/c/Users/lzlal/gaussian_splatting/output/BLK3_Room_1S_8_R2/

Loading Vanilla 3DGS model config /mnt/c/Users/lzlal/gaussian_splatting/output/BLK3_Room_1S_8_R2/...
Found image extension .jpg
Vanilla 3DGS Loaded.
850 training images detected.
The model has been trained for 30000 steps.
0.727799 M gaussians detected.
Binding radiance cloud to surface mesh...
Postprocessing mesh by removing border triangles with low-opacity gaussians...

Starting postprocessing iteration 0

Starting postprocessing iteration 1

Starting postprocessing iteration 2

Starting postprocessing iteration 3

Starting postprocessing iteration 4
Variable knn_to_track not found. Setting it to 16.
Binding radiance cloud to surface mesh...
Mesh postprocessed.
Traceback (most recent call last):
  File "/mnt/c/Users/lzlal/SuGaR/train.py", line 189, in <module>
    refined_mesh_path = extract_mesh_and_texture_from_refined_sugar(refined_mesh_args)
  File "/mnt/c/Users/lzlal/SuGaR/sugar_extractors/refined_mesh.py", line 191, in extract_mesh_and_texture_from_refined_sugar
    verts_uv, faces_uv, texture_img = extract_texture_image_and_uv_from_gaussians(
  File "/mnt/c/Users/lzlal/SuGaR/sugar_scene/sugar_model.py", line 2436, in extract_texture_image_and_uv_from_gaussians
    pixels_space_positions = (all_triangle_bary_coords[..., None] * faces_verts[:, None]).sum(dim=-2)[:, :, None]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 958.00 MiB (GPU 0; 11.99 GiB total capacity; 10.51 GiB already allocated; 0 bytes free; 10.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_spli
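
For context on why this line runs out of memory: the broadcast in the traceback builds one large intermediate tensor over all faces at once. Below is a minimal sketch (not the repo's code; shapes are assumptions based on the traceback) of how such a barycentric-interpolation broadcast can be chunked over the face dimension to bound peak memory:

```python
import torch

def interpolate_positions_chunked(bary_coords, faces_verts, chunk_size=100_000):
    """Barycentric interpolation done in chunks over the face dimension.

    Hypothetical sketch, not SuGaR's code: it reproduces the broadcast from the
    traceback above but processes `chunk_size` faces at a time.
    Assumed shapes: bary_coords (n_faces, n_pix, 3), faces_verts (n_faces, 3, 3).
    """
    out = []
    for start in range(0, faces_verts.shape[0], chunk_size):
        bary = bary_coords[start:start + chunk_size]    # (chunk, n_pix, 3)
        verts = faces_verts[start:start + chunk_size]   # (chunk, 3, 3)
        # (chunk, n_pix, 3, 1) * (chunk, 1, 3, 3) -> sum over the 3 vertices
        out.append((bary[..., None] * verts[:, None]).sum(dim=-2))
    return torch.cat(out, dim=0)                         # (n_faces, n_pix, 3)

# Tiny usage example with random data
positions = interpolate_positions_chunked(torch.rand(1000, 25, 3), torch.rand(1000, 3, 3))
```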

As a result, I have sugarmesh_3Dgs30000_sdfestim02_sdfnorm02_level03_decim1000000.ply in the coarse_mesh folder, which can be visualized like this in MeshLab (Vertices: 1,041,230; Faces: 1,999,816):

image

I thought the PLY file in the refined_ply folder would have better quality; here is its visualization in MeshLab (Vertices: 1,999,816; Faces: 0):

image

Question 2: The number of faces in the coarse_mesh PLY (75 MB) is actually equal to the number of vertices in the refined_ply PLY (472 MB). Is that normal?

Question 3: Is the refined_ply PLY generated correctly? I thought it should look very similar to the coarse_mesh PLY but with better quality.

Anttwo commented 8 months ago

Hello @Ryan-ZL-Lin,

Thank you for your really nice words!

Question 1: Exactly, you don't have an OBJ file because you got an OOM error during the texture extraction for the refined mesh. Let me explain.

This "texture extraction" part is actually pretty simple in theory, since the mesh was already refined during the refinement phase; This final "texture extraction" phase just aims to compute a traditional UV texture for the refined mesh, as it provides a good visualization tool for traditional softwares like Blender or MeshLab. So nothing too complicated, this process just takes a few minutes and could be done on CPU.

However, the current implementation does it on GPU by default. Consequently, because the texture image is very large, it can produce OOM issues (which you encountered). You can try providing a lower --square_size argument to train.py to fix this problem (the default value is 10, so you can try 8 or 5, for example); the texture will look a bit less good but will need less memory. I should definitely add a CPU option for this step, as it can be performed on the CPU and this would avoid some dumb OOM issues. This problem is entirely my fault, I'm really sorry for that.
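
If you want to see how much VRAM is actually available before retrying with a lower --square_size, a quick diagnostic like the following (just a sketch, not part of SuGaR) can help:

```python
import torch

# Diagnostic sketch only: check how much VRAM is free before retrying
# the texture extraction with a lower --square_size (requires a CUDA GPU).
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"Free VRAM: {free_bytes / 1024**3:.1f} GiB out of {total_bytes / 1024**3:.1f} GiB")
```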

Question 2+3:

The refined_ply file is actually not a mesh file, but a file that contains all the parameters of the Gaussians in the hybrid representation (as well as the vertices of the mesh, of course). This file has the exact same format as the PLY files from the original Gaussian Splatting implementation, and is meant to be used by the viewer to display the Gaussians in the hybrid representation "Mesh + Gaussians". That's why you can't visualize surfaces when you load this file in MeshLab. The textured refined mesh is actually supposed to be the OBJ file, which has a corresponding PNG texture image. Unfortunately, your run encountered an OOM issue, so you don't have it yet.
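
If you want to verify this yourself, a small inspection sketch with the plyfile package would look like the following (the path is a placeholder for your exported file, and the property names follow the usual 3DGS convention):

```python
from plyfile import PlyData  # pip install plyfile

# The refined .ply stores per-Gaussian parameters as "vertex" properties
# (3DGS convention) and has no face element, which is why MeshLab
# reports "Faces: 0" for it.
ply = PlyData.read("path/to/refined_export.ply")            # placeholder path
print([element.name for element in ply.elements])           # typically ['vertex'] only
print([prop.name for prop in ply["vertex"].properties][:10])  # x, y, z, opacity, scales, ...
print("Number of Gaussians:", ply["vertex"].count)
```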

I'm so sorry for this! This is a consequence of your GPU's 12 GB of memory; I agree that it's a good amount of memory in practice, but just like the original implementation of Gaussian Splatting (which can require 20+ GB), 12 GB can sometimes be a little too low for SuGaR. You need to slightly tune the hyperparameters (like --square_size) to make it work on your GPU.

Some papers that study "compression" for Gaussian Splatting came out recently; I'm sure this could help SuGaR run on GPUs with much less memory.

Ryan-ZL-Lin commented 7 months ago

@Anttwo Thanks a lot. By setting --square_size to 5, I'm able to create the OBJ shown in the picture below, but I notice that there are lots of broken fragments on the floor and walls.

Q1: Is this due to the newly introduced parameter --square_size? (P.S. I'm using a 360 camera to take the video and slicing the frames into separate perspective images as the source images for the GS pipeline.)

Q2: BTW, what's the meaning of --square_size? I cannot find it in the README.

image

Also, the numbers of vertices and faces in the refined mesh (OBJ: 1,042,019 and 1,855,030) and the coarse mesh (PLY: 1,042,019 and 1,999,751) are quite similar. Q3: What's the right way to compare the quality of the coarse mesh and the refined mesh?

I will use the new Viewer to see the Refined PLY once I overcome the installation issues.

kitmallet commented 7 months ago

@Ryan-ZL-Lin I have found that using 360 cameras works, but it didn't create a very good result for me. The split images don't overlap, which I found was the issue. I wonder if there is a way to use ffmpeg to have it create overlapping photos? One way would be to make two versions of the pictures: for the second version you would rotate the video by 36 degrees so that the photos have a 50% overlap. If that makes sense? Best, Kit

Anttwo commented 7 months ago

Hello @Ryan-ZL-Lin,

You're welcome!

Concerning 360 cameras, I have not tried them myself, so I can't really help you with that, I'm sorry. We use the same assumptions as the original 3D Gaussian Splatting implementation in our code, i.e. a simple pinhole camera model (and we ask COLMAP to undistort the images to fit this model). Converting your data into a simple pinhole camera model may be more challenging than one might think; I don't really know how well COLMAP can convert 360 camera data into the format we need.

Here are some answers to your questions.

Question 1: Indeed, you shouldn't have so many broken fragments in your scene, especially on the ground. I've tried SuGaR with the default settings on more than a dozen custom scenes captured with smartphones (some of them quite old), and I've never observed so many artifacts. So the good news is, I think it should be possible to get a good reconstruction of your scene. Let's investigate why you got so many broken fragments! Here are some ideas:

  1. I think your issue is probably due to using -i 30_000, i.e. using the 30,000th-iteration Gaussian Splatting model as the base for SuGaR. By default, as we explain in the paper, we use the 7,000th-iteration model, as it works well for extracting surfaces with the Poisson algorithm. Let me explain: the more Gaussians the initial model has, the smaller they are and the more they reproduce small texture details in the scene. In practice, we don't need the initial Gaussian Splatting model to have a lot of Gaussians, because we do not need the coarse model to have a lot of details (especially for texture). We just need the coarse model to have more or less accurate geometry/positions for the Gaussians (details will be captured with more precision during the refinement anyway). The reason why using smaller Gaussians at the start could produce broken fragments is the following: after the coarse training, we apply the Poisson reconstruction algorithm (with a depth parameter equal to 10), as we noticed it produces good-looking surfaces with a typical 7K Gaussian Splatting model. If you use much smaller Gaussians (as in 30K Gaussian Splatting models), the Poisson reconstruction with depth=10 might not be fine enough to capture all the surface elements, and might produce holes and broken fragments in the surface (especially for parts of the surface that are not seen much in the training images, such as the floor). Therefore, you might try the default setting -i 7000 to see if it produces better results. Increasing the Poisson depth might also be a solution, but this will make the approach slower and much more memory-intensive (see the sketch after this list for what such a Poisson call looks like).
  2. The postprocess_mesh strategy that we use aims to remove "border" artifacts that are sometimes produced by the Poisson reconstruction algorithm. We give more details in the README.md file (see the section Tips for using SuGaR on your own data and obtain better reconstructions, subsection 1. Capture images or videos that cover the entire surface of the scene). In short, when a small or thin object is not covered entirely by the training images (for example, if your video covers only one side of the object), the Poisson reconstruction sometimes adds unnecessary and inaccurate triangles at the border between the visible and the non-visible parts of the surface. The postprocess_mesh algorithm tries to identify these border triangles that should not be there and remove them from the refined mesh. In your case, because you already have a lot of holes in your mesh (probably because of point 1), the postprocess_mesh algorithm seems to just make the broken fragments worse. Looking at your scene, you can try removing --postprocess_mesh True for now, as it might not be needed, and retry with it later.
  3. May I ask, did you use the script convert.py to generate the camera pose data from your images? Who knows, these broken fragments could also be due to using a 360 camera, as the COLMAP script could have trouble with it.
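
To make point 1 more concrete, here is a minimal sketch of a Poisson reconstruction call using Open3D. This is only an illustration of the depth parameter discussed above, not SuGaR's actual coarse_mesh.py code, and the random point cloud is a stand-in for the coarse Gaussian centers:

```python
import numpy as np
import open3d as o3d

# Illustration only: Open3D's screened Poisson reconstruction exposes the same
# kind of `depth` parameter. A higher depth gives a finer octree, hence a finer
# surface, at the cost of time and memory.
points = np.random.rand(5000, 3)  # stand-in for the coarse Gaussian centers
pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
pcd.estimate_normals()  # Poisson reconstruction needs oriented normals

mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=10)  # lower depths (8, 7, ...) give smoother, coarser meshes
print(mesh)  # TriangleMesh; `densities` holds one Poisson density per vertex
```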

Question 2: We actually provide the full list of arguments for train.py in the README.md file, but I agree it's quite hidden. Please click on the dropdown list located below the list of the most important arguments, and you'll see the definition for --square_size (listed under sugar_refinement_args). I admit this definition is not really helpful, hehe! Basically, --square_size is related to the number of pixels used to map a triangle of the mesh into the texture image. So the higher square_size is, the higher the resolution (and the size) of the texture. Because we optimize the texture on GPU, you can have OOM issues if square_size is too high.
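
To get a feel for why this blows up the memory, here is a rough back-of-the-envelope estimate. The exact texture layout in SuGaR may differ; this just assumes one square_size x square_size patch of float32 RGB texels per triangle to show the quadratic scaling:

```python
# Back-of-the-envelope only; the exact texture layout in SuGaR may differ.
n_faces = 2_000_000  # roughly the face count reported above
for square_size in (10, 8, 5):
    texels = n_faces * square_size ** 2
    gib = texels * 3 * 4 / 1024 ** 3  # 3 channels, 4 bytes per float32
    print(f"square_size={square_size:2d}: ~{gib:.1f} GiB of texel data")
```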

Yes, the coarse mesh and refined mesh have a similar number of triangles because the refinement process actually does not change the number of triangles. The refinement strategy (a) moves the vertices to slightly refine the geometry of the scene, and (b) covers the mesh with a higher number of Gaussians to produce a Gaussian Splatting representation aligned with the surface and with more details than the coarse model.

Question 3: To compare the coarse and refined meshes, you can first compare the .PLY file of the coarse mesh (which has coarse vertex colors) to the .OBJ file of the refined mesh (which has a .PNG texture file and refined geometry). Then, you can also compare the coarse mesh to the hybrid representation, which consists of the refined mesh covered with 3D Gaussians. These Gaussians can be found in the refined .PLY file. You can use the viewer we recently added to the code, as it lets you visualize the hybrid representation, the refined mesh with its texture, and the refined mesh as a wireframe.

I hope my message will be helpful to you!

Ryan-ZL-Lin commented 7 months ago

> @Ryan-ZL-Lin I have found that using 360 cameras works, but it didn't create a very good result for me. The split images don't overlap, which I found was the issue. I wonder if there is a way to use ffmpeg to have it create overlapping photos? One way would be to make two versions of the pictures: for the second version you would rotate the video by 36 degrees so that the photos have a 50% overlap. If that makes sense? Best, Kit

Hi @kitmallet, actually I'm using Nerfstudio to split the 360 images; see https://docs.nerf.studio/reference/cli/ns_process_data.html#video I guess they are also using ffmpeg to deal with the overlapping.

It seems that the OBJ/PLY quality from SuGaR is better than Neuralangelo (https://github.com/NVlabs/neuralangelo) on my local machine. I will try out a normal perspective video to see what the result would be.

kitmallet commented 7 months ago

Sounds good @Ryan-ZL-Lin. If you want to try the other process I mentioned, I can give you the details of how it is done. Kit

Ryan-ZL-Lin commented 7 months ago

> Sounds good @Ryan-ZL-Lin. If you want to try the other process I mentioned, I can give you the details of how it is done. Kit

That would be great! Perhaps my approach is not well suited to SuGaR.

Ryan-ZL-Lin commented 7 months ago

> _(quoting @Anttwo's full reply above)_

@Anttwo Thanks, I found the explanation of --square_size in the README.

Instead of running the convert.py from your repo to create all the files below, I used Nerfstudio to slice the 360 video into perspective images and went through the COLMAP process (90% of the images are aligned within COLMAP). Afterwards, I used those sliced images as the data source for the Gaussian Splatting pipeline using its original repo and conda environment; this was done 3 months ago. (I can create a Gaussian Splatting scene using these files.) image image

A few days ago, I started using the same data source under gaussian_splatting/data/<scene_name> and the same Gaussian Splatting output under gaussian_splatting/output/<scene_name> to try out your method. I also used the images_4 folder rather than the images folder under gaussian_splatting/data/<scene_name> to solve the OOM issue. Previously I had tried the same 360 video with Neuralangelo, but it required a DGX A100 to get better quality. To be honest, I can already reach better quality on my local laptop with your approach than with Neuralangelo using the same dataset.

I also tried -i 7000 and removed --postprocess_mesh to see whether the broken-fragment issue could be solved; however, the fragments are still there, as in the picture below: image

Perhaps my current approach with a 360 video is really not well suited to SuGaR; I will try a normal perspective video again.

Ryan-ZL-Lin commented 7 months ago

Hi @Anttwo I realized that I was running convert.py in the original Gaussian Splatting conda environment, not directly in the SuGaR conda environment, so I decided to do it again to see whether I could solve the problem (not sure whether it will make any difference...).

This time, I got 328 images (1472 x 1472) and placed them in the sugar/gaussian_splatting/data/<scene_name> folder, then ran convert.py, which aligned the camera poses for 271 images. Afterwards, I ran sugar/gaussian_splatting/train.py to complete the optimization process.

Before running sugar/train.py, I renamed the folder "images" to "images_BK" and "images_4" to "images" under the sugar/gaussian_splatting/data/<scene_name> folder to solve the OOM issue, so that all the images are at 366 x 366 resolution; otherwise, I wouldn't be able to complete the SuGaR training process with 12 GB of GPU RAM.

Next, by running `sugar/train.py -s XXX -m XXX -r sdf -t True -i 30000 --high_poly True --refinement_time long --export_ply True --square_size 5 --postprocess_mesh True`, I got the training process to complete. But the broken-fragment issue still appears, as in the image below.

image

Question: Instead of reducing --square_size further, is using the downscaled images the right way to solve the OOM issue (for example, renaming the folder "images" to "images_BK" and "images_4" to "images")? I tried both the 360 video and a perspective video with the same process, but the broken-fragment issues still appear. So I'm wondering whether this is the root cause of the broken fragments, and I'm not sure whether reducing --square_size would reduce the output mesh quality or not...

Anttwo commented 6 months ago

Hey @Ryan-ZL-Lin,

I think I have the solution for the holes in your mesh. I just pushed a small change to the script sugar_extractors/coarse_mesh.py. You can pull this and change line 43 from vertices_density_quantile = 0.1 to vertices_density_quantile = 0., for example. This will reduce the threshold used for cleaning the Poisson mesh; I think this cleaning is what is producing the holes in your mesh.

After that, if you also have some weird ellipsoid bumps on your surface, you can try reducing the depth of the Poisson reconstruction at line 42 to make them disappear. You could change poisson_depth = 10 to poisson_depth = 8 or 7, or even 6, for example.
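
To illustrate what these two parameters control, here is a hypothetical Open3D sketch of Poisson reconstruction followed by density-based cleaning. The variable names mirror coarse_mesh.py, but the code itself is only an illustration and is not copied from the repo:

```python
import numpy as np
import open3d as o3d

# Illustration of the cleaning discussed above; names mirror coarse_mesh.py
# but this is not the repo's code. The random point cloud is a stand-in
# for the coarse Gaussian centers.
poisson_depth = 10               # try 8, 7 or 6 if you see ellipsoid bumps
vertices_density_quantile = 0.1  # set to 0. to keep every vertex (no hole-creating cleanup)

pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(np.random.rand(5000, 3)))
pcd.estimate_normals()
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=poisson_depth)

# Vertices whose Poisson density is below the chosen quantile are removed;
# in weakly observed regions (e.g. the floor) this can punch holes in the mesh.
densities = np.asarray(densities)
low_density = densities < np.quantile(densities, vertices_density_quantile)
mesh.remove_vertices_by_mask(low_density)
```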

Ryan-ZL-Lin commented 6 months ago

Thanks @Anttwo. With your first recommended approach, I'm able to solve this issue, as shown in the image below. Although the quality is slightly reduced, it's already good enough for my local 12 GB VRAM environment. I will continue to try out poisson_depth to improve the quality.

image