Anttwo / SuGaR

[CVPR 2024] Official PyTorch implementation of SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
https://anttwo.github.io/sugar/

Too Slow training #31

Open LaFeuilleMorte opened 11 months ago

LaFeuilleMorte commented 11 months ago

Hi, thanks for your great work and the open-source code. Training is extremely slow on my RTX 3090 machine: 50 iterations take 5~6 minutes (8,000 in total), so the whole training would take over 10 hours. That's far longer than the times reported in the paper. Am I doing something wrong? [screenshot of the training progress bar]

Anttwo commented 11 months ago

Hi LaFeuilleMorte,

Indeed, the training time seems very long: 50 iterations should be very fast at the beginning of training (about 0.06 minutes) and take at most around 0.2 minutes after the surface regularization starts.

I have several questions for you:

  1. Do you have the laptop or desktop version of the RTX 3090?
  2. How much memory does it have?
  3. How many Gaussians do you have in your initial Gaussian Splatting? (One way to check this is sketched just below.)
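
In case it helps with question 3, here is a minimal sketch of how one could count the Gaussians in a vanilla 3DGS checkpoint. It assumes the standard `point_cloud.ply` layout and the `plyfile` package; the path below is only an example, so adjust it to your own output folder.

```python
from plyfile import PlyData  # pip install plyfile

# Example path only; point this at your own vanilla 3DGS checkpoint.
ply_path = "output/vanilla_gs/point_cloud/iteration_7000/point_cloud.ply"

ply = PlyData.read(ply_path)
num_gaussians = ply["vertex"].count  # one vertex record per Gaussian
print(f"Initial Gaussian Splatting contains {num_gaussians} Gaussians")
```
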
yuedajiong commented 11 months ago

My low-spec computer:

(2016 personal GPU workstation) GPU: Titan Xp with 12 GB of VRAM; CPU: 12 cores; RAM: 32 GB; Disk: SSD

The training speed is acceptable: 15,000 iterations take about a few dozen minutes.

LaFeuilleMorte commented 11 months ago

Hi, thanks for the reply!

> Do you have the laptop or desktop version of the RTX 3090?

It's a desktop one.

> How much memory does it have?

24 GB.

> How many Gaussians do you have in your initial Gaussian Splatting?

[screenshot showing the number of Gaussians]

LaFeuilleMorte commented 11 months ago

> My low-spec computer:
>
> (2016 personal GPU workstation) GPU: Titan Xp with 12 GB of VRAM; CPU: 12 cores; RAM: 32 GB; Disk: SSD
>
> The training speed is acceptable: 15,000 iterations take about a few dozen minutes.

To the best of my understanding of the code, most of the time is spent in the function `coarse_training_with_density_regularization`.
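
To narrow this down, here is a minimal timing sketch one could drop into the training loop; it only uses PyTorch's built-in CUDA synchronization, and the call sites in the comments are hypothetical placeholders for whatever steps the coarse loop actually performs:

```python
import time
import torch

def timed(label, fn, *args, **kwargs):
    """Time one call, waiting for queued CUDA kernels to finish first."""
    torch.cuda.synchronize()
    start = time.time()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    print(f"{label}: {time.time() - start:.3f} s")
    return result

# Hypothetical call sites; wrap the per-iteration steps of the coarse training loop:
# loss = timed("render + loss", compute_loss, sugar_model, camera)
# timed("backward", loss.backward)
# timed("optimizer step", optimizer.step)
```

If a single step dominates (e.g. the density regularization terms), that points at a compute bottleneck rather than a data-loading one.
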

Sbector commented 11 months ago

Happy holidays!

Same problem here.

I'm trying with an NVIDIA GeForce GTX 1650: [screenshot of GPU specs]

This is the information about my model: [screenshot of the model information]

Thanks for this incredible work!

DanielChaseButterfield commented 9 months ago

I also have this very same issue.

However, I'm only using a GeForce RTX 2060, and it only has 14 GB of VRAM, so that might be my issue (as opposed to an issue with the repository).

DanielChaseButterfield commented 9 months ago

Looking into this issue a little more, I want to ask: @LaFeuilleMorte, what is your GPU utilization versus GPU memory usage?

When running my model, it seems that almost the entirety of the memory is used, but the GPU itself is doing almost no work at all. I theorize that this could be because the CPU isn't getting information to the GPU fast enough, and so the bottleneck is the CPU.

[screenshot of GPU utilization and memory usage]
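
For anyone who wants to log the same numbers from inside the training process, here is a minimal sketch using NVML through the `pynvml` package (an extra dependency, not something the repository ships with):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # sampled busy percentages
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes used / total

print(f"GPU utilization: {util.gpu}%")
print(f"Memory used: {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GiB")

pynvml.nvmlShutdown()
```

Calling something like this every few hundred iterations would show whether the GPU really sits idle while memory stays full.
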

Looking at the code, it seems that the model is trained on only a single image at a time (i.e., the batch size is 1). I wonder if this is why the GPU has so little to do. I tried changing the following parameter to a larger number of images, but it seems that at some point during development this value was fixed to 1, as I get the following error when I try to change it. [screenshot of the batch-size parameter]

[screenshot of the resulting error]
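
For context, my understanding is that the loop follows the vanilla 3DGS pattern of rendering one camera per iteration, roughly like the sketch below. Every name here is illustrative rather than the repository's actual identifiers; the point is only that the differentiable rasterizer is called with a single camera, so raising a batch-size parameter alone cannot batch the rendering:

```python
import random

# Illustrative only: cameras, render, gaussians, ground_truth, loss_fn,
# optimizer and num_iterations all stand in for the real training objects.
for iteration in range(num_iterations):
    camera = random.choice(cameras)        # one viewpoint per iteration
    rendered = render(camera, gaussians)   # rasterizer takes a single camera
    loss = loss_fn(rendered, ground_truth(camera))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
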

yuedajiong commented 9 months ago

It looks like the Gaussian Splatting rasterizer does not support batching.