autonomousvision / sdfstudio

A Unified Framework for Surface Reconstruction

How to improve the training efficiency from wandb #264

Open Hmartin1978 opened 6 months ago

Hmartin1978 commented 6 months ago

Thank you for the amazing work! I have some questions about training efficiency (a 3090 with 24 GB VRAM). When I use bakedsdf to train my dataset, I usually increase train-num-rays-per-batch as much as possible to approach the full 24 GB, and I found that increasing it also increases the per-iteration training time. After training finished, I checked wandb and have some questions (100 4K images with bakedsdf, training time about 17 h). (Five wandb screenshots attached.)
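As a rough way to find the most efficient batch size, one could time a few iterations at several ray counts and record peak VRAM. The sketch below assumes a PyTorch-style `train_step(num_rays)` callable as a hypothetical stand-in for one training iteration of the real pipeline, not an actual sdfstudio API:

```python
import time
import torch

def benchmark_rays_per_batch(train_step, ray_batch_sizes, iters=50):
    """Time a few iterations and record peak VRAM for each ray-batch size.

    `train_step(num_rays)` is a hypothetical callable that runs one training
    iteration with the given number of rays; swap in the real pipeline step.
    """
    results = {}
    for num_rays in ray_batch_sizes:
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            train_step(num_rays)
        torch.cuda.synchronize()
        sec_per_iter = (time.perf_counter() - start) / iters
        peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
        # Rays per second is the number to maximize, not rays per batch.
        results[num_rays] = {
            "sec/iter": sec_per_iter,
            "rays/sec": num_rays / sec_per_iter,
            "peak_vram_gb": peak_gb,
        }
    return results
```

If rays/sec plateaus while seconds per iteration keep growing, the GPU is already saturated and a larger batch mostly just costs memory.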

(1) You can see there have been sudden drops in many of the graphs; for an ideal training run these should be roughly constant lines, right?
(2) Increasing rays-per-batch increases GPU Memory Allocated (%). Do the sudden drops mean the GPU is fetching a large amount of data from the CPU? Also, GPU Time Spent Accessing Memory (%) should be as low as possible, i.e. the GPU should spend more time processing data than accessing it, but in my training it reaches nearly 80%. Is that the reason GPU Utilization drops, and how can I decrease GPU Time Spent Accessing Memory?
(3) I don't know whether these factors affect training at all: Network Traffic, Disk I/O Utilization, Disk Utilization, Process CPU Threads (16 on my PC), Process Memory, System Memory. I also found that CPU Utilization (%) is very low (about 10%); is that normal?
(4) Does training with more rays per batch give a better result?
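Regarding (2), one way to check whether iterations are stalled on memory traffic or on host-to-device copies is the PyTorch profiler. A minimal sketch, again assuming a hypothetical `train_step(num_rays)` wrapper around one training iteration rather than any specific sdfstudio function:

```python
import torch
from torch.profiler import profile, ProfilerActivity

def profile_training_window(train_step, num_rays, steps=10):
    """Profile a few iterations to see where GPU time actually goes.

    Large host-to-device copy entries suggest the CPU/dataloader side is the
    bottleneck; dominant compute kernels suggest the GPU is simply saturated.
    """
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 profile_memory=True) as prof:
        for _ in range(steps):
            train_step(num_rays)
            torch.cuda.synchronize()
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))
```

If copy operations dominate the table, the first things to try would be pinned memory or caching the ray/image data on the GPU; if compute kernels dominate, the high memory-access percentage likely just reflects the kernels themselves.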

If you have any relevant experience or advice, could you give me some guidance? Thank you very much!