dcharatan / pixelsplat

[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann
http://davidcharatan.com/pixelsplat/
MIT License

Question about batch size #50

Closed hqy117 closed 6 months ago

hqy117 commented 6 months ago

Hello, excellent work. Training runs well for me, but why does it take so much memory? It uses about 20 GB when I set batch_size to 2. The model doesn't seem to have that many parameters, so is it really that large?

dcharatan commented 6 months ago

The model itself only has ~100M parameters, but training requires more memory than you might expect because the outputs of F.grid_sample (used in the epipolar transformer) take up a lot of memory.
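To get a feel for the scale, here is a rough sketch with made-up sizes (not pixelSplat's actual configuration) of how large the sampled feature tensor gets when every target pixel draws samples along its epipolar line; a tensor like this is kept alive for backprop in each layer that samples this way:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only, not pixelSplat's real config.
b, c, h, w = 2, 128, 64, 64     # batch, channels, feature map size
n_rays = h * w                  # one epipolar line per target pixel
n_samples = 32                  # samples along each epipolar line

features = torch.randn(b, c, h, w)
# Normalized (x, y) sample coordinates along each epipolar line.
grid = torch.rand(b, n_rays, n_samples, 2) * 2 - 1

# Output shape: (b, c, n_rays, n_samples). This intermediate is stored
# for the backward pass by every layer that samples this way.
sampled = F.grid_sample(features, grid, align_corners=False)

bytes_per_call = sampled.numel() * sampled.element_size()
print(f"{bytes_per_call / 1e6:.1f} MB of sampled features per call")
```

Even with these modest made-up sizes, a single call produces on the order of a hundred megabytes of activations, and this repeats across attention layers and views.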

I tried to solve this problem at one point by writing CUDA code that fuses grid sampling with the next operation in the epipolar transformer. However, while this cuts memory usage for the epipolar transformer by a lot (90% or more), it slows down training too much because you have to run grid sampling twice per attention layer.
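For reference, a PyTorch-level way to make the same memory-for-compute trade is to wrap the sampling-plus-attention step in torch.utils.checkpoint, so the sampled features are discarded after the forward pass and recomputed (grid sampling runs a second time) during backprop. This is only an illustrative sketch of that trade-off, not the fused CUDA kernel described above, and the attention step here is a simplified stand-in rather than pixelSplat's actual epipolar attention:

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def sample_and_attend(features, grid, queries, key_proj, value_proj):
    """Hypothetical epipolar attention step: sample features along epipolar
    lines, then attend over the samples. Not pixelSplat's actual code."""
    # (b, c, n_rays, n_samples)
    sampled = F.grid_sample(features, grid, align_corners=False)
    # Rearrange to (b, n_rays, n_samples, c) for attention over the samples.
    sampled = sampled.permute(0, 2, 3, 1)
    keys = key_proj(sampled)        # (b, n_rays, n_samples, d)
    values = value_proj(sampled)    # (b, n_rays, n_samples, d)
    attn = torch.einsum("brd,brsd->brs", queries, keys).softmax(dim=-1)
    return torch.einsum("brs,brsd->brd", attn, values)

def checkpointed_step(features, grid, queries, key_proj, value_proj):
    # checkpoint() drops the intermediates of sample_and_attend after the
    # forward pass and recomputes them in the backward pass: less memory,
    # but grid sampling (and the projections) run twice per layer.
    return checkpoint(
        sample_and_attend, features, grid, queries, key_proj, value_proj,
        use_reentrant=False,
    )
```

The fused CUDA kernel makes the same trade at a lower level, which is why it saves so much memory but still ends up paying for grid sampling twice per attention layer.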

hqy117 commented 6 months ago

Thank you!