dcharatan / pixelsplat

[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann
http://davidcharatan.com/pixelsplat/
MIT License

Question about batch size #50

Closed hqy117 closed 6 months ago

hqy117 commented 6 months ago

Hello, excellent work. Training runs well for me, but why does it take so much memory? It uses about 20 GB when I set batch_size to 2. The model doesn't seem to have that many parameters, so is it really that large?

dcharatan commented 6 months ago

The model itself only has ~100M parameters, but training requires more memory than you might expect because the outputs of F.grid_sample (used in the epipolar transformer) take up a lot of memory.
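To get a feel for the scale, here is a rough sketch with made-up sizes (not pixelSplat's actual configuration) of how large the sampled feature tensor gets when every target pixel draws samples along its epipolar line; a tensor like this is kept alive for backprop in each layer that samples this way:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only, not pixelSplat's real config.
b, c, h, w = 2, 128, 64, 64     # batch, channels, feature map size
n_rays = h * w                  # one epipolar line per target pixel
n_samples = 32                  # samples along each epipolar line

features = torch.randn(b, c, h, w)
# Normalized (x, y) sample coordinates along each epipolar line.
grid = torch.rand(b, n_rays, n_samples, 2) * 2 - 1

# Output shape: (b, c, n_rays, n_samples). This intermediate is stored
# for the backward pass by every layer that samples this way.
sampled = F.grid_sample(features, grid, align_corners=False)

bytes_per_call = sampled.numel() * sampled.element_size()
print(f"{bytes_per_call / 1e6:.1f} MB of sampled features per call")
```

Even with these modest made-up sizes, a single call produces on the order of a hundred megabytes of activations, and this repeats across attention layers and views.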

I tried to solve this problem at one point by writing CUDA code that fuses grid sampling with the next operation in the epipolar transformer. However, while this cuts memory usage for the epipolar transformer by a lot (90% or more), it slows down training too much because you have to run grid sampling twice per attention layer.
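For reference, a PyTorch-level way to make the same memory-for-compute trade is to wrap the sampling-plus-attention step in torch.utils.checkpoint, so the sampled features are discarded after the forward pass and recomputed (grid sampling runs a second time) during backprop. This is only an illustrative sketch of that trade-off, not the fused CUDA kernel described above, and the attention step here is a simplified stand-in rather than pixelSplat's actual epipolar attention:

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def sample_and_attend(features, grid, queries, key_proj, value_proj):
    """Hypothetical epipolar attention step: sample features along epipolar
    lines, then attend over the samples. Not pixelSplat's actual code."""
    # (b, c, n_rays, n_samples)
    sampled = F.grid_sample(features, grid, align_corners=False)
    # Rearrange to (b, n_rays, n_samples, c) for attention over the samples.
    sampled = sampled.permute(0, 2, 3, 1)
    keys = key_proj(sampled)        # (b, n_rays, n_samples, d)
    values = value_proj(sampled)    # (b, n_rays, n_samples, d)
    attn = torch.einsum("brd,brsd->brs", queries, keys).softmax(dim=-1)
    return torch.einsum("brs,brsd->brd", attn, values)

def checkpointed_step(features, grid, queries, key_proj, value_proj):
    # checkpoint() drops the intermediates of sample_and_attend after the
    # forward pass and recomputes them in the backward pass: less memory,
    # but grid sampling (and the projections) run twice per layer.
    return checkpoint(
        sample_and_attend, features, grid, queries, key_proj, value_proj,
        use_reentrant=False,
    )
```

The fused CUDA kernel makes the same trade at a lower level, which is why it saves so much memory but still ends up paying for grid sampling twice per attention layer.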

hqy117 commented 6 months ago

Thank you!