RozDavid / LanguageGroundedSemseg

Implementation for the ECCV 2022 paper "Language-Grounded Indoor 3D Semantic Segmentation in the Wild"

Why does this codebase require so much GPU memory? #19

Closed: mmaaz60 closed this issue 1 year ago

mmaaz60 commented 1 year ago

Hi @RozDavid,

Thank you for the great work. I was wondering why this codebase requires so much GPU memory. For example, sometimes a 40GB GPU isn't enough for a batch size of 2. Also, if I scale the number of GPUs (for example, to 4 GPUs), the memory utilization of the first GPU increases sharply and quickly runs into an out-of-memory (OOM) error. I just came across this paper (https://arxiv.org/pdf/2211.15654.pdf), which seems to use a more complex architecture than this one, and they claim to use a batch size of 8 for ScanNet on a single 40GB GPU.

Do you have any suggestions for reducing the memory usage of the codebase, other than cropping the voxels? Thanks

RozDavid commented 1 year ago

Hey @mmaaz60,

So the GPU memory requirement depends mostly on two factors. The first is the voxel resolution: even with a sparse voxel representation of surface meshes, memory grows quadratically with resolution, because the number of occupied surface voxels scales with the square of the inverse voxel size. The second is the number of model parameters. I haven't checked their codebase, but skimming through their implementation section, they used a significantly smaller backbone, which is probably why they can use a larger batch size. It is definitely possible to use the same backbone with this project too: just pick a model of your preference from here and either modify the last layer to match the CLIP output dimension, or project the CLIP features directly.
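The quadratic growth with resolution can be checked with a small, self-contained experiment. The sketch below is not code from this repository: it samples points densely on a hypothetical 1 m radius sphere (standing in for a scanned surface), quantizes them into a sparse voxel grid, and counts the occupied voxels at several voxel sizes. Assuming sparse-convolution activation memory is roughly proportional to the number of occupied voxels, halving the voxel size should make the count, and hence memory, grow by about 4x.

```python
import math


def sphere_surface_points(radius: float = 1.0, rings: int = 400):
    """Roughly uniform points on a sphere surface, built from latitude rings.

    The sphere is a hypothetical stand-in for a scanned surface mesh.
    """
    points = []
    for i in range(rings):
        theta = math.pi * (i + 0.5) / rings          # polar angle of this ring
        n = max(1, int(2 * rings * math.sin(theta)))  # more points on wider rings
        for j in range(n):
            phi = 2 * math.pi * j / n                 # azimuth
            points.append((radius * math.sin(theta) * math.cos(phi),
                           radius * math.sin(theta) * math.sin(phi),
                           radius * math.cos(theta)))
    return points


def count_occupied_voxels(points, voxel_size: float) -> int:
    """Number of distinct voxels containing at least one point (sparse quantization)."""
    return len({(math.floor(x / voxel_size),
                 math.floor(y / voxel_size),
                 math.floor(z / voxel_size))
                for x, y, z in points})


points = sphere_surface_points()
for voxel_size in (0.10, 0.05, 0.025):
    count = count_occupied_voxels(points, voxel_size)
    print(f"voxel size {voxel_size:.3f} m -> {count} occupied voxels")
```

Only the surface is occupied, so the count follows the surface area divided by the voxel face area; a dense grid would instead grow cubically.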
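Modifying the last layer to match the CLIP output dimension could look roughly like the minimal sketch below. It is written in NumPy to stay self-contained; in a PyTorch model the projection would typically be an `nn.Linear` layer. All names and sizes are hypothetical assumptions, not this repository's API: a 96-d backbone feature per voxel, 512-d text embeddings (the output size of CLIP ViT-B/32), a 0.07 temperature, and random arrays standing in for real backbone outputs and CLIP text embeddings.

```python
import numpy as np

# Hypothetical dimensions: a small backbone with 96-d per-voxel features and
# CLIP ViT-B/32 text embeddings of size 512.
BACKBONE_DIM = 96
CLIP_DIM = 512

rng = np.random.default_rng(0)


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class LinearProjectionHead:
    """Hypothetical last layer mapping backbone features into the CLIP space."""

    def __init__(self, in_dim: int, out_dim: int, rng: np.random.Generator):
        self.weight = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
        self.bias = np.zeros(out_dim)

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        return feats @ self.weight + self.bias


def clip_logits(voxel_feats, text_embeds, head, temperature: float = 0.07):
    """Cosine similarity between projected voxel features and class text embeddings."""
    v = l2_normalize(head(voxel_feats))  # (N, CLIP_DIM)
    t = l2_normalize(text_embeds)        # (C, CLIP_DIM)
    return (v @ t.T) / temperature       # (N, C) per-voxel class scores


# Random stand-ins: 1000 voxels, 20 class-name text embeddings.
voxel_feats = rng.standard_normal((1000, BACKBONE_DIM))
text_embeds = rng.standard_normal((20, CLIP_DIM))
head = LinearProjectionHead(BACKBONE_DIM, CLIP_DIM, rng)
logits = clip_logits(voxel_feats, text_embeds, head)
pred = logits.argmax(axis=1)  # predicted class index per voxel
```

Only the head depends on the backbone width, so swapping in a smaller backbone changes `BACKBONE_DIM` and nothing else in this sketch.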

I ran most of my experiments on A6000s, which have 48GB, and didn't have any problems with a batch size of 2. The catch is that it is hard to pick a batch size with shuffled datasets, because scene sizes vary a lot, so sampling two large scenes together can be significantly more demanding than sampling two small or average ones.
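The effect of shuffling on peak memory can be illustrated with a short simulation. This is a hedged sketch with made-up numbers, not measurements from ScanNet or this codebase: scene sizes (in occupied voxels) are drawn from a hypothetical heavy-tailed distribution, the dataset is shuffled into batches of 2 for several epochs, and the average batch load is compared with the largest batch actually seen and the worst possible pair. Peak GPU memory is set by the heaviest batch, not the average one.

```python
import random
import statistics


def make_scene_sizes(n_scenes: int = 1200, seed: int = 0):
    """Hypothetical occupied-voxel counts per scene (heavy-tailed, illustration only)."""
    rng = random.Random(seed)
    return [int(rng.lognormvariate(11.5, 0.6)) for _ in range(n_scenes)]


def epoch_batch_loads(sizes, batch_size: int, seed: int):
    """Total occupied voxels in each batch of one shuffled epoch."""
    order = list(sizes)
    random.Random(seed).shuffle(order)
    return [sum(order[i:i + batch_size]) for i in range(0, len(order), batch_size)]


sizes = make_scene_sizes()
loads = [load
         for epoch in range(10)
         for load in epoch_batch_loads(sizes, batch_size=2, seed=epoch)]
worst_possible = sum(sorted(sizes)[-2:])  # the two largest scenes in one batch

print(f"mean batch load:                 {statistics.mean(loads):,.0f} voxels")
print(f"largest batch seen in 10 epochs: {max(loads):,} voxels")
print(f"worst possible pair:             {worst_possible:,} voxels")
```

A batch size that fits comfortably on average can still hit OOM whenever the shuffle happens to pair two of the largest scenes.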

But if you have problems with it, I would suggest the following things:

Let me know if this helps, and all pull requests are welcome :)

Cheers, David