RozDavid / LanguageGroundedSemseg

Implementation for ECCV 2022 paper Language-Grounded Indoor 3D Semantic Segmentation in the Wild

Logs of Language Grounded Pretraining #21

Closed Jeff-LiangF closed 1 year ago

Jeff-LiangF commented 1 year ago

Hi @RozDavid ,

Thanks for your exciting work. It is a milestone paper for open-vocab 3D segmentation. I am now trying to reproduce the results of the pretraining stage. I notice that pretraining for one epoch takes roughly 12 minutes on a 40 GB A100 GPU with batch size 2. Do you think that is a reasonable speed? Since it might take days to obtain the results, I wonder if you happened to keep the pretraining logs. They would be very helpful for checking whether the training is going well.
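For a rough sense of how the per-epoch time adds up to days, here is a minimal back-of-the-envelope sketch. The helper name `estimate_training_hours` and the epoch counts are placeholders chosen for illustration; the thread does not state how many epochs the pretraining schedule uses, and the estimate ignores validation and checkpointing overhead.

```python
def estimate_training_hours(minutes_per_epoch: float, num_epochs: int) -> float:
    """Rough wall-clock estimate in hours, ignoring validation/checkpoint overhead."""
    return minutes_per_epoch * num_epochs / 60.0


if __name__ == "__main__":
    # 12 min/epoch is the figure reported in this comment.
    # The epoch counts below are hypothetical, not taken from the repository.
    for epochs in (100, 300, 600):
        hours = estimate_training_hours(12, epochs)
        print(f"{epochs} epochs: {hours:.1f} h ({hours / 24:.1f} days)")
```

Plugging in the real epoch count from the training config would give the actual figure.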

Btw, for the installation of MinkowskiEngine, you may want to add the openblas dependency: `conda install openblas-devel -c anaconda`. The instructions can be found here.

RozDavid commented 1 year ago

Hey @Jeff-LiangF,

I did most of my experiments on 2 A6000s, where 1 epoch took about 14 minutes, so I think yours is also a reasonable iteration time. Sadly, because of the expensive computation these large backbones require, there is not much we can do about it. Naturally, if you can afford more GPUs to train on, or if you can reduce either the voxel resolution or the model size, you can get much faster iteration times.
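To illustrate why a coarser voxel resolution helps, here is a small sketch that counts how many voxels a point cloud occupies at different voxel sizes. It is not the repository's data pipeline: the helper `count_occupied_voxels`, the synthetic 5 m cube of random points, and the voxel sizes are all assumptions made for the example. The general idea is that a sparse convolutional backbone does work roughly in proportion to the number of occupied voxels, so fewer voxels should mean faster iterations.

```python
import math
import random


def count_occupied_voxels(points, voxel_size):
    """Count distinct voxels hit by the points when space is cut into cubes of voxel_size."""
    occupied = {
        tuple(math.floor(coord / voxel_size) for coord in point)
        for point in points
    }
    return len(occupied)


if __name__ == "__main__":
    # Synthetic stand-in for a scan: random points in a 5 m cube (hypothetical data).
    rng = random.Random(0)
    points = [
        (rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0))
        for _ in range(50_000)
    ]
    # Voxel sizes in meters, chosen only for illustration.
    for voxel_size in (0.02, 0.05, 0.10):
        n = count_occupied_voxels(points, voxel_size)
        print(f"voxel size {voxel_size:.2f} m: {n} occupied voxels")
```

Real indoor scans are surface-like rather than volume-filling, so the exact ratios will differ, but the trend of fewer occupied voxels at coarser resolution holds.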

Thanks for the tip for the conda env - will update!