filaPro / oneformer3d

[CVPR2024] OneFormer3D: One Transformer for Unified Point Cloud Segmentation

OOM for S3DIS #11

Closed sonukiller closed 10 months ago

sonukiller commented 10 months ago

While training on S3DIS with a pre-trained backbone on a 24 GB GPU, I am getting OOM after 5-8 epochs. I have reduced `model.test_cfg.topk_insts` from 450 to 200 and `model.test_cfg.inst_score_thr` to 0.05 to avoid OOM, but it has not helped much.

I have also tried multi-gpu training (2 x 24 GB), but then I am getting OOM after 14-16 epochs.

Can you please help in this?

filaPro commented 10 months ago

The `model.test_cfg` parameters only reduce CPU RAM usage, not GPU memory. We never tried to run S3DIS training with less than 32 GB of GPU memory. You can probably try reducing the `num_points` parameter of `PointSample_` in the config.
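For reference, the pipeline entry in question looks roughly like this. This is only a sketch following the usual mmdetection3d config convention; the exact keys and surrounding transforms in the repo's S3DIS config may differ:

```python
# Sketch of the training-pipeline entry for point subsampling
# (mmdetection3d-style config dict; other transforms omitted).
train_pipeline = [
    dict(
        type='PointSample_',
        num_points=180000,  # lower this (e.g. to 150000) to reduce GPU memory
    ),
]
```

The same `num_points` value typically also appears in the test pipeline, so both places may need the change.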

sonukiller commented 10 months ago

I tried reducing the `num_points` parameter in `PointSample_` from 180K to 150K; training then runs for about 30 epochs before OOM happens again. I am using 2 x 24 GB GPUs. Will this change to the parameter impact model performance?

filaPro commented 10 months ago

The impact should be very limited, e.g. 1-2% in terms of mAP.

sonukiller commented 10 months ago

Ok, thanks. To avoid OOM, I kept everything else the same but reduced `batch_size` from 2 to 1 and `num_workers` from 3 to 1. I also scaled the initial learning rate by 1/√2. After training for 512 epochs, both mIoU and mAP are 6% lower than expected. Can you comment on this?
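For concreteness, the learning-rate adjustment above follows the square-root scaling rule (a common alternative is linear scaling); the base values here are placeholders, not the repo's actual config numbers:

```python
import math

def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Square-root learning-rate scaling: lr ~ sqrt(batch_size)."""
    return base_lr * math.sqrt(new_batch_size / base_batch_size)

# Halving the batch size (2 -> 1) multiplies the learning rate by 1/sqrt(2).
print(scale_lr(0.001, 2, 1))
```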

filaPro commented 10 months ago

We never tried training with batch size less than 2.

sonukiller commented 10 months ago

Ok, thanks! I am able to train by reducing the `num_points` parameter in `PointSample_`.

iamthephd commented 8 months ago

Can you please explain the significance of `num_points`? What does it represent?

oneformer3d-contributor commented 8 months ago

The number of points sampled from the input point cloud. Fewer points means lower memory usage and slightly worse metrics.
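A minimal sketch of the idea behind such a transform (not the repo's actual implementation): randomly keep at most `num_points` points, so every downstream tensor is smaller and GPU memory drops, at a small cost in accuracy:

```python
import numpy as np

def point_sample(points: np.ndarray, num_points: int) -> np.ndarray:
    """Randomly subsample a point cloud to at most num_points points."""
    if points.shape[0] <= num_points:
        return points
    idx = np.random.choice(points.shape[0], num_points, replace=False)
    return points[idx]

cloud = np.random.rand(200_000, 6)          # xyz + rgb per point
sampled = point_sample(cloud, 150_000)
print(sampled.shape)                        # (150000, 6)
```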