facebookresearch / ContrastiveSceneContexts

Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"
MIT License

S3DIS Semantic Segmentation Training from Scratch #41

Closed. YilmazKadir closed this issue 2 years ago

YilmazKadir commented 2 years ago

In both the PointContrast and ContrastiveSceneContexts papers, the semantic segmentation result on S3DIS is reported as 68.2 mIoU. But the MinkowskiNet GitHub repository (https://github.com/chrischoy/SpatioTemporalSegmentation) reports 66.3 mIoU using Mink16UNet34. You are also using Res16UNet34C with a 5cm voxel size. When I train the model in my own repository with Res16UNet34C, I also get around 66.4 mIoU. Is there anything I am missing? Can you explain how you get +2 mIoU compared to the original Minkowski model? Is it about data augmentation, the optimizer, etc.?

Sekunde commented 2 years ago

Hi there,

Could be the training recipe: have you tried to train on 8 GPUs with a total batch size of 48?
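
For readers without 8 GPUs, one common workaround is gradient accumulation to approximate the total batch size of 48 mentioned above. The sketch below is only an illustration of the arithmetic, not something taken from this repository: the helper name `accumulation_plan` and its parameters are hypothetical. Note also that accumulation does not reproduce batch-norm statistics computed over larger per-step batches, so results may still differ from a true 8-GPU run.

```python
import math


def accumulation_plan(target_batch: int, num_gpus: int, max_per_gpu: int) -> dict:
    """Hypothetical helper: choose a per-GPU batch size and a number of
    gradient-accumulation steps whose product reaches `target_batch`.

    target_batch : total batch size to approximate (48 in this thread)
    num_gpus     : GPUs actually available
    max_per_gpu  : largest per-GPU batch that fits in memory (assumed known)
    """
    if min(target_batch, num_gpus, max_per_gpu) < 1:
        raise ValueError("all arguments must be positive integers")

    # Samples that can be processed in one forward/backward pass.
    capacity = num_gpus * max_per_gpu
    # Passes to accumulate before each optimizer step.
    accum_steps = math.ceil(target_batch / capacity)
    # Smallest per-GPU batch that still reaches the target.
    per_gpu = math.ceil(target_batch / (num_gpus * accum_steps))

    return {
        "per_gpu_batch": per_gpu,
        "accum_steps": accum_steps,
        "effective_batch": per_gpu * num_gpus * accum_steps,
    }


if __name__ == "__main__":
    print(accumulation_plan(48, 8, 6))  # the 8-GPU recipe from this thread
    print(accumulation_plan(48, 1, 6))  # a single-GPU approximation
```

When the target is not evenly divisible, the effective batch size is rounded up slightly rather than matched exactly.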

YilmazKadir commented 2 years ago

No, I did not, as I do not have access to 8 GPUs. The PointContrast paper says that replacing the backbone with SR-UNet is the reason for the improvement from 65.4 to 68.2. SR-UNet is Res16UNet34C, right? But the Minkowski repository's model zoo also uses Res16UNet34C and achieves 66.4 mIoU. So I think the change of backbone alone does not explain the 68.2 mIoU. I will try a larger batch size. If you have any suggestions, I would appreciate them.

Sekunde commented 2 years ago

Hi, I am not sure whether the difference comes from the architecture or the batch size. I trained S3DIS semantic segmentation from scratch once last year with the PointContrast code base, and at that time I used 8 GPUs. Since you mention that the architecture is the same, I assume it is probably due to the training recipe.

BTW: are you using ME 0.4 or ME 0.5 (MinkowskiEngine)? It shouldn't matter when training from scratch, but just in case.

YilmazKadir commented 2 years ago

Hi, thank you for the explanation. I am using ME 0.5.4.
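
For anyone comparing setups, the installed version can be read from package metadata without importing the library. This is a minimal sketch, assuming the package is registered under the distribution name `MinkowskiEngine`; the helper names are hypothetical, and a source build may register a different name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def minkowski_engine_version(dist_name: str = "MinkowskiEngine") -> Optional[str]:
    """Return the installed version string, or None if the distribution
    is not found. `dist_name` is an assumption about how it was installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def major_minor(version_string: str) -> str:
    """Reduce a version such as '0.5.4' to its 'major.minor' part ('0.5')."""
    return ".".join(version_string.split(".")[:2])


if __name__ == "__main__":
    installed = minkowski_engine_version()
    if installed is None:
        print("MinkowskiEngine not found under the assumed distribution name")
    else:
        print(f"MinkowskiEngine {installed} (series {major_minor(installed)})")
```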

Sekunde commented 2 years ago

I see. Please note that the number reported in the PointContrast paper actually uses ME 0.3, but I tried semantic segmentation on S3DIS with ME 0.4 last year. I have not tried ME 0.5 though.