Hi there,
It could be the training recipe: have you tried training on 8 GPUs with a total batch size of 48?
No, I did not, as I do not have access to 8 GPUs. The PointContrast paper mentions that replacing the backbone with SR-UNet is the reason for the improvement from 65.4 to 68.2 mIoU. SR-UNet is Res16UNet34C, right? But the Minkowski repository's model zoo also uses Res16UNet34C and reaches 66.4 mIoU, so I don't think the backbone change alone explains the 68.2 mIoU. I will try a larger batch size. If you have any other suggestions, I would appreciate them. For reference, a sketch of how I instantiate the backbone is below.
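This is roughly how I build the backbone in my own repository, assuming the Res16UNet34C class from the SpatioTemporalSegmentation repo is importable; the config fields here are a hypothetical minimal namespace I made up for the sketch, not your actual training config:

```python
# Sketch only: assumes models/res16unet.py from chrischoy/SpatioTemporalSegmentation
# is on the path; the exact constructor arguments may differ between repo versions.
from argparse import Namespace
from models.res16unet import Res16UNet34C

config = Namespace(conv1_kernel_size=5, bn_momentum=0.02)  # hypothetical minimal config

model = Res16UNet34C(
    in_channels=3,    # RGB features per voxel
    out_channels=13,  # 13 S3DIS semantic classes
    config=config,
    D=3,              # 3D sparse convolutions
)
```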
Hi, I am not exactly sure whether it is the architecture or the batch size. I trained S3DIS semantic segmentation from scratch with the PointContrast code base last year, and at that time I used 8 GPUs. Since you mentioned that the architecture is the same, I assume it is probably the training recipe.
BTW, are you using ME 0.4 or ME 0.5? It shouldn't matter when training from scratch, but just in case.
Hi, thank you for the explanation. I am using ME 0.5.4.
I see. Please note that the number reported in the PointContrast paper was actually obtained with ME 0.3; I tried semantic segmentation on S3DIS with ME 0.4 last year, but have not tried ME 0.5.
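In case it helps when comparing runs across versions: to my understanding, the main user-facing change between ME 0.4 and ME 0.5 is the SparseTensor constructor keywords, so a rough sketch of the two styles looks like this (treat it as a sketch, not a migration guide):

```python
import torch
import MinkowskiEngine as ME

feats = torch.rand(100, 3)                        # per-point features
coords = torch.randint(0, 10, (100, 3)).int()     # voxel coordinates
bcoords = ME.utils.batched_coordinates([coords])  # prepend the batch index column

# ME 0.4.x style (keyword was `coords`):
# x = ME.SparseTensor(feats, coords=bcoords)

# ME 0.5.x style (keywords renamed to `features` / `coordinates`):
x = ME.SparseTensor(features=feats, coordinates=bcoords)
```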
In both the PointContrast and ContrastiveSceneContexts papers, the semantic segmentation result on S3DIS is stated as 68.2 mIoU. But in MinkowskiNet's GitHub repository (https://github.com/chrischoy/SpatioTemporalSegmentation), they achieve 66.3 mIoU using Mink16UNet34. You are also using Res16UNet34C with a 5cm voxel size. When I train the model in my own repository with Res16UNet34C, I also get around 66.4 mIoU. Is there anything I am missing? Can you explain how you get +2 mIoU compared to the original Minkowski model? Is it the data augmentation, the optimizer, etc.? A sketch of my own setup is below so we can compare.
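For context, this is roughly how I voxelize S3DIS point clouds at 5cm in my own repository (ME 0.5.x API); the optimizer line at the end shows placeholder values I chose myself, not anything taken from your config:

```python
import numpy as np
import torch
import MinkowskiEngine as ME

VOXEL_SIZE = 0.05  # 5cm voxels, as stated in the papers

def voxelize(points_xyz, colors, labels):
    """Quantize a point cloud onto a 5cm voxel grid (ME 0.5.x sparse_quantize)."""
    coords, feats, labels = ME.utils.sparse_quantize(
        coordinates=points_xyz,
        features=colors,
        labels=labels,
        quantization_size=VOXEL_SIZE,
    )
    return coords, feats, labels

# Optimizer: placeholder values only, listed so we can compare recipes.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
#                             weight_decay=1e-4)
```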