In custom dataset where point number in classes are quite different

Hi @yudopan, thanks for your interest in our project!

There are two ways you can tackle class imbalance: setting weights in your loss to give more importance to rare classes, or sampling your data during training.

For the first option, setting:

weighted_loss: True

in the model config will automatically call the get_class_weight() method of your dataset when training starts. This computes the frequency of each class in your dataset and sets the loss weights accordingly. I invite you to have a look at the code for more details; it is fairly well commented. Note that weighted_loss=True by default for all models in the provided code.
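
To give an intuition of what frequency-based loss weights look like, here is a minimal sketch in plain Python. It is not the project's get_class_weight() implementation: the function name inverse_frequency_weights, the plain 1/frequency rule and the normalization to a mean of 1 are all assumptions made for illustration, and the actual weighting scheme in the code may differ.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical helper: weight each class by 1 / frequency.

    Classes absent from `labels` get a weight of 0. The weights are
    rescaled so that their mean over all classes is 1.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    raw = []
    for c in range(num_classes):
        freq = counts.get(c, 0) / total
        raw.append(1.0 / freq if freq > 0 else 0.0)
    mean = sum(raw) / num_classes
    return [w / mean for w in raw]


# Toy imbalanced labels: 900 / 90 / 10 points for classes 0 / 1 / 2
labels = [0] * 900 + [1] * 90 + [2] * 10
weights = inverse_frequency_weights(labels, num_classes=3)
print(weights)  # the rarest class gets the largest weight
```

A list like this is the kind of per-class weight vector a weighted cross-entropy loss typically takes, so that errors on rare classes cost more than errors on frequent ones.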

For the second option, you can play with the by_class=True option of the SampleSegments and SampleRadiusSubgraphs transforms. By default, this behavior is not activated. Here again, I invite you to have a look at the code for these transforms to understand what they do and what the class-weighted sampling will do.
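
As a rough illustration of the idea behind class-weighted sampling, the sketch below draws items with a probability inversely proportional to the size of their class, so that each class is picked about equally often. This is not what SampleSegments or SampleRadiusSubgraphs actually do internally: the function sample_by_class, its arguments and the sampling with replacement are assumptions made for the example.

```python
import random
from collections import Counter


def sample_by_class(labels, k, seed=0):
    """Hypothetical helper: draw k indices (with replacement) so that
    every class present in `labels` is selected about equally often."""
    counts = Counter(labels)
    # An item from a class with n members gets weight 1 / n, so each
    # class carries the same total sampling weight.
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Toy imbalanced labels: 900 / 90 / 10 items for classes 0 / 1 / 2
labels = [0] * 900 + [1] * 90 + [2] * 10
idx = sample_by_class(labels, k=3000)
drawn = Counter(labels[i] for i in idx)
print(drawn)  # roughly balanced across the three classes
```

Keep in mind that with this kind of sampling the few examples of a rare class are seen many times, which is one reason why very small classes remain prone to overfitting.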

That being said, if one of your classes has only 100 points, I fear it may be very hard not to overfit to the training examples. For instance, in KITTI-360, some under-represented classes such as traffic light or motorcycle are very hard to capture despite having multiple instances of hundreds of points in the dataset. I do not know your dataset, but maybe you can find a larger dataset to pretrain on, one that contains classes similar to your classes of interest.

Good luck! Damien

drprojects / superpoint_transformer

In custom dataset where point number in classes are quite different #62