Questions about dataset preprocessing and train configuration for unbalanced classes

FSet89 commented 1 month ago

I am training on a custom dataset using the Pointcept codebase. My dataset consists of a large point cloud of a building facade spanning ~10 meters in width and height. I divided it into two files, train.txt and val.txt, each containing coordinates, colors, normals, and labels. The total number of points is ~8M. I have a few questions.

Given the scale of my point cloud, should I tune some parameter related to the model?
Should I normalize the colors between 0 and 1?
I didn't understand how the trainer processes the input shape. It seems that the model is fed with inputs of size 200K. Should I divide the point clouds into multiple files?
How should I change the configuration file if I want to ignore one or more classes?
The dataset is unbalanced. Does the code manage this? Should I assign different weights to the classes? If so, how?

Gofinge commented 1 month ago

Given the scale of my point cloud, should I tune some parameter related to the model?

A facade of roughly 10 m × 10 m is not large compared with outdoor scenes. Just adjust the grid size (in both the augmentations and the model).
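To get a feel for how the grid size controls the number of points that reach the model, here is a minimal sketch of voxel-grid subsampling in plain Python. It is not Pointcept's implementation: the function name `grid_subsample`, the input format, and the sample values are assumptions made for illustration, and a real pipeline would vectorize this with an array library.

```python
import math

def grid_subsample(points, grid_size):
    """Keep the first point that falls into each cubic voxel of edge `grid_size`.

    points: iterable of (x, y, z) tuples in meters (hypothetical input format).
    """
    kept = {}
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / grid_size) for c in p)
        # setdefault keeps the first point seen in a voxel and ignores later ones.
        kept.setdefault(key, p)
    return list(kept.values())

# Four points; the first two lie within about 1 cm of each other.
pts = [(0.00, 0.00, 0.00), (0.01, 0.01, 0.00), (0.30, 0.00, 0.00), (5.00, 5.00, 1.00)]
fine = grid_subsample(pts, 0.02)    # 2 cm voxels
coarse = grid_subsample(pts, 0.50)  # 50 cm voxels
print(len(fine), len(coarse))
```

A larger grid size merges more neighbouring points into one voxel, so fewer points survive; a smaller one preserves detail at the cost of memory and compute.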

Should I normalize the colors between 0 and 1?

We have a unified strategy for normalizing color; please refer to "NormalizeColor" in data.transform. Just make sure your color range is [0, 255]. This only matters when you want to combine multiple datasets for training.

I didn't understand how the trainer processes the input shape. It seems that the model is fed with inputs of size 200K. Should I divide the point clouds into multiple files?

For now, you don't need to. Try adjusting the grid size first; if the model still fails to train efficiently, you can chunk the data further.
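If chunking does become necessary, one simple approach is to cut the cloud into square tiles along two axes. The sketch below is only an illustration under stated assumptions: `chunk_by_tile`, the choice of axes, and the 2 m tile size are hypothetical, and writing each chunk to its own file is left out.

```python
import math
from collections import defaultdict

def chunk_by_tile(points, tile_size, axes=(0, 1)):
    """Group points into square tiles of edge `tile_size` along the two given axes.

    Returns a dict mapping a tile index such as (i, j) to the list of points in it.
    """
    chunks = defaultdict(list)
    for p in points:
        # Tile index along each of the two chosen axes.
        key = tuple(math.floor(p[a] / tile_size) for a in axes)
        chunks[key].append(p)
    return dict(chunks)

# Hypothetical points on a facade; for a vertical facade the tiling axes
# would be chosen to match its width and height directions.
pts = [(0.5, 0.5, 0.0), (1.5, 0.5, 0.1), (2.5, 0.2, 0.0), (9.5, 9.5, 0.2)]
chunks = chunk_by_tile(pts, tile_size=2.0)
for tile, members in sorted(chunks.items()):
    print(tile, len(members))
```

Each tile can then be saved as a separate scene, which keeps the per-sample point count bounded.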

How should I change the configuration file if I want to ignore one or more classes?

In this case, you need to rewrite the dataset to map the classes you want to ignore to -1.
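A minimal sketch of such a remapping, in plain Python, could look like the following. The helper `build_label_map` and the class ids are hypothetical. Renumbering the remaining classes to contiguous ids is an added assumption (the reply only mentions mapping ignored classes to -1), and the number of classes in the configuration would need to match the result.

```python
IGNORE_INDEX = -1  # value the reply suggests for ignored classes

def build_label_map(all_classes, ignored):
    """Map ignored class ids to IGNORE_INDEX and the rest to contiguous ids 0..K-1."""
    kept = [c for c in sorted(all_classes) if c not in ignored]
    mapping = {c: i for i, c in enumerate(kept)}
    mapping.update({c: IGNORE_INDEX for c in ignored})
    return mapping

# Hypothetical dataset with raw class ids 0..3, where class 2 should be ignored.
label_map = build_label_map(all_classes=[0, 1, 2, 3], ignored={2})
raw_labels = [0, 2, 3, 1, 2]
remapped = [label_map[l] for l in raw_labels]
print(remapped)  # [0, -1, 2, 1, -1]
```

The remapping would be applied wherever the dataset class loads the label column, so that every sample reaching the loss already uses the new ids.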

The dataset is unbalanced. Does the code manage this? Should I assign different weights to the classes? If so, how?

Lovasz loss, Weighted CE (provide a weight), CAC Segmentor.
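For the weighted cross-entropy option, the weights have to come from somewhere. One common heuristic, shown here as a sketch rather than as anything prescribed by Pointcept, is inverse class frequency. The function name, the normalization, and the sample labels are assumptions for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels, num_classes, ignore_index=-1):
    """Return one weight per class, proportional to 1 / class frequency.

    With the normalization total / (num_classes * count), a perfectly
    balanced dataset gets a weight of 1.0 for every class. Classes that
    never occur get weight 0.0.
    """
    counts = Counter(l for l in labels if l != ignore_index)
    total = sum(counts.values())
    weights = []
    for c in range(num_classes):
        n = counts.get(c, 0)
        weights.append(total / (num_classes * n) if n else 0.0)
    return weights

# Hypothetical label column: class 0 dominates, class 2 is rare, -1 is ignored.
labels = [0] * 6 + [1] * 3 + [2] * 1 + [-1] * 5
weights = inverse_frequency_weights(labels, num_classes=3)
print([round(w, 3) for w in weights])  # [0.556, 1.111, 3.333]
```

In PyTorch, a list like this can be turned into a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, together with `ignore_index=-1`. How the weight is supplied through a Pointcept config file is not covered in this thread.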

FSet89 commented 1 month ago

Hi, thank you for your detailed reply.

Lovasz loss, Weighted CE (provide a weight), CAC Segmentor.

Should I implement the Weighted CE? As for the CAC, can I just use it in place of the default segmentor, or should I change some parameters?

Gofinge commented 1 month ago

I think the default is good enough as an initial baseline.

Pointcept / PointTransformerV3

Questions about dataset preprocessing and train configuration for unbalanced classes #48