Parameters to change for bigger datasets

arijitde92 commented 5 months ago

Hi @HuguesTHOMAS,

I am trying to run semantic segmentation with your code on a custom dataset that is similar to S3DIS but has fewer classes.

However, the areas are much larger than those in S3DIS. The tables below give the extents along x (length), y (width) and z (height) for each S3DIS area and for each area of my custom dataset.

S3DIS Area	X	Y	Z
1	24.6	48.273	5.288
2	30.89	51.277	7.555
3	29.258	25.72	3.137
4	48.318	26.635	8.813
5	66.4	45.039	4.676
6	23.179	45.36	3.867
Mean	37.108	40.384	5.556
Median	30.074	45.2	4.982
Std. Dev (sample)	16.94	11.24	2.2

Custom Dataset Area	X	Y	Z
1	70.9251	151.912	43.3071
2	229.56	220.518	44.8819
3	117	209	37.5
Mean	139.16	193.81	41.9
Median	117	209	43.31
Std. Dev (population)	66.63	30	3.17
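The two Std. Dev rows do not use the same convention: the S3DIS values match the sample standard deviation (dividing by n - 1), while the custom-dataset values match the population standard deviation (dividing by n). A minimal sketch using Python's standard `statistics` module recomputes both from the per-area extents so the two datasets can be compared on the same basis; the `summarize` helper is hypothetical.

```python
import statistics

# Per-area extents copied from the two tables above (X, Y, Z columns).
s3dis = {
    "X": [24.6, 30.89, 29.258, 48.318, 66.4, 23.179],
    "Y": [48.273, 51.277, 25.72, 26.635, 45.039, 45.36],
    "Z": [5.288, 7.555, 3.137, 8.813, 4.676, 3.867],
}
custom = {
    "X": [70.9251, 229.56, 117],
    "Y": [151.912, 220.518, 209],
    "Z": [43.3071, 44.8819, 37.5],
}


def summarize(extents):
    """Return mean, median, sample std and population std for each axis."""
    out = {}
    for axis, values in extents.items():
        out[axis] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev_sample": statistics.stdev(values),       # divides by n - 1
            "stdev_population": statistics.pstdev(values),  # divides by n
        }
    return out


for name, extents in (("S3DIS", s3dis), ("Custom", custom)):
    for axis, s in summarize(extents).items():
        print(f"{name} {axis}: mean={s['mean']:.2f} median={s['median']:.2f} "
              f"std(sample)={s['stdev_sample']:.2f} "
              f"std(population)={s['stdev_population']:.2f}")
```

On the sample basis, the custom dataset's X spread is about 81.61 rather than 66.63, so the difference in variability between the two datasets is even larger than the tables suggest.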

Also, for comparison within the same bounding dimensions, below is S3DIS Area 4 (length 47.3, width 26.6, height 8.8):

Below is a part of my custom dataset's point cloud enclosed in a similar volume (similar height, width and depth). Colors: yellow = floor, grey = wall, red = beam.

I think increasing the in_radius parameter would let each input sphere contain more points, which could help the network learn the structural features of my bigger dataset. Could you help me with the following questions?

Are there any other parameters that need to be tweaked to train the KPConv model on my dataset?
If the dimensions of the areas fed to the model during training vary widely, would this affect training performance, or do the areas need to have similar dimensions?

HuguesTHOMAS commented 5 months ago

Hi @arijitde92,

Looking at your area, the structural patterns seem very large and there are not many fine details. In that case, I would use a larger subsampling size and a larger input radius. Keep the ratio between the two stable; otherwise, too large an input radius will cause out-of-memory (OOM) issues, because too many points are fed to the network at once.
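The advice to keep the ratio stable can be illustrated with a rough, back-of-the-envelope model (an assumption, not how KPConv itself counts points): if grid subsampling leaves about one point per subsampling_dl x subsampling_dl cell of surface, a sphere of radius in_radius centred on a flat floor or wall holds roughly pi * (in_radius / subsampling_dl)^2 points. The function names and the values 1.5 and 0.03 below are hypothetical and illustrative, not the repository's defaults.

```python
import math


def approx_points_per_sphere(in_radius, subsampling_dl, n_planes=1):
    """Rough count of subsampled points inside one input sphere.

    Assumes n_planes flat surfaces (floor, wall, ...) pass through the sphere
    centre and that grid subsampling leaves about one point per grid cell.
    """
    disc_area = math.pi * in_radius ** 2       # area of a plane cut by the sphere
    points_per_area = 1.0 / subsampling_dl ** 2  # one point per dl x dl cell
    return n_planes * disc_area * points_per_area


def scale_config(in_radius, subsampling_dl, factor):
    """Scale both parameters by the same factor so their ratio is unchanged."""
    return in_radius * factor, subsampling_dl * factor


base_r, base_dl = 1.5, 0.03  # illustrative values only
for factor in (1.0, 2.0, 4.0):
    r, dl = scale_config(base_r, base_dl, factor)
    print(f"factor={factor}: in_radius={r:.2f} dl={dl:.3f} "
          f"ratio={r / dl:.1f} ~points={approx_points_per_sphere(r, dl):.0f}")

# Enlarging only the radius grows the count quadratically in this planar model.
print(f"radius x4 only: ~points={approx_points_per_sphere(base_r * 4, base_dl):.0f}")
```

Under this model, scaling both parameters by the same factor leaves the estimated point count unchanged, while multiplying only in_radius by 4 multiplies it by about 16, which is where memory pressure comes from.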

The dimensions of the area will not affect training performance. You can try different combinations of subsampling_dl and in_radius to see what works best for your data.
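One way to organize such a search is to enumerate a small grid of candidate pairs and discard those whose in_radius / subsampling_dl ratio exceeds that of a configuration already known to fit in GPU memory. This is a hypothetical helper: `candidate_pairs`, the value lists and the bound of 60 are illustrative assumptions, and each surviving pair would still have to be trained and validated separately.

```python
from itertools import product


def candidate_pairs(dl_values, radius_values, max_ratio):
    """Return (subsampling_dl, in_radius, ratio) tuples with ratio <= max_ratio.

    max_ratio is meant to come from a configuration that already fits in GPU
    memory; pairs above it would feed more points per sphere and risk OOM.
    """
    pairs = []
    for dl, r in product(dl_values, radius_values):
        ratio = r / dl
        if ratio <= max_ratio:
            pairs.append((dl, r, ratio))
    # Largest spatial context first: a bigger in_radius sees larger structures.
    pairs.sort(key=lambda p: (-p[1], p[0]))
    return pairs


# Hypothetical search space.
dls = [0.04, 0.08, 0.12]
radii = [2.0, 4.0, 6.0]
for dl, r, ratio in candidate_pairs(dls, radii, max_ratio=60):
    # Each pair would be written into the training configuration and evaluated.
    print(f"subsampling_dl={dl:.2f}  in_radius={r:.1f}  ratio={ratio:.1f}")
```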

arijitde92 commented 5 months ago

Thanks @HuguesTHOMAS for your swift reply.

HuguesTHOMAS / KPConv-PyTorch, issue #247