erikwijmans / Pointnet2_PyTorch

PyTorch implementation of Pointnet2/Pointnet++

Performance degradation after using higher-dimensional data #59

Open dandan19793 opened 5 years ago

dandan19793 commented 5 years ago

Hi, I'm trying to find edges in point clouds. First I down-sampled my dataset to 5k-point clouds and got the following performance:

precision 0.9722454458109377, recall 0.9774251897764888, specificity 0.9976463102707986, F1 score 0.9748284372091011

Then I down-sampled my dataset to 10k-point clouds instead and was surprised that the performance decreased immensely: precision 0.6414813848515061, recall 0.4429522551934075, specificity 0.9649306344666823, F1 score 0.5240442855918716

I should add that in the 10k dataset I kept all my edge vertices. That means that instead of having one edge vertex out of every 30 vertices, I now have one edge vertex out of every 10. (I'm not sure those numbers are exactly right, but the main point is that I used to have very few edge vertices and now have somewhat more of them in the samples I feed the network.)

Any idea what could cause this performance degradation? I would have expected that with more data and more positive vertices the network would learn better.

If I'm not mistaken, the only things I had to change in the code were "num_points" in the train function and loading the 10k data instead of the 5k data.

Any help would be much appreciated.

erikwijmans commented 5 years ago

You seem to be running into a common issue in ML -- imbalanced classes: https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28

A common way to deal with this in DL is loss-weighting.
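For instance, if edge detection is treated as per-vertex binary classification, the rare edge class can be up-weighted in the loss. A minimal sketch with plain PyTorch, not code from this repo; the roughly 1-in-10 edge ratio is taken from the description above, and the shapes are placeholders:

```python
import torch
import torch.nn as nn

# Assumed class balance from the 10k dataset: ~1 edge vertex per 10 vertices.
pos_fraction = 0.1
# Weight positives by the negative/positive ratio (~9x here) so the loss
# no longer rewards predicting "not an edge" everywhere.
pos_weight = torch.tensor([(1.0 - pos_fraction) / pos_fraction])

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Placeholder per-vertex logits and 0/1 edge labels: (batch, num_points)
logits = torch.randn(8, 10000)
labels = torch.randint(0, 2, (8, 10000)).float()

loss = criterion(logits, labels)
```

The same idea works with `nn.CrossEntropyLoss(weight=...)` for a two-class softmax head; either way, the weights are a starting point to tune, not a fixed recipe.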