HuguesTHOMAS / KPConv-PyTorch

Kernel Point Convolution implemented in PyTorch
MIT License

is voxel grid downsampling necessary #33

Closed: zgojcic closed this issue 3 years ago

zgojcic commented 4 years ago

Hi Thomas,

First, thanks for the PyTorch implementation.

I have a question about the motivation for the initial downsampling step. Is it mainly there to keep the size of the point clouds manageable, or is KPConv very sensitive to changes in point cloud resolution?

Of course, the downsampling does not present any problem for classification or even segmentation tasks, where it is easy to interpolate or project the results back onto the original point cloud. However, it could be a problem if, for example, one wants to estimate a vector quantity for each point, which might not be trivial to interpolate.
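To make the "project the results back" step concrete, here is a minimal sketch of nearest-neighbour projection, assuming the network outputs one value (a label or a vector) per subsampled point. The function name `project_to_original` is hypothetical and not part of KPConv-PyTorch. Each original point simply copies the value of its closest subsampled point, which is usually fine for labels; for vector quantities it ignores variation inside a voxel, which is exactly the concern raised above.

```python
import numpy as np

def project_to_original(orig_pts, sub_pts, sub_values):
    """Copy each original point's value from its nearest subsampled point (hypothetical helper).

    orig_pts:   (N, 3) original coordinates
    sub_pts:    (M, 3) subsampled coordinates
    sub_values: (M,) labels or (M, D) per-point vectors
    Brute-force search uses O(N * M) memory: fine for a demo, but a KD-tree
    (e.g. scipy.spatial.cKDTree) is the usual choice for real point clouds.
    """
    # Squared distances between every original and every subsampled point, shape (N, M)
    d2 = ((orig_pts[:, None, :] - sub_pts[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest subsampled point for each original point
    nearest = d2.argmin(axis=1)
    return sub_values[nearest]

# Usage: two subsampled points carrying labels 0 and 1
sub_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
sub_labels = np.array([0, 1])
orig_pts = np.array([[0.1, 0.0, 0.0], [0.9, 0.1, 0.0], [0.4, 0.0, 0.0]])
print(project_to_original(orig_pts, sub_pts, sub_labels))  # [0 1 0]
```

The same call works unchanged for per-point vectors (for example, `sub_values` of shape `(M, 3)`); a smoother alternative would be an inverse-distance weighted average over the k nearest subsampled points.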

I have seen that for SemanticKITTI, for example, you use a 6 cm voxel size. Do you see a large degradation in performance in parts where the resolution of the original point cloud is actually lower than that?

Thanks,

Zan

HuguesTHOMAS commented 4 years ago

Hi Zan,

Thanks for your interest in my work; these are very interesting questions.

First, the initial downsampling step is mostly there to control the size and scale of the input point cloud. Together, first_subsampling_dl and in_radius can be seen as the counterparts of the number of voxels and the dimensions of the grid in voxel networks. The ratio between these two parameters must not be too large, or you will run into out-of-memory (OOM) errors. Once you have found a suitable ratio, the actual values are chosen according to the scale of the details and objects you want to learn. In my experiments, I often found that smaller values (and thus finer details) work better, especially indoors, where objects have more fine details.
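The role of first_subsampling_dl can be illustrated with a pure-NumPy voxel-grid subsampling that keeps the barycenter of the points in each occupied voxel. This is a sketch under that assumption, not the repository's own subsampling code; the helper names `grid_subsample` and `voxels_per_radius` are hypothetical, and the parameter values in the usage lines are illustrative only.

```python
import numpy as np

def grid_subsample(points, dl):
    """Voxel-grid subsampling (hypothetical helper): one barycenter per occupied voxel of size dl."""
    # Integer voxel coordinates of every point
    voxel = np.floor(points / dl).astype(np.int64)
    # Group points sharing a voxel; `inverse` maps each point to its voxel's row
    _, inverse, counts = np.unique(voxel, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # some NumPy versions return a 2-D inverse with axis=0
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    return sums / counts[:, None]     # barycenter of each voxel

def voxels_per_radius(in_radius, first_subsampling_dl):
    """Size of the input sphere measured in voxels along one radius (hypothetical helper)."""
    return in_radius / first_subsampling_dl

# Usage: the first two points share a 6 cm voxel, the third is far away
pts = np.array([[0.01, 0.01, 0.0], [0.03, 0.03, 0.0], [0.5, 0.5, 0.5]])
print(grid_subsample(pts, dl=0.06))  # two rows: one barycenter per occupied voxel
print(voxels_per_radius(in_radius=1.5, first_subsampling_dl=0.03))  # illustrative values
```

Halving first_subsampling_dl doubles this ratio, while the number of occupied voxels inside the input sphere typically grows faster than that (roughly fourfold for points lying on surfaces), which is why the ratio is what drives the memory budget.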

I also noticed that predictions are robust to densities lower than my subsampling resolution, in particular on the Semantic3D dataset, where large parts of the scene have very low resolution but are still well classified.

As for performance in higher-density areas (as you asked in your last question), it is not degraded as long as your subsampling does not miss small details or objects. In the case of SemanticKITTI, the smallest objects (or objects with the smallest details) are bikes and pedestrians, and they are rarely located close to the scanner. I believe a smaller subsampling resolution would help achieve better scores on these classes, but we have to keep in mind that in this dataset, you are meant to segment the whole frame at once. It is therefore very difficult to reduce the resolution while keeping reasonable testing times.
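The testing-time trade-off can be sketched by counting how many points survive voxel-grid subsampling of a whole frame at several voxel sizes. The frame below is a synthetic flat patch, not real SemanticKITTI data, and `count_occupied_voxels` is a hypothetical helper; the point is only that the number of points the network must process per frame grows as the voxel size shrinks, until it saturates at the raw point count.

```python
import numpy as np

def count_occupied_voxels(points, dl):
    """Points left after voxel-grid subsampling with voxel size dl (one per occupied voxel)."""
    voxel = np.floor(points / dl).astype(np.int64)
    return np.unique(voxel, axis=0).shape[0]

# Synthetic stand-in for one scan: 200k points on a flat 20 m x 20 m patch
rng = np.random.default_rng(0)
xy = rng.uniform(-10.0, 10.0, size=(200_000, 2))
frame = np.column_stack([xy, np.zeros(len(xy))])

# Smaller voxels keep more points, so each full-frame forward pass gets more expensive
counts = [count_occupied_voxels(frame, dl) for dl in (0.12, 0.06, 0.03)]
for dl, n in zip((0.12, 0.06, 0.03), counts):
    print(f"dl = {dl} m -> {n} points")
```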

I hope this helps. Best, Hugues