PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/

[custom] Possibility of adding a fast KD-Tree (or other clustering algorithm) to the GPU module? #4817

Open FabianSchuetze opened 3 years ago

FabianSchuetze commented 3 years ago

In thread #4677, @larshg mentioned that the revised GPU clustering runs faster than before but is slower than the CPU version. The CPU version is based on a KD-Tree, while the GPU version relies on an Octree. I was looking for a fast GPU implementation of a KD-Tree but did not find a convincing one. Moreover, I was not sure whether something like this exists at all. I therefore wanted to ask whether somebody with more experience with the clustering algorithms knows if fast GPU implementations exist and whether we could leverage them here. Are the nn implementations in the cuda module perhaps just that? I am grateful for any tips or suggestions!
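For readers new to the topic, here is a minimal sketch of the kind of Euclidean clustering discussed in this thread: clusters are grown by repeatedly asking for all neighbours within a radius. It is pure Python and uses a brute-force `radius_search` as a stand-in for the KD-Tree or Octree query; the function names and the `min_size` parameter are illustrative assumptions, not PCL's API.

```python
from collections import deque
from math import dist


def radius_search(points, query_index, radius):
    """Brute-force stand-in for a KD-Tree/Octree radius query (O(n) per call)."""
    q = points[query_index]
    return [i for i, p in enumerate(points) if dist(p, q) <= radius]


def euclidean_clusters(points, radius, min_size=1):
    """Group point indices into clusters by flood-filling over radius neighbours."""
    visited = [False] * len(points)
    clusters = []
    for seed in range(len(points)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        cluster = []
        while queue:
            idx = queue.popleft()
            cluster.append(idx)
            # Every unvisited neighbour within the radius joins the same cluster.
            for nb in radius_search(points, idx, radius):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters
```

Each point triggers one radius query, so the speed of that query dominates the run time, which is why the choice of search structure matters so much here.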

mvieth commented 3 years ago

I recently found http://ann-benchmarks.com, which mainly tests CPU implementations, however. The readme on GitHub points toward https://github.com/facebookresearch/faiss, which has a GPU implementation. I haven't looked into it at all, so I have no idea whether it is useful for us, but it may be a starting point if you are interested in this.

FabianSchuetze commented 3 years ago

Ha! Fantastic, thank you, Markus! I was hoping to get exactly such an answer and I will definitely look into faiss.

FabianSchuetze commented 3 years ago

I have researched this topic more. To my despair, faiss does not support a radiusSearch on the GPU, only on the CPU. To quote:

[...]and range search is not currently implemented on the GPU.

It is on my long-term roadmap to allow for k-selection for arbitrary k on the GPU, but this will take a while and isn't something I can promise anytime soon. Range search would deal with similar issues, though this one is easier.

I am beginning to wonder whether GPUs are simply not suitable for such searches. The faiss GPU module has a search for the k-nearest neighbors, but I am not sure whether this could be of any help for us.
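One way a k-nearest-neighbor search might stand in for a radius search is to query a fixed k and keep only the hits within the radius. The sketch below illustrates the idea in pure Python; `knn` is a brute-force stand-in rather than a faiss call, and all names are hypothetical. The catch is that the result is only guaranteed complete when fewer than k neighbours fall inside the radius, and the quote above suggests k is bounded on the GPU.

```python
from math import dist


def knn(points, query, k):
    """Brute-force stand-in for a GPU k-NN call: the k closest (distance, index) pairs."""
    ranked = sorted((dist(p, query), i) for i, p in enumerate(points))
    return ranked[:k]


def radius_search_via_knn(points, query, radius, k):
    """Emulate a radius search with a fixed-k query.

    Returns (indices, complete). `complete` is False when every returned
    neighbour lies within the radius and unqueried points remain, because
    further neighbours inside the radius may have been cut off by k.
    """
    neighbours = knn(points, query, k)
    inside = [i for d, i in neighbours if d <= radius]
    complete = len(inside) < len(neighbours) or len(neighbours) == len(points)
    return inside, complete
```

When `complete` comes back False, a caller would have to retry with a larger k or fall back to another search, so dense clouds with large radii are the unfavourable case for this approach.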

BaltashovIlia commented 3 years ago

Hi,

Have you seen the clustering from Autoware? It is faster than PCL's CPU and GPU clustering and produces the same result. https://github.com/Autoware-AI/core_perception/blob/master/lidar_euclidean_cluster_detect/nodes/lidar_euclidean_cluster_detect/gpu_euclidean_clustering.cu

I wrote a small benchmark: https://github.com/BaltashovIlia/pcl_vs_autoware_clustering

On Ryzen 2700x + RTX A4000, the result is as follows:

| | bunny.pcd (397) | rops_cloud.pcd (32087) | sdc_filtered.pcd (41916) | sdc_raw.pcd (200499) |
|---|---|---|---|---|
| pcl_cpu | 1.67 | 10595 | 170 | 1313 |
| pcl_gpu | 3.20 | 4176 | 639 | 3224 |
| autoware | 1.24 | 12.3 | 16.4 | 175 |

yasamoka commented 2 years ago

5299