Closed Chaoqi-LIU closed 4 months ago
Would this task naturally use multiple CUDA kernels, because you don't know in advance how big the output is? It's like how kNN is easier than radius sampling: having k up front lets everything happen in a single kernel.
Maybe using the current function with a large number of points and then taking just part of the output is as fast as you'll get.
Hi, I wrote my own ad-hoc version for CUDA/CPU, and yes, you are right: we cannot know the number of output points in advance. So I added one more argument as a cap on the number of output points and use max_dist < r as the early-stop signal. It's pretty fast and the results were satisfying.
Thanks anyway!
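For readers following along, the approach described in the comment above (a cap on the number of output points plus a `max_dist < r` early stop) might look roughly like the sketch below. It is plain Python rather than the C++/CUDA version mentioned, and it is not the commenter's actual code, which is not shown in this thread. The function name `fps_with_radius` and its parameters are hypothetical.

```python
import math


def fps_with_radius(points, radius, max_points, start=0):
    """Farthest point sampling with a radius-based early stop (illustrative sketch).

    Hypothetical helper, not a library API. Returns indices of selected points.
    Sampling stops when `max_points` samples are chosen, or when every input
    point is closer than `radius` to its nearest sample (max_dist < radius).
    """
    if not points or max_points <= 0:
        return []

    # Distance from each point to its nearest selected sample so far.
    min_dist = [math.inf] * len(points)
    selected = []
    current = start

    while True:
        selected.append(current)
        if len(selected) >= max_points:
            break  # cap on the number of output points reached

        # Update nearest-sample distances with the newly added sample.
        for i, p in enumerate(points):
            d = math.dist(p, points[current])
            if d < min_dist[i]:
                min_dist[i] = d

        # The next candidate is the point farthest from all current samples.
        farthest = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[farthest] < radius:
            break  # early stop: max_dist < r
        current = farthest

    return selected


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(fps_with_radius(pts, radius=1.5, max_points=10))
```

Because the output buffer can be allocated up front with `max_points` entries, the same loop structure should map onto a fixed-size output in a GPU implementation, with the number of valid entries returned separately. That is an assumption about how a port would be organized, not something confirmed in this thread.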
🚀 Feature
The current FPS only downsamples points to a predefined number or ratio, but it is common to want to downsample the given points so that the pairwise distance within is ≤ a user-given radius.
Something like this, but written in C++ and CUDA.