Closed rjanvier closed 7 months ago
Hi @rjanvier
Thanks for the great work! I have a few comments before accepting this PR. These are mostly documentation-related and not game changers for the code.
**`pgeof2` naming**: This is not a big deal, but I would rather keep `pgeof` as the name rather than `pgeof2` if possible, and use a new major version to indicate changes to the library instead. Would this cause too many issues with the way you built the tests?
Overall, it would be good to have a bit more documentation for the library:
- ideally, we would create a readthedocs, but I have neither the skills nor the time for that at the moment. So I think the README should cover most of the documentation, especially since the library is quite small and has only a handful of functions. Python users with a C++ aversion will not go to the C++ files to check the comments. The API for the Python functions should then be explicitly presented in the README (as is currently the case for `pgeof.pgeof`)
- documentation for the `pgeof.pgeof` core function, similar to what is already in the repo
- documentation (examples + docstring or comments) for `pgeof.radius_search` and `pgeof.knn_search` as standalone features, to better showcase these new functionalities. Right now, they do appear in the README, but are not strongly put forward
- documentation of the conversion to CSR for the new custom `pgeof.radius_search` and `pgeof.knn_search` (as you do), but also for the current examples using `sklearn`, since people might be more familiar with the output format of these neighbor searches
- in particular, for illustrating how to build the CSR from the output of `pgeof.radius_search`, I would prefer a one-liner, vectorized, numpy-friendly syntax similar to what I provided for `sklearn`'s `radius_neighbors` in the current version of the repo. If I am not mistaken, the output of `pgeof.radius_search` is a 2D array with `-1` for missing neighbors. If so, the following should work:
```python
import numpy as np
import pgeof2

# Generate a random synthetic point cloud and its radius neighbors
num_points = 10000
radius = 0.1
xyz = np.random.rand(num_points, 3).astype("float32")
knn, _ = pgeof2.radius_search(xyz, xyz, radius, 50)

# Convert the radius neighbors to CSR format (-1 marks missing neighbors)
nn_ptr = np.r_[0, (knn >= 0).sum(axis=1).cumsum()]
nn = knn[knn >= 0]
```
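To show what the resulting CSR arrays give you, here is a small self-contained sketch on a toy neighbor matrix (the `-1`-padded layout is assumed from the discussion above; the matrix values are made up for illustration):

```python
import numpy as np

# Toy neighbor matrix in the assumed radius_search layout:
# one row per query point, padded with -1 for missing neighbors
knn = np.array([
    [1, 2, -1],
    [0, -1, -1],
    [0, 1, 3],
    [2, -1, -1],
])

# Same one-liner CSR conversion as in the snippet above
nn_ptr = np.r_[0, (knn >= 0).sum(axis=1).cumsum()]
nn = knn[knn >= 0]

# The neighbors of point i are nn[nn_ptr[i]:nn_ptr[i + 1]]
neighbors_of_2 = nn[nn_ptr[2]:nn_ptr[3]]
print(neighbors_of_2)  # -> [0 1 3]
```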
- if `print(help(pgeof))` still works, I would show this trick in the README to invite users to check the documentation

**FRNN in SPT**: I understand the interest of packaging K-NN and radius-NN functionalities along with this library. I am fine with making `pgeof` offer this functionality as an add-on for CPU-only applications, by depending on `nanoflann` if you benchmarked it as the best candidate.
Yet, I stand by what I mentioned in previous discussions: the GPU-based FRNN is still much faster than any CPU-based NN library I have benchmarked against, even including the CPU-GPU transfer time. Here is a little snippet to compare FRNN with `sklearn`. On my machine, FRNN runs more than 100x faster than `sklearn`.
```python
from time import time

import numpy as np
import torch
import frnn
from sklearn.neighbors import NearestNeighbors

# Generate a random synthetic point cloud
num_points = 100000
radius = 0.1
k_max = 50
xyz = np.random.rand(num_points, 3).astype("float32")

# Compute radius neighbors with scikit-learn
start = time()
rneigh = NearestNeighbors(radius=radius).fit(xyz).radius_neighbors(xyz)
print(f"scikit: {time() - start:0.3f}")

# Compute radius neighbors with FRNN (including the time for the CPU -> GPU copy)
torch.cuda.synchronize()
start = time()
xyz = torch.as_tensor(xyz).cuda()
distances, neighbors, _, _ = frnn.frnn_grid_points(
    xyz.view(1, -1, 3), xyz.view(1, -1, 3), K=k_max, r=radius
)
torch.cuda.synchronize()
print(f"FRNN: {time() - start:0.3f}")
```
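A small aside on the timing itself: `time.perf_counter` is generally a better choice than `time.time` for micro-benchmarks like the one above, since it uses the highest-resolution monotonic clock available. A minimal pattern (with a stand-in workload instead of the neighbor search):

```python
from time import perf_counter

start = perf_counter()
total = sum(i * i for i in range(100_000))  # stand-in workload
elapsed = perf_counter() - start
print(f"workload: {elapsed:0.3f}s")
```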
That being said, I agree that the installation of FRNN is a bit tedious, and I wouldn't mind moving to another GPU-based solution for SPT in the future (I have not investigated this yet).
Maybe a test could be added for `pgeof.radius_search` too?
Hi Damien, thank you very much for your kind and constructive feedback.
I will only answer on the KNN/radius-search topic for now, as the other points are probably more consensual.
First, your KNN example is biased in at least two or three respects:

- you did not parallelize `sklearn` here (its `n_jobs` parameter)
- `sklearn` (and CPU KNN in general) relies on a kd-tree, an acceleration data structure whose construction cost cannot be amortized with small amounts of data (on your sample, a CPU brute-force search could even be faster). In real-life scenarios it should be faster.

That being said, I agree that FRNN is faster in 100% of cases (maybe ~10-20x), but that is not my main point. What I tried to show in this specific thread is a transfer cost (which, I must admit, I have not investigated yet) that penalizes the whole FRNN process in SPT and makes CPU KNN competitive overall. This specific transfer cost is not directly related to the KNN search itself, but to the following `AddTo` transform, which is much faster when the data is already on the CPU.
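For illustration, a minimal sketch of what a parallelized scikit-learn neighbor search looks like (the relevant parameter is `n_jobs`; the point count and radius here are arbitrary):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

xyz = np.random.rand(10_000, 3).astype("float32")

# n_jobs=-1 spreads the neighbor queries over all available CPU cores
nn = NearestNeighbors(radius=0.1, n_jobs=-1).fit(xyz)
distances, indices = nn.radius_neighbors(xyz)
```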
Maybe this transfer cost could be fixed, and maybe it is related to the specific configuration of the computer on which this test was run (I will try to reproduce it on another computer soon).
But don't get me wrong: I am mainly interested in a fast and simple CPU KNN alternative because one of the biggest strengths of SPT vs. other DL methods is its low resource consumption, which makes it a very good candidate for a CPU-only inference process (even CPU-only training in some very specific cases).
For example, people may have a lot more RAM than VRAM. Of course SPT allows working in a tiling fashion, but running the SPT data preparation on CPU could allow working on bigger clouds on some computers. Another example: `torch` with CUDA is around 2+ GB of dependencies, while `torch` without CUDA is more like 100-300 MB. CPU KNN lowers the install barrier of SPT and makes it easier to embed (for example as a CloudCompare plugin). It also lowers the install barrier because, as you said, FRNN can be difficult to install (for a GPU alternative, maybe look at the KNN in torch geometric)... not to mention that a lot of people do not have access to an Nvidia GPU.
Add this CPU KNN to some fast Delaunay bindings (yet to be released), and you could have a data preparation for SPT that is almost on par between CPU and GPU. As a practitioner, I think this is very interesting, and it could have a slight but perhaps non-negligible impact on the dissemination of your method.
Thanks again for this review; I have already added the test you requested and I will correct the CSR sample.
I agree to come back to the `pgeof` name, but then we could no longer benchmark and test against previous versions of `pgeof`, because we can't have two versions of the same package on the same `PYTHONPATH`. It would imply launching the tests in two different environments, and it wouldn't be convenient to compare the outcomes of both versions. On the other hand, I think we have no regression vs. the original version of `pgeof`, so these tests are not mandatory anymore. I'm open to both solutions; please tell me which you prefer: the `pgeof` name with no regression tests, or `pgeof2` (or something else) with regression tests. The version number is already increased (0.1.0).
I would like to create a RTD, but I don't think I have the time either. This PR is part of a larger puzzle on our side, and I think we could improve the API a bit in the long run (after some usage), so I don't want to commit too much to the documentation for now. Also, I'm waiting for `nanobind` version 2 for better stub generation. I will definitely end up setting up the RTD, but maybe in a few months, if that's ok.
For now, I think that detailing the API extensively in the README would clutter it a lot.
Of course, we can list the different functions in the README with a small description of what they do. But for extensive documentation, `help(pgeof2)` already works (modulo some typos, I assume :)), and for usage examples we can redirect people to the tests/benchmarks files (this is already done in the PR).
What do you think?
Hi @rjanvier, thanks for the good work and fast reply! Below are my replies to your successive points.
I totally agree that a pure-CPU preprocessing would be great, even if at the cost of a small increase in preprocessing time. And even better, a pure-CPU SPT would simply be amazing ! I am happy to see that this may be on your personal roadmap, judging from your expertise with CPU code optimization.
Still, since K-NN and radius-NN are often bottlenecks in point cloud processing pipelines (along with voxelization and sampling), I am very curious to see which library is faster. I will try to benchmark `nanoflann` vs `FRNN` on my end at least. But this should not stop this PR; I think it is great to offer CPU-based neighbor search in `pgeof` with `nanoflann`.
I am fine with keeping `pgeof` and not testing regression against the previous version. I did not see any dramatic changes to the core algorithm, and I trust that you kept the PCA features computation similar, so we should be good.
Agreed. Let's keep it to a minimum in the README. But still, it would be good to have:

- a pointer to `help(pgeof)` for more details

Also, I come back on what I said about documenting the CSR conversion from `scikit-learn` neighbor searches. Let's remove those from the README and favor the documentation of your `nanoflann`-based utilities. Several reasons for that:

- `nanoflann` seems faster than `scikit-learn`, so we want to encourage users to use our implementation
- it avoids having `scikit-learn` in the dependencies

Does that sound reasonable to you?
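For context, the scikit-learn CSR conversion being discussed looks roughly like this (`radius_neighbors` returns one variable-length index array per query point; the exact snippet in the repo may differ):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

xyz = np.random.rand(1_000, 3).astype("float32")
_, neighbors = NearestNeighbors(radius=0.1).fit(xyz).radius_neighbors(xyz)

# CSR conversion: pointer array from per-point counts, plus a flat index array
nn_ptr = np.r_[0, np.fromiter((len(n) for n in neighbors), dtype=np.int64).cumsum()]
nn = np.concatenate(neighbors)
```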
Thanks Damien, it all seems OK on my side. I will try to make those changes as soon as possible.
Awesome, I appreciate this very efficient collaboration !
Hi @drprojects, I think this is ready for another round of review.
Looks good, thanks for these last changes and congrats on this new version!
Thanks. We have to adapt SPT according to the API changes. Should I make a PR?
> Should I make a PR?
I would greatly appreciate it.
This PR is a proposal. Some changes made to `pgeof` were needed for some usage that goes beyond the scope of the original implementation. We found them potentially useful for the SPT/PGEOF user community, so we wanted to share them and propose integrating them into the upstream repo. If it seems too much to review/accept inside the `pgeof` codebase, we agree to maintain these changes as a public and friendly fork.

**Infrastructure / Architecture changes**
- `Eigen` is now embedded as a submodule. It's header-only, so it's not a big deal, and it simplifies the workflow a lot (see #11).
- `Taskflow` (as a submodule too) for parallel computation. It may look like overkill for a simple `parallel_for`. We initially used it for features that did not make it into the final release (they will be part of another lib that has nothing to do with feature computation). On the other hand, `Taskflow` is cross-platform (`OpenMP` is very good, but unfortunately it has some quirks on both macOS and Windows), header-only (unlike `TBB`, for example, it has no static/dynamic lib), and it has a clean API. It has high performance, so we kept it. If someone wants to rewrite the parallel code with simple std `C++` primitives in order to have something cross-platform and without additional dependencies, that would be nice.
- `nanoflann` (as a submodule too) for KNN/radius search. In addition to `pgeof`'s "big ndarray of features" and fast feature computation, we wanted to embed a KNN lib to add an "on the fly" feature computation a la `jakteristics`. `nanoflann` is a high-performance KNN library in C++, on par with `scikit`'s KDTree or even faster under some circumstances (in fact, our internal benchmarks show it is always faster for our computers and use cases). So we also added two functions to compute knn and radius search. These functions could be used, for example, in SPT instead of FRNN for people wanting a fast CPU alternative. As shown in another thread, FRNN is penalized by some (not yet investigated) transfer during the SPT transform process, so CPU KNN could be faster overall.
- `nanobind` instead of the `C` extension interface from Python. It has very low overhead. In any case, the overhead of binding libraries should not be an issue for pgeof usage, since we have no repeated calls to a function. `nanobind` comes with a convenient build system (it uses CMake under the hood) for cross-platform C++ modules.
- renamed to `pgeof2` (see why in the **Tests** section)

**New features and code changes**
- "`Eigen`" everywhere.
- a `jakteristics`-like on-the-fly feature computation function (with feature selection)
- `uint32` could be too small for some usage. We do not take advantage of all template possibilities in the bindings yet; for example, the `pgeof` function is still only available for float input point clouds, but a `double`-overloaded function could be added in the future with only one line of code.
overloaded function could be added in the future with only one line of code.Tests
pgeof2
output vs. other lib. It’s mainly why it was renamed topgeof2
: in order to allow to have both versions ofpgeof
under the same path. These tests could have some instabilities (due to precision issues) and there is a known UB in pgeof PyPI package on linux that make the optimal test fail on this platform. Tests were mainly used to guide the implementation and act as a minimal documentation but they could be deactivated from the CI (if we come back to pgeof name we will be forced to)tox run
. This command will setup a dedicated virtual env, install dependencies, build and run testsBenchmarks:
tox -e bench
.Here are some benchmarks results. Running python 3.11 on linux on an intel intel 14700, GCC-13.2 compiler: