Closed rjanvier closed 7 months ago
Hi @rjanvier
Thanks for the great work! I have a few comments before accepting this PR. These are mostly documentation-related and not game changers for the code.
**`pgeof2` naming**: This is not a big deal, but I would rather keep `pgeof` as the name rather than `pgeof2` if possible, and use a new major version to indicate changes to the library instead. Would this cause too many issues with the way you built the tests?
Overall, it would be good to have a bit more documentation for the library:
- ideally, we would create a readthedocs, but I have neither the skills nor the time for that at the moment. So I think the README should cover most of the documentation, especially since the library is quite small and has only a handful of functions. Python users with a C++ aversion will not go to the C++ files to check the comments. The API for the Python functions should then be explicitly presented in the README (as is currently the case for `pgeof.pgeof`)
- documentation for the `pgeof.pgeof` core function, similar to what is already in the repo
- documentation (examples + docstring or comments) for `pgeof.radius_search` and `pgeof.knn_search` as standalone features, to better showcase these new functionalities. Right now, they do appear in the README, but are not strongly put forward
- documentation of the conversion to CSR for the new custom `pgeof.radius_search` and `pgeof.knn_search` (as you do), but also for the current examples using `sklearn`, since people might be more familiar with the output format of these neighbor searches
- in particular, for illustrating how to build the CSR from the output of `pgeof.radius_search`, I would prefer a one-liner, vectorized, numpy-friendly syntax similar to what I provided for `sklearn`'s `radius_neighbors` in the current version of the repo. If I am not mistaken, the output of `pgeof.radius_search` is a 2D array with `-1` for missing neighbors. If so, the following should work:
```python
import numpy as np
import pgeof2

# Generate a random synthetic point cloud and its radius neighbors
num_points = 10000
radius = 0.1
xyz = np.random.rand(num_points, 3).astype("float32")
knn, _ = pgeof2.radius_search(xyz, xyz, radius, 50)

# Convert the radius neighbors to CSR format (-1 marks missing neighbors)
nn_ptr = np.r_[0, (knn >= 0).sum(axis=1).cumsum()]
nn = knn[knn >= 0]
```
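To show what the resulting CSR arrays give you, here is a small self-contained sketch on a toy neighbor matrix (the `-1`-padded layout is assumed from the discussion above; the matrix values are made up for illustration):

```python
import numpy as np

# Toy neighbor matrix in the assumed radius_search layout:
# one row per query point, padded with -1 for missing neighbors
knn = np.array([
    [1, 2, -1],
    [0, -1, -1],
    [0, 1, 3],
    [2, -1, -1],
])

# Same one-liner CSR conversion as in the snippet above
nn_ptr = np.r_[0, (knn >= 0).sum(axis=1).cumsum()]
nn = knn[knn >= 0]

# The neighbors of point i are nn[nn_ptr[i]:nn_ptr[i + 1]]
neighbors_of_2 = nn[nn_ptr[2]:nn_ptr[3]]
print(neighbors_of_2)  # -> [0 1 3]
```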
- if `print(help(pgeof))` still works, I would show this trick in the README to invite users to check the documentation

**FRNN in SPT**: I understand the interest of packaging K-NN and radius-NN functionalities along with this library. I am fine with making `pgeof` offer this functionality as an add-on for CPU-only applications, by depending on `nanoflann` if you benchmarked it as the best candidate.
Yet, I stand by what I mentioned in previous discussions: the GPU-based FRNN is still much faster than any CPU-based NN library I have benchmarked against, even including the CPU-GPU transfer time. Here is a little snippet to compare FRNN with `sklearn`. On my machine, FRNN runs more than 100x faster than `sklearn`.
```python
from time import time

import numpy as np
import torch
import frnn
from sklearn.neighbors import NearestNeighbors

# Generate a random synthetic point cloud
num_points = 100000
radius = 0.1
k_max = 50
xyz = np.random.rand(num_points, 3).astype("float32")

# Compute radius neighbors with scikit-learn
start = time()
rneigh = NearestNeighbors(radius=radius).fit(xyz).radius_neighbors(xyz)
print(f"scikit: {time() - start:0.3f}")

# Compute radius neighbors with FRNN (including the time for the CPU -> GPU copy)
torch.cuda.synchronize()
start = time()
xyz = torch.as_tensor(xyz).cuda()
distances, neighbors, _, _ = frnn.frnn_grid_points(
    xyz.view(1, -1, 3), xyz.view(1, -1, 3), K=k_max, r=radius
)
torch.cuda.synchronize()
print(f"FRNN: {time() - start:0.3f}")
```
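A small aside on the timing itself: `time.perf_counter` is generally a better choice than `time.time` for micro-benchmarks like the one above, since it uses the highest-resolution monotonic clock available. A minimal pattern (with a stand-in workload instead of the neighbor search):

```python
from time import perf_counter

start = perf_counter()
total = sum(i * i for i in range(100_000))  # stand-in workload
elapsed = perf_counter() - start
print(f"workload: {elapsed:0.3f}s")
```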
That being said, I agree that the installation of FRNN is a bit tedious, and I wouldn't mind moving to another GPU-based solution for SPT in the future (I have not investigated this yet).
Maybe a test could be added for `pgeof.radius_search` too?
Hi Damien, thank you very much for your kind and constructive feedback.
I will only answer on the KNN/radius-search topic for now, as the other points are probably more consensual.
First, your KNN example is biased in at least two or three respects:

- you did not parallelize `sklearn` here (its `n_jobs` parameter)
- `sklearn` (and CPU KNN in general) relies on a kd-tree, an acceleration data structure whose construction cost cannot be amortized with small amounts of data (on your sample, a CPU brute-force search could even be faster). In real-life scenarios it should be faster.

That being said, I agree that FRNN is faster in 100% of cases (maybe ~10-20x), but that is not my main point. What I tried to show in this specific thread is a transfer cost (which, I must admit, I have not investigated yet) that penalizes the whole FRNN process in SPT and makes CPU KNN competitive overall. This specific transfer cost is not directly related to the KNN search itself, but to the following `AddTo` transform, which is much faster when the data is already on the CPU.
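For illustration, a minimal sketch of what a parallelized scikit-learn neighbor search looks like (the relevant parameter is `n_jobs`; the point count and radius here are arbitrary):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

xyz = np.random.rand(10_000, 3).astype("float32")

# n_jobs=-1 spreads the neighbor queries over all available CPU cores
nn = NearestNeighbors(radius=0.1, n_jobs=-1).fit(xyz)
distances, indices = nn.radius_neighbors(xyz)
```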
Maybe this transfer cost could be fixed, and maybe it is related to the specific configuration of the computer on which this test was run (I will try to reproduce it on another computer soon).
But don't get me wrong: I am mainly interested in a fast and simple CPU KNN alternative because one of the biggest strengths of SPT vs. other DL methods is its low resource consumption, which makes it a very good candidate for a CPU-only inference process (even CPU-only training in some very specific cases).
For example, people may have a lot more RAM than VRAM. Of course SPT allows working in a tiling fashion, but running the SPT data preparation on CPU could allow working on bigger clouds on some computers. Another example: `torch` with CUDA is around 2+ GB of dependencies, while `torch` without CUDA is more like 100-300 MB. CPU KNN lowers the install barrier of SPT and makes it easier to embed (for example as a CloudCompare plugin). It also lowers the install barrier because, as you said, FRNN can be difficult to install (for a GPU alternative, maybe look at the KNN in torch geometric)... not to mention that a lot of people do not have access to an Nvidia GPU.
Add this CPU KNN to some fast Delaunay bindings (yet to be released), and you could have a data preparation for SPT that is almost on par between CPU and GPU. As a practitioner, I think this is very interesting, and it could have a slight but perhaps non-negligible impact on the dissemination of your method.
Thanks again for this review; I have already added the test you requested and I will correct the CSR sample.
I agree to come back to the `pgeof` name, but then we could no longer benchmark and test against previous versions of `pgeof`, because we can't have two versions of the same package on the same `PYTHONPATH`. It would imply launching the tests in two different environments, and it wouldn't be convenient to compare the outcomes of both versions. On the other hand, I think we have no regression vs. the original version of `pgeof`, so these tests are not mandatory anymore. I'm open to both solutions; please tell me which you prefer: the `pgeof` name with no regression tests, or `pgeof2` (or something else) with regression tests. The version number is already increased (0.1.0).
I would like to create a RTD, but I don't think I have the time either. This PR is part of a larger puzzle on our side, and I think we could improve the API a bit in the long run (after some usage), so I don't want to commit too much to the documentation for now. Also, I'm waiting for `nanobind` version 2 for better stub generation. I will definitely end up setting up the RTD, but maybe in a few months, if that's ok.
For now, I think that detailing the API extensively in the README would clutter it a lot.
Of course, we can list the different functions in the README with a small description of what they do. But for extensive documentation, `help(pgeof2)` already works (modulo some typos, I assume :)), and for usage examples we can redirect people to the tests/benchmarks files (this is already done in the PR).
What do you think?
Hi @rjanvier, thanks for the good work and fast reply! Below are my replies to your successive points.
I totally agree that a pure-CPU preprocessing would be great, even if at the cost of a small increase in preprocessing time. And even better, a pure-CPU SPT would simply be amazing ! I am happy to see that this may be on your personal roadmap, judging from your expertise with CPU code optimization.
Still, since K-NN and radius-NN are often bottlenecks in point cloud processing pipelines (along with voxelization and sampling), I am very curious to see which library is faster. I will try to benchmark `nanoflann` vs `FRNN` on my end at least. But this should not stop this PR; I think it is great to offer CPU-based neighbor search in `pgeof` with `nanoflann`.
I am fine with keeping `pgeof` and not testing regression against the previous version. I did not see any dramatic changes to the core algorithm, and I trust that you kept the PCA features computation similar, so we should be good.
Agreed. Let's keep it to a minimum in the README. But still, it would be good to have:

- a pointer to `help(pgeof)` for more details

Also, I come back on what I said about documenting the CSR conversion from `scikit-learn` neighbor searches. Let's remove those from the README and favor the documentation of your `nanoflann`-based utilities. Several reasons for that:

- `nanoflann` seems faster than `scikit-learn`, so we want to encourage users to use our implementation
- it avoids having `scikit-learn` in the dependencies

Does that sound reasonable to you?
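For context, the scikit-learn CSR conversion being discussed looks roughly like this (`radius_neighbors` returns one variable-length index array per query point; the exact snippet in the repo may differ):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

xyz = np.random.rand(1_000, 3).astype("float32")
_, neighbors = NearestNeighbors(radius=0.1).fit(xyz).radius_neighbors(xyz)

# CSR conversion: pointer array from per-point counts, plus a flat index array
nn_ptr = np.r_[0, np.fromiter((len(n) for n in neighbors), dtype=np.int64).cumsum()]
nn = np.concatenate(neighbors)
```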
Thanks Damien, it all seems OK on my side. I will try to make those changes as soon as possible.
Awesome, I appreciate this very efficient collaboration !
Hi @drprojects, I think this is ready for another round of review.
Looks good, thanks for these last changes and congrats on this new version!
Thanks. We have to adapt SPT according to the API changes. Should I make a PR?
> Should I make a PR?
I would greatly appreciate it.
This PR is a proposal. Some changes made to `pgeof` were needed for some usage that goes beyond the scope of the original implementation. We found them potentially useful for the SPT/PGEOF user community, so we wanted to share them and propose integrating them into the upstream repo. If it seems too much to review/accept inside the `pgeof` codebase, we agree to maintain these changes as a public and friendly fork.

**Infrastructure / Architecture changes**
- `Eigen` is now embedded as a submodule. It's header-only, so it's not a big deal, and it simplifies the workflow a lot (see #11).
- `Taskflow` (as a submodule too) for parallel computation. It may look like overkill for a simple `parallel_for`. We initially used it for features that did not make it into the final release (they will be part of another lib that has nothing to do with feature computation). On the other hand, `Taskflow` is cross-platform (`OpenMP` is very good, but unfortunately it has some quirks on both macOS and Windows), header-only (unlike `TBB`, for example, it has no static/dynamic lib), and it has a clean API. It has high performance, so we kept it. If someone wants to rewrite the parallel code with simple std `C++` primitives in order to have something cross-platform and without additional dependencies, that would be nice.
- `nanoflann` (as a submodule too) for KNN/radius search. In addition to `pgeof`'s "big ndarray of features" and fast feature computation, we wanted to embed a KNN lib to add an "on the fly" feature computation a la `jakteristics`. `nanoflann` is a high-performance KNN library in C++, on par with `scikit`'s KDTree or even faster under some circumstances (in fact, our internal benchmarks show it is always faster for our computers and use cases). So we also added two functions to compute knn and radius search. These functions could be used, for example, in SPT instead of FRNN for people wanting a fast CPU alternative. As shown in another thread, FRNN is penalized by some (not yet investigated) transfer during the SPT transform process, so CPU KNN could be faster overall.
- `nanobind` instead of the `C` extension interface from Python. It has very low overhead. In any case, the overhead of binding libraries should not be an issue for pgeof usage, since we have no repeated calls to a function. `nanobind` comes with a convenient build system (it uses CMake under the hood) for cross-platform C++ modules.
- renamed to `pgeof2` (see why in the **Tests** section)

**New features and code changes**
- "`Eigen`" everywhere.
- a `jakteristics`-like on-the-fly feature computation function (with feature selection)
- `uint32` could be too small for some usage. We do not take advantage of all template possibilities in the bindings yet; for example, the `pgeof` function is still only available for float input point clouds, but a `double`-overloaded function could be added in the future with only one line of code.
overloaded function could be added in the future with only one line of code.Tests
pgeof2
output vs. other lib. It’s mainly why it was renamed topgeof2
: in order to allow to have both versions ofpgeof
under the same path. These tests could have some instabilities (due to precision issues) and there is a known UB in pgeof PyPI package on linux that make the optimal test fail on this platform. Tests were mainly used to guide the implementation and act as a minimal documentation but they could be deactivated from the CI (if we come back to pgeof name we will be forced to)tox run
. This command will setup a dedicated virtual env, install dependencies, build and run testsBenchmarks:
tox -e bench
.Here are some benchmarks results. Running python 3.11 on linux on an intel intel 14700, GCC-13.2 compiler: