drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"
MIT License
545 stars 71 forks

Fix Dataloader related error on Windows #52

Closed rjanvier closed 8 months ago

rjanvier commented 8 months ago

What does this PR do?

Windows systems have restrictions on what can be pickled in a multiprocessing context (lambdas are not allowed). This PR fixes the Dataloader class where a lambda was used, replacing it with a "top-level function" instead.
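A minimal sketch of the underlying issue (illustrative, not the actual SPT code): on Windows, multiprocessing workers are started with the "spawn" method, which pickles the objects handed to each worker, and lambdas cannot be pickled, whereas a module-level function can. The names `collate_lambda` and `collate_top_level` are hypothetical.

```python
import pickle

# A lambda cannot be pickled, so passing it to a DataLoader with
# num_workers > 0 fails on Windows (spawn-based worker startup).
collate_lambda = lambda batch: batch

def collate_top_level(batch):
    """Top-level function: picklable, safe for spawn-based workers."""
    return batch

try:
    pickle.dumps(collate_lambda)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False  # expected on CPython

# Top-level functions are pickled by reference and round-trip fine.
top_level_bytes = pickle.dumps(collate_top_level)
```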

Before submitting

Did you have fun?

Facing bugs is not fun, fixing them is a joy.

rjanvier commented 8 months ago

Yes, it lacks at least one more PR (that should come today). I made some disruptive changes in a private repo for the needs of my team, which works on a specific semantic segmentation task, so it's hard to sync changes.

drprojects commented 8 months ago

Wow, that sounds awesome!

I would love to move away from FRNN if you found a faster alternative. But I am surprised you found scipy to be faster. Last time I benchmarked FRNN against alternatives (scipy, FAISS, ...), FRNN was a clear winner. Do you mean scipy on the CPU is faster than FRNN on the GPU?

Great news if you accelerated the Delaunay triangulation. This was one of the bottlenecks of SPG and I circumvented it with an alternative custom superpoint graph construction in SPT. Yet, if you still want to use Delaunay triangulation, I am curious to see how you accelerated it.

Looking forward to seeing all this.

rjanvier commented 8 months ago

FRNN is the winner in terms of NN search by far, but since you have to transfer the results afterward, it's not that good overall. It could also depend on the computer's architecture (RAM speed, bus, etc.) and the GPU; I have two computers with Pascal-generation GPUs. For example, for a ~2M point cloud, 50-NN with a 1 m "security radius":

- FRNN: 0.1 s / subsequent DataTo: 6 s
- SciPy KDTree: 3.4 s / subsequent DataTo: 1 s

Maybe there is something to change in the DataTo step that follows the NN search, or maybe it's an artifact of my configurations.
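The CPU-side search described above can be sketched with SciPy's `cKDTree`: a 50-NN query capped at a 1 m radius. This is an illustrative reconstruction, not the benchmark code; the point count and coordinate range are stand-ins.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Stand-in point cloud (the real benchmark used ~2M real-scene points)
points = (rng.random((100_000, 3)) * 5.0).astype(np.float32)

tree = cKDTree(points)
# k=50 neighbors, capped at a 1 m "security radius": neighbors beyond
# the radius are reported with dist = inf and idx = len(points)
# (SciPy's missing-neighbor convention).
dist, idx = tree.query(points, k=50, distance_upper_bound=1.0, workers=-1)
```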

To speed up Delaunay I made a binding around https://github.com/BrunoLevy/geogram.psm.Delaunay. You can find it here. It's a breeze compared to qhull.
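For reference, the qhull baseline being compared against is what SciPy's `Delaunay` wraps; a geogram-based binding would replace this call. The point set below is illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
pts = rng.random((10_000, 3))  # stand-in 3D point set

# 3D Delaunay tetrahedralization via Qhull (the slower baseline here)
tri = Delaunay(pts)
# tri.simplices: (n_tetra, 4) array of vertex indices per tetrahedron
```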

drprojects commented 8 months ago

For FRNN, I assume your points represent a real scene at a 1-10 cm voxel resolution (as opposed to, say, random points sampled in $[0, 1]^3$)? Did you make sure to call torch.cuda.synchronize() before each timing measurement? CUDA calls are asynchronous, so without synchronization you may be measuring queueing rather than execution. I am very skeptical about the 6-second transfer time.

```python
import torch
from time import time

torch.cuda.synchronize()
start = time()

# do your thing

torch.cuda.synchronize()
print(f"Elapsed time: {time() - start} seconds")
```

Good job on Delaunay, I tried to have a look at your repo but I think it is private ;)

rjanvier commented 8 months ago

Yes, sorry, it is private for now (but will be public soon). For FRNN, I monitored the time for each Transform in different configurations (pure CPU, CPU KNN + GPU, pure GPU with FRNN), so it gives me the overall performance of each configuration for the SPT pre-transform pipeline (transfer cost included). CPU KNN + GPU is the fastest on my gear. This is not conclusive and could vary depending on the computer, but it shows CPU KNN is viable inside SPT.

drprojects commented 8 months ago

Could you please share the code you used to benchmark these? I really doubt FRNN on GPU is beaten by a CPU-based alternative, even counting the CPU-GPU transfer time. I would like to test this on my machines.

rjanvier commented 8 months ago

Sure, I will try to do it this week; otherwise it will be after the 07/01.