drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"
MIT License
545 stars 71 forks

Fix Dataloader related error on Windows #52

Closed rjanvier closed 8 months ago

rjanvier commented 8 months ago

What does this PR do?

Windows systems have restrictions on what can be pickled in a multiprocessing context (lambdas are not allowed). This PR fixes the Dataloader class where a lambda was used, replacing it with a "top-level function" instead.
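A minimal sketch of the underlying issue (illustrative, not the actual SPT code): on Windows, multiprocessing workers are started with the "spawn" method, which pickles the objects handed to each worker, and lambdas cannot be pickled, whereas a module-level function can. The names `collate_lambda` and `collate_top_level` are hypothetical.

```python
import pickle

# A lambda cannot be pickled, so passing it to a DataLoader with
# num_workers > 0 fails on Windows (spawn-based worker startup).
collate_lambda = lambda batch: batch

def collate_top_level(batch):
    """Top-level function: picklable, safe for spawn-based workers."""
    return batch

try:
    pickle.dumps(collate_lambda)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False  # expected on CPython

# Top-level functions are pickled by reference and round-trip fine.
top_level_bytes = pickle.dumps(collate_top_level)
```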

Before submitting

Did you have fun?

Facing bugs is not fun, fixing them is a joy.

rjanvier commented 8 months ago

Yes, it lacks at least one more PR (that should come today). I made some disruptive changes in a private repo for the needs of my team, which works on a specific semantic segmentation task, so it's hard to sync changes.

drprojects commented 8 months ago

Wow, that sounds awesome!

I would love to move away from FRNN if you found a faster alternative. But I am surprised you found scipy to be faster. Last time I benchmarked FRNN against alternatives (scipy, FAISS, ...), FRNN was a clear winner. Do you mean scipy on the CPU is faster than FRNN on the GPU?

Great news if you accelerated the Delaunay triangulation. This was one of the bottlenecks of SPG and I circumvented it with an alternative custom superpoint graph construction in SPT. Yet, if you still want to use Delaunay triangulation, I am curious to see how you accelerated it.

Looking forward to seeing all this.

rjanvier commented 8 months ago

FRNN is the winner in terms of NN search by far, but since you have to transfer the results afterward, it's not that good overall. It could also depend on the computer's architecture (RAM speed, bus, etc.) and the GPU; I have two computers with Pascal-generation GPUs. For example, for a ~2M point cloud, 50-NN with a 1 m "security radius":

- FRNN: 0.1 s / subsequent DataTo: 6 s
- SciPy KDTree: 3.4 s / subsequent DataTo: 1 s

Maybe there is something to change in the DataTo step that follows the NN search, or maybe it's an artifact of my configurations.
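The CPU-side search described above can be sketched with SciPy's `cKDTree`: a 50-NN query capped at a 1 m radius. This is an illustrative reconstruction, not the benchmark code; the point count and coordinate range are stand-ins.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Stand-in point cloud (the real benchmark used ~2M real-scene points)
points = (rng.random((100_000, 3)) * 5.0).astype(np.float32)

tree = cKDTree(points)
# k=50 neighbors, capped at a 1 m "security radius": neighbors beyond
# the radius are reported with dist = inf and idx = len(points)
# (SciPy's missing-neighbor convention).
dist, idx = tree.query(points, k=50, distance_upper_bound=1.0, workers=-1)
```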

To speed up Delaunay I made a binding around https://github.com/BrunoLevy/geogram.psm.Delaunay. You can find it here. It's a breeze compared to qhull.
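For reference, the qhull baseline being compared against is what SciPy's `Delaunay` wraps; a geogram-based binding would replace this call. The point set below is illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
pts = rng.random((10_000, 3))  # stand-in 3D point set

# 3D Delaunay tetrahedralization via Qhull (the slower baseline here)
tri = Delaunay(pts)
# tri.simplices: (n_tetra, 4) array of vertex indices per tetrahedron
```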

drprojects commented 8 months ago

For FRNN, I assume your points represent a real scene at a 1-10 cm voxel resolution (as opposed to, say, random points sampled in $[0, 1]^3$)? Did you make sure to call torch.cuda.synchronize() before each timing measurement? CUDA calls are asynchronous, so without synchronization you may be measuring queueing rather than execution. I am very skeptical about the 6-second transfer time.

```python
import torch
from time import time

torch.cuda.synchronize()
start = time()

# do your thing

torch.cuda.synchronize()
print(f"Elapsed time: {time() - start} seconds")
```

Good job on Delaunay, I tried to have a look at your repo but I think it is private ;)

rjanvier commented 8 months ago

Yes, sorry, it is private for now (but will be public soon). For FRNN, I monitored the time for each Transform in different configurations (pure CPU, CPU KNN + GPU, pure GPU with FRNN), so it gives me the overall performance of each configuration for the SPT pre-transform pipeline (transfer cost included). CPU KNN + GPU is the fastest on my gear. This is not conclusive and could vary depending on the computer, but it shows CPU KNN is viable inside SPT.

drprojects commented 8 months ago

Could you please share the code you used to benchmark these? I really doubt FRNN on GPU is beaten by a CPU-based alternative, even counting the CPU-GPU transfer time. I would like to test this on my machines.

rjanvier commented 8 months ago

Sure, I will try to do it this week; otherwise it will be after the 07/01.