Pointcept / PointTransformerV2

[NeurIPS'22] An official PyTorch implementation of PTv2.

PyTorch implementation of **pointops.knn_query_and_group** #31

Closed by EricLina 1 year ago

EricLina commented 1 year ago

The KNN and grouping steps are done by pointops.knn_query_and_group (I guess because it may be faster). However, the CUDA implementation of KNN and grouping is hard to understand. Is there a PyTorch implementation of pointops.knn_query_and_group?

Gofinge commented 1 year ago

As far as I know, most implementations of KNN query for 3D point clouds are written in CUDA for efficiency, e.g., PyG and our pointops. You can rewrite it in PyTorch, but the forward time will likely be terrible. I recommend spending a little time learning how CUDA threads work.
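For reference, here is a minimal sketch of what a brute-force PyTorch rewrite could look like for a single scene. The function name `knn_query_and_group_naive` and its signature are hypothetical and are not the pointops API; the grouping layout (relative coordinates concatenated with features) is also an assumption. It builds the full pairwise distance matrix, which is why it scales poorly.

```python
import torch


def knn_query_and_group_naive(feat, xyz, nsample):
    """Brute-force KNN + grouping for one scene (hypothetical helper, not pointops).

    feat: (n, c) point features
    xyz:  (n, 3) point coordinates
    Returns grouped (n, nsample, 3 + c) and neighbor indices idx (n, nsample).
    """
    # Full pairwise distance matrix: O(n^2) memory.
    dist = torch.cdist(xyz, xyz)  # (n, n)
    # Indices of the nsample smallest distances per row, nearest first.
    idx = dist.topk(nsample, dim=1, largest=False).indices  # (n, nsample)
    # Neighbor coordinates relative to each query point.
    grouped_xyz = xyz[idx] - xyz.unsqueeze(1)  # (n, nsample, 3)
    grouped_feat = feat[idx]  # (n, nsample, c)
    return torch.cat([grouped_xyz, grouped_feat], dim=-1), idx
```

This is fine for a few thousand points, but the `(n, n)` matrix grows quadratically with the number of points.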

EricLina commented 1 year ago

I've found a torch version in https://github.com/qq456cvb/Point-Transformers/blob/master/pointnet_util.py

I think I need to spend some time learning CUDA to understand the implementation of V2.

Anyway, thank you for your reply!

Gofinge commented 1 year ago

If that KNN implementation is also efficient, maybe we can remove the dependency on pointops and make the codebase easier to follow.

EricLina commented 1 year ago

> As far as I know, most implementations of KNN query for 3D point clouds are written in CUDA for efficiency, e.g., PyG and our pointops. You can rewrite it in PyTorch, but the forward time will likely be terrible. I recommend spending a little time learning how CUDA threads work.

You are right. I tried the torch version I mentioned above, and its speed and memory cost are unbearable. I replaced your pointopt_queryandgroup with the torch implementation. In your code, you feed 200k points in at once. Is this reasonable? I mean, is there a way to split the input rather than feeding it all at once?

Anyway, I followed your input data format. When processing 200k points, the KNN operation produces a huge [200k, 200k] matrix, which needs 1000G of memory to store.
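As a rough check on the size of that matrix, the arithmetic can be sketched as below. It assumes float32 storage (4 bytes per element) and counts only the single [200k, 200k] distance matrix; the helper name is hypothetical. Any intermediate tensors a given implementation allocates would raise the peak well beyond this figure.

```python
def dense_distance_matrix_bytes(num_points, bytes_per_element=4):
    """Bytes needed for one dense (num_points x num_points) matrix."""
    return num_points * num_points * bytes_per_element


n = 200_000
total_bytes = dense_distance_matrix_bytes(n)  # 160_000_000_000 bytes
gib = total_bytes / 1024**3
print(f"{total_bytes / 1e9:.0f} GB, about {gib:.0f} GiB")  # 160 GB, about 149 GiB
```

Either way, a single dense matrix of this size is far beyond the memory of one GPU.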

I am curious how you handle so many points at one time.

Waiting for your reply.

Gofinge commented 1 year ago

The offset makes the KNN search operate only within each scene. Also, the CUDA implementation only uses GPU cache to hold temporary information.
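To illustrate the idea in PyTorch terms, here is a hedged sketch of a per-scene KNN search. It assumes `offset` holds the cumulative end index of each scene in the concatenated point tensor, and the function name `knn_with_offset` and the `chunk` parameter are hypothetical. It is not how the CUDA kernel works internally; it only shows that restricting the search to each scene, and processing queries in chunks, avoids ever building an [N, N] matrix over all points.

```python
import torch


def knn_with_offset(xyz, offset, nsample, chunk=4096):
    """Per-scene brute-force KNN (hypothetical helper, not pointops).

    xyz:    (n, 3) coordinates of all scenes concatenated
    offset: 1-D tensor of cumulative end indices, one per scene (assumed convention)
    Returns idx (n, nsample) with indices into the concatenated xyz.
    Each scene must contain at least nsample points.
    """
    idx = torch.empty((xyz.shape[0], nsample), dtype=torch.long, device=xyz.device)
    start = 0
    for end in offset.tolist():
        scene = xyz[start:end]  # points of the current scene only
        for q in range(0, scene.shape[0], chunk):
            # Distances from one chunk of queries to this scene: (<=chunk, n_scene).
            dist = torch.cdist(scene[q:q + chunk], scene)
            local = dist.topk(nsample, dim=1, largest=False).indices
            # Shift scene-local indices back to global indices.
            idx[start + q:start + q + local.shape[0]] = local + start
        start = end
    return idx
```

With this layout the largest temporary tensor is `chunk x n_scene` rather than `N x N`, at the cost of a Python loop that a CUDA kernel would not need.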