RobinMagnet / pyFM

Python implementation of Functional Maps
MIT License

GPU accelerated implementations #11

Closed — jiali1025 closed this 10 months ago

jiali1025 commented 10 months ago

Hi bro!

This is a very nice repo. However, I find that if you need to process a large amount of data with many vertices per mesh, it takes an extremely long time. Since most of the computation is matrix multiplication, would it be possible to use a GPU to accelerate the process?

RobinMagnet commented 10 months ago

Hi, thanks for the message!

I am indeed working on releasing a pyFM.torch implementation. While some parts are very easy to transcribe, others require more work. I hope to release it before January. Note, however, that the bottleneck will often be the sparse eigendecomposition, which has to be done on the CPU. Nearest-neighbor search is then the second bottleneck; matrix multiplication is less of an issue in these cases.

I'm expecting to implement solutions for all these issues in the pyFM.torch repo though (it's already quite good).
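One of the bottlenecks Robin lists, nearest-neighbor search, reduces to matrix multiplication, which is exactly what makes it GPU-friendly. A rough sketch of that reduction (not pyFM's actual code; NumPy is used here, and the same lines port directly to torch tensors on CUDA):

```python
import numpy as np

def nn_query(X, Y):
    """Index of the nearest row of X for every row of Y (brute force)."""
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2 ; the ||y||^2 term is
    # constant per query row, so it can be dropped before the argmin.
    sq_norms = (X ** 2).sum(1)                  # (n,)
    d2 = sq_norms[None, :] - 2.0 * (Y @ X.T)    # (m, n), up to a constant
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
matches = nn_query(X, X[:10])   # each of the first 10 points matches itself
```

The `Y @ X.T` product is the dominant cost, so moving it to a GPU gives the speedup; for very large point sets the queries would be processed in chunks to bound memory.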

jiali1025 commented 10 months ago

Thanks bro! This repo is very nice, especially now that functional map learning is very important.

I think eigsh could be replaced by `torch.linalg.eigh` (https://pytorch.org/docs/stable/generated/torch.linalg.eigh.html#torch-linalg-eigh), which runs on the GPU. If you need any help with the torch implementation, I think I can help as well. My email is jiali25@nus.edu.sg.
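For context on this suggestion: the linked `torch.linalg.eigh` is a dense symmetric eigensolver, so using it for the Laplace-Beltrami problem means first reducing the generalized problem W φ = λ A φ to a standard dense one. A minimal sketch of that dense route, using NumPy as a CPU stand-in (the matrix names W and A are illustrative, following common functional-maps notation for stiffness and mass):

```python
import numpy as np

n, k = 120, 6
rng = np.random.default_rng(0)
# Toy SPD "stiffness" and diagonal "mass" (illustrative, not mesh-derived).
B = rng.standard_normal((n, n))
W = B @ B.T + n * np.eye(n)
a = rng.uniform(0.5, 1.5, n)          # lumped mass diagonal

# Whitening by A^{-1/2} turns W phi = lambda A phi into a standard
# symmetric problem; the dense solver then needs O(n^2) memory and
# computes ALL n eigenpairs, which is why it does not scale like
# a sparse solver that targets only the k smallest.
Ais = np.diag(1.0 / np.sqrt(a))
evals, U = np.linalg.eigh(Ais @ W @ Ais)
phis = Ais @ U[:, :k]                 # back-substitute to generalized eigenvectors
```

On GPU the two `np.linalg`/matmul calls would become their `torch` counterparts; the trade-off Robin raises next is that this path densifies a matrix that is naturally sparse.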

RobinMagnet commented 10 months ago

Well, this function is fine for dense matrices but, as far as I know, really unfit for sparse ones. I know there are some sparse Cholesky solvers on GPU, but I have never found them easy to install across different machines, so I'm not very confident I want to make them a dependency.

Thanks for the message as well, I'll definitely let you know if I need help!

The thing is, I already have a lot of these functions implemented in different repos; I mostly need to clean and organize them.

jiali1025 commented 10 months ago

Wow, thanks! You know this codebase much better than I do. I think torch.sparse is still under development. In my current use case, what I really want is to accelerate the TriMesh LB decomposition: I have 46000 surfaces, with vertex counts ranging from 1500 to 4000. I tried replacing the function with lgl, but found it was even slower. I think torch on GPU would be much faster, since I have good GPU resources but limited CPU resources. If you already have these functions implemented, could you share them with me?

In addition, I am wondering whether to process the surfaces in batches: assemble the matrices in batches and pass a batch through the computation of the mass matrix and stiffness matrix, and possibly the decomposition as well.

If you need any help just let me know, your repo is so nice!
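For reference, the LB decomposition being discussed is a sparse generalized eigenproblem. A hedged sketch of the CPU path (illustrative, not the exact pyFM routine): assemble the cotangent stiffness matrix W and a lumped mass matrix A from the triangle mesh, then call SciPy's `eigsh`, whose shift-invert factorization is the CPU bottleneck Robin mentions:

```python
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as sla

def lb_decomposition(verts, faces, k_eig):
    """Smallest k_eig Laplace-Beltrami eigenpairs of a triangle mesh."""
    n = len(verts)
    I, J, V = [], [], []
    areas = np.zeros(n)
    for tri in faces:
        for c in range(3):
            # Angle at vertex kk is opposite the edge (i, j).
            i, j, kk = tri[c], tri[(c + 1) % 3], tri[(c + 2) % 3]
            u = verts[i] - verts[kk]
            v = verts[j] - verts[kk]
            cot = u.dot(v) / np.linalg.norm(np.cross(u, v))
            # Cotangent weights: off-diagonal -cot/2, diagonal +cot/2.
            I += [i, j, i, j]
            J += [j, i, i, j]
            V += [-0.5 * cot, -0.5 * cot, 0.5 * cot, 0.5 * cot]
        # Lumped mass: a third of the triangle area to each corner.
        p, q, r = verts[tri]
        area = 0.5 * np.linalg.norm(np.cross(q - p, r - p))
        areas[tri] += area / 3.0
    W = sparse.csc_matrix((V, (I, J)), shape=(n, n))
    A = sparse.diags(areas, format="csc")
    # Shift-invert targets the smallest eigenvalues; the factorization
    # of (W - sigma * A) is the CPU-bound step.
    return sla.eigsh(W, k=k_eig, M=A, sigma=-0.01)

# Demo on a tiny flat grid mesh (assumed layout, for illustration only).
g = 6
xs, ys = np.meshgrid(np.arange(g), np.arange(g))
verts = np.stack([xs.ravel(), ys.ravel(), np.zeros(g * g)], axis=1).astype(float)
faces = []
for r in range(g - 1):
    for c in range(g - 1):
        v0 = r * g + c
        faces.append([v0, v0 + 1, v0 + g])
        faces.append([v0 + 1, v0 + g + 1, v0 + g])
evals, evects = lb_decomposition(verts, np.array(faces), 5)
```

For meshes of 1500-4000 vertices, the assembly loop and the matrix products move to GPU easily; the `eigsh` call itself is the part with no drop-in GPU replacement, which is the crux of this thread.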

jiali1025 commented 10 months ago

Hi bro, may I ask how you deal with batched mesh computation? I think we have to create a batch of meshes to actually get acceleration on GPUs. However, the numbers of vertices and faces differ between meshes. Even graph neural networks and similar models don't really perform complex matrix operations on a batch of differently sized matrices. In addition, I don't think padding makes sense here, since it would change the matrices. If you have any good advice, please let me know.

Many thanks! Kind regards!
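One common workaround for the variable-size problem (an assumption on my part, not something the thread confirms pyFM ships): stack the per-mesh operators into one block-diagonal sparse matrix, so a single large product processes the whole batch with no padding. A minimal sketch with SciPy:

```python
import numpy as np
import scipy.sparse as sparse

sizes = [15, 22, 30]                    # three "meshes" of different sizes
blocks = []
for n in sizes:
    # Toy SPD operator standing in for a per-mesh Laplacian.
    B = sparse.random(n, n, density=0.2, random_state=0)
    blocks.append(B + B.T + 2 * n * sparse.eye(n))
L_batch = sparse.block_diag(blocks, format="csr")

x = np.ones(sum(sizes))
y = L_batch @ x                         # one product covers all meshes
# Per-mesh results are recovered by slicing at the size offsets.
offsets = np.cumsum([0] + sizes)
y_per_mesh = [y[offsets[i]:offsets[i + 1]] for i in range(len(sizes))]
```

Matrix products and linear solves batch cleanly this way. Eigendecomposition is trickier: the block-diagonal problem decouples per block, but a single `eigsh` call on the batch returns the smallest eigenvalues of the union of spectra, not k per mesh, so per-mesh solves are still needed for the bases.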

jiali1025 commented 10 months ago

I think I solved the GPU implementation, thanks.

Dolphin4mi commented 1 month ago

> I think I solved the gpu implementations, thanks

Hi bro, thank you for your advice. You said you have solved the acceleration on GPU. May I ask how I can obtain your solution? Has it been open-sourced? Thank you.