giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

[FEATURE REQUEST] GPU-support with CuPy #679

Open jakorostami opened 1 year ago

jakorostami commented 1 year ago

Hi!

I've been working with this package and found a bottleneck (at least on my PC) when working with high-dimensional matrices. As far as I can see, there is no GPU support; the package relies on joblib (which sklearn also uses) to speed up computation on the CPU.

For instance, running _time_delay_embedding with delay=1, dimensions=3, and stride=1 on a high-dimensional matrix of shape 3000x112000 (yes, 112 000) takes about 1-1.5 seconds with CuPy, whilst NumPy takes around 10-11 seconds.
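To make the comparison concrete, here is a minimal sketch of a time-delay embedding written against an interchangeable array module. It is not giotto-tda's internal _time_delay_embedding: the function name `time_delay_embedding`, the `xp` argument, and the choice to align windows from the start of each series are assumptions made for illustration.

```python
import numpy as np

try:
    import cupy as cp  # optional; needs a CUDA-capable GPU and a matching CuPy build
except ImportError:
    cp = None


def time_delay_embedding(X, delay=1, dimension=3, stride=1, xp=np):
    """Hypothetical helper: embed each row of X into delay vectors.

    X has shape (n_samples, n_timestamps); the result has shape
    (n_samples, n_points, dimension). Pass xp=cp to run on the GPU.
    """
    X = xp.asarray(X)
    n_timestamps = X.shape[-1]
    # Number of windows that fit entirely inside the series.
    n_points = (n_timestamps - delay * (dimension - 1) - 1) // stride + 1
    if n_points <= 0:
        raise ValueError("Time series too short for this delay and dimension.")
    starts = xp.arange(n_points) * stride    # first index of each window
    offsets = xp.arange(dimension) * delay   # offsets inside a window
    idx = starts[:, None] + offsets[None, :]  # shape (n_points, dimension)
    return X[..., idx]


# CPU run on a tiny input; swap in xp=cp (if available) for the GPU path.
X_small = np.arange(12).reshape(2, 6)
embedded = time_delay_embedding(X_small, delay=1, dimension=3, stride=1)
print(embedded.shape)
```

The only backend-specific piece is the `xp` argument, so timing NumPy against CuPy amounts to calling the same function twice. Note that the full output for a 3000x112000 input is large, so GPU memory may become the limiting factor before compute does.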

I looked into the source code and saw that, for the features defined by giotto-tda itself, everything can be ported to CuPy simply by adding import cupy as cp and replacing all np. with cp. For code inherited from sklearn, the sklearn code would have to be changed as well to run on CuPy.
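One way to sketch the np-to-cp swap without hard-coding either library is to pick the array module at runtime and convert back to NumPy at the boundary with code that still expects NumPy arrays (such as sklearn estimators). The helpers `get_xp`, `array_module`, `to_numpy`, and `center_columns` below are hypothetical names, not part of giotto-tda; `cp.get_array_module` and `cp.asnumpy` are existing CuPy functions.

```python
import numpy as np

try:
    import cupy as cp  # optional dependency
except ImportError:
    cp = None


def get_xp(use_gpu=False):
    """Return cupy if requested and importable, otherwise numpy."""
    if use_gpu and cp is not None:
        return cp
    return np


def array_module(a):
    """Return the module (numpy or cupy) that owns array `a`."""
    if cp is not None:
        return cp.get_array_module(a)
    return np


def to_numpy(a):
    """Copy a CuPy array back to host memory; pass NumPy arrays through."""
    if cp is not None and isinstance(a, cp.ndarray):
        return cp.asnumpy(a)
    return np.asarray(a)


def center_columns(X):
    """Example of backend-agnostic code: subtract the column means."""
    xp = array_module(X)
    return X - xp.mean(X, axis=0)


xp = get_xp(use_gpu=False)          # set use_gpu=True to try the CuPy path
X = xp.asarray([[1.0, 2.0], [3.0, 4.0]])
centered = to_numpy(center_columns(X))  # safe to hand to NumPy-only code
print(centered)
```

Keeping the conversion in one place (`to_numpy`) limits host-device copies to the points where NumPy-only code is actually called.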

This would help to run TakensEmbedding, CollectionTransformer(PCA()), VietorisRipsPersistence, and PersistenceEntropy in a pipeline.

I believe this would put giotto-tda at the forefront when doing Topological Signal Processing for audio data.

Thanks!

PC setup:

i5-9600K, 48GB RAM, 1TB SSD, RTX 3060 with 12GB RAM

matteocao commented 4 months ago

This would be a very nice feature, but we currently lack the personnel to implement such big improvements. Any chance you would like to make a PR with the GPU support, @jakorostami?

jakorostami commented 4 months ago

Thanks @matteocao, I'll look into how much time it would require and evaluate whether it is feasible for me.