datagrok-ai / public

Public package repository for the Datagrok.ai platform
MIT License

Utils: Integrate RAPIDS library to speed up dimensionality reduction #227

Open nikolay-alemasov opened 2 years ago

nikolay-alemasov commented 2 years ago

@skalkin, could you please correct the issue if needed?

skalkin commented 2 years ago
  1. Get an EC2 machine with the GPU and test the UMAP performance against the 40k dataset
  2. If we like what we see, integrate it as a package (we will discuss our options with Alexander and Sofia)

dnillovna commented 6 months ago

This issue has been mirrored in Jira: https://reddata.atlassian.net/browse/GROK-14924

StLeonidas commented 6 months ago

Consider closing.

dnillovna commented 2 months ago

This issue has been mirrored in Jira: https://reddata.atlassian.net/browse/GROK-16034

drizhina commented 2 months ago

Update: We have working code for dimensionality reduction using the RAPIDS library. The constraint is that the library only supports primitive distance functions, such as Euclidean, Minkowski, Hamming, etc., and is therefore much less useful for diverse data.

The library also supports using a precalculated KNN for dimensionality reduction, but since precalculating the KNN is a very heavy operation, it can become a serious bottleneck.
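
To illustrate why the precalculated KNN step gets heavy, here is a minimal sketch of a brute-force KNN under a custom distance. It is not the Datagrok or RAPIDS code: `tanimoto_distance` and `brute_force_knn` are hypothetical names, and Tanimoto on sets is only an assumed example of a non-primitive metric. The point is that every row needs a distance to every other row, so n items cost n * (n - 1) distance calls, which is about 1.6 billion for a 40k dataset.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Example custom metric (an assumption): 1 - |a & b| / |a | b|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / union


def brute_force_knn(
    items: Sequence[T],
    distance: Callable[[T, T], float],
    k: int,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return, for every item, the indices and distances of its k nearest neighbors.

    Brute force: each of the n rows evaluates the distance to the other
    n - 1 items, so the total cost is n * (n - 1) distance calls.
    """
    n = len(items)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(items)")

    indices: List[List[int]] = []
    distances: List[List[float]] = []
    for i in range(n):
        # Sort all other items by (distance, index); ties break on index.
        row = sorted(
            (distance(items[i], items[j]), j) for j in range(n) if j != i
        )[:k]
        indices.append([j for _, j in row])
        distances.append([d for d, _ in row])
    return indices, distances


if __name__ == "__main__":
    # Toy "fingerprints" as sets of feature ids (made-up data).
    fingerprints = [
        frozenset({1, 2, 3}),
        frozenset({1, 2, 4}),
        frozenset({7, 8, 9}),
        frozenset({7, 8}),
    ]
    knn_indices, knn_distances = brute_force_knn(fingerprints, tanimoto_distance, k=1)
    print(knn_indices)
    print(knn_distances)
```

The per-row neighbor indices and distances are the usual shape of a precomputed KNN result. Whether a given UMAP implementation accepts exactly this layout is an assumption to check against its documentation.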

We have already sped up dimensionality reduction using WebGPU, which runs all of the aforementioned calculations (KNN generation, UMAP) on your GPU. Even on an integrated graphics card, it can be tens of times faster. You can check out the post about it here