NVIDIA-Genomics-Research / rapids-single-cell-examples

Examples of single-cell genomic analysis accelerated with RAPIDS
Apache License 2.0

[REVIEW] Using multi-gpu PCA, kmeans, and UMAP #83

Closed cjnolet closed 2 years ago

cjnolet commented 2 years ago

I also tried multi-GPU NearestNeighbors. It worked (the neighbor search took only 12s), but it still required running the fuzzy_simplicial_set function from UMAP afterwards to construct the connectivities graph. For some reason, calling scanpy's function after computing the nearest neighbors took 1.5 min, versus the 1 min it originally took to compute the neighbors on a single GPU AND call fuzzy_simplicial_set. We should investigate this further. However, the amount of data at this stage is so small that more time goes to the data copies needed to distribute the work than is gained from the increased parallelism.

One reason we want to distribute the work end-to-end is so we can process the distributed data in place as much as possible and avoid repeatedly copying it back to the client. Leiden and Louvain already have distributed versions in the cuGraph API, but using them end-to-end also requires distributing the nearest neighbors and fuzzy_simplicial_set computations, which turn the PCA-reduced gene expressions into a connectivities graph.
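For reference, here is a minimal pure-Python sketch of what that neighbors-to-connectivities step computes. It is a simplified illustration of the fuzzy simplicial set idea, not UMAP's or scanpy's actual implementation. The helper names (`knn_brute_force`, `smooth_knn_sigma`, `fuzzy_connectivities`) and the toy points are hypothetical, and it assumes a brute-force Euclidean kNN that excludes each point itself.

```python
import math


def knn_brute_force(points, k):
    """Return (indices, distances) of the k nearest neighbors of each point.

    Simplification: exhaustive search, and each point is excluded from
    its own neighbor list.
    """
    indices, distances = [], []
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


def smooth_knn_sigma(dists, rho, target, n_iter=64):
    """Search for sigma so that sum(exp(-max(d - rho, 0) / sigma)) ~= target."""
    lo, hi = 0.0, float("inf")
    sigma = 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(d - rho, 0.0) / sigma) for d in dists)
        if abs(total - target) < 1e-5:
            break
        if total > target:
            # Sum too large: sigma must shrink.
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            # Sum too small: grow sigma (double it until an upper bound exists).
            lo = sigma
            sigma = sigma * 2.0 if hi == float("inf") else (lo + hi) / 2.0
    return sigma


def fuzzy_connectivities(points, k):
    """Build a symmetric weighted graph as a dict {(i, j): weight}."""
    indices, distances = knn_brute_force(points, k)
    target = math.log2(k)

    # Directed membership strengths: the nearest neighbor always gets weight 1.
    directed = {}
    for i in range(len(points)):
        rho = distances[i][0]
        sigma = smooth_knn_sigma(distances[i], rho, target)
        for j, d in zip(indices[i], distances[i]):
            directed[(i, j)] = math.exp(-max(d - rho, 0.0) / sigma)

    # Symmetrize with the fuzzy union a + b - a * b.
    graph = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        graph[(i, j)] = graph[(j, i)] = a + b - a * b
    return graph


# Toy stand-in for PCA-reduced expression data: two small clusters in 2D.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
graph = fuzzy_connectivities(points, k=3)
```

The per-point `rho`/`sigma` step only needs that point's neighbor list, so it parallelizes naturally. The symmetrization step is the part that needs both directions of every edge, which is one plausible reason it is harder to keep in place on distributed data.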

TL;DR: once we can distribute the fuzzy_simplicial_set computation on GPU, we should be able to distribute the remaining pieces, except t-SNE.