NVIDIA-Genomics-Research / rapids-single-cell-examples

Examples of single-cell genomic analysis accelerated with RAPIDS
Apache License 2.0

added new features and functionalities #74

Closed: Intron7 closed this 2 years ago

Intron7 commented 3 years ago

Hello clara-parabricks team,

I created a cunnData class for preprocessing the data that feels a lot like the scanpy/anndata implementation, though naturally it's a lot less refined. It keeps the metadata in .obs and .var and also allows for better quality control (QC), e.g. it lets you filter out cells with too high an expression of mitochondrial genes. The class is in the cunndata.py file.
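To make the QC idea concrete, here is a minimal CPU-only sketch of a cunnData-like container. It is not the code in cunndata.py: the class name `ToyCunnData`, its method names, and the `MT-` gene-name prefix for mitochondrial genes are assumptions for illustration, and NumPy stands in for CuPy. The QC column names (`total_counts`, `n_genes_by_counts`, `pct_counts_mt`) follow the naming of scanpy's `calculate_qc_metrics`.

```python
import numpy as np
import pandas as pd


class ToyCunnData:
    """Hypothetical, CPU-only stand-in for the cunnData idea: a dense
    cells x genes matrix plus pandas metadata tables in .obs and .var.
    A GPU version would hold X as a CuPy array instead of a NumPy array."""

    def __init__(self, X, obs_names, var_names):
        self.X = np.asarray(X, dtype=np.float32)
        self.obs = pd.DataFrame(index=pd.Index(obs_names, name="cell"))
        self.var = pd.DataFrame(index=pd.Index(var_names, name="gene"))

    def calc_qc_metrics(self, mt_prefix="MT-"):
        # Flag mitochondrial genes by name prefix (assumed naming convention).
        mt_mask = np.asarray(self.var.index.str.startswith(mt_prefix), dtype=bool)
        self.var["mt"] = mt_mask
        total = self.X.sum(axis=1)
        mt_total = self.X[:, mt_mask].sum(axis=1)
        self.obs["total_counts"] = total
        self.obs["n_genes_by_counts"] = (self.X > 0).sum(axis=1)
        # Empty cells get 0% instead of a division-by-zero NaN.
        self.obs["pct_counts_mt"] = np.where(
            total > 0, 100.0 * mt_total / np.maximum(total, 1e-12), 0.0
        )
        return self

    def filter_cells(self, max_pct_mt):
        # Subset X and .obs with the same mask so rows stay aligned.
        keep = self.obs["pct_counts_mt"].to_numpy() <= max_pct_mt
        self.X = self.X[keep]
        self.obs = self.obs.loc[keep].copy()
        return self
```

For example, `ToyCunnData(X, cells, genes).calc_qc_metrics().filter_cells(max_pct_mt=20)` keeps only the cells with at most 20% mitochondrial counts, and `.obs` stays aligned with the rows of `X`.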

I also updated the rapids_scanpy_function file to rapids_scanpy_function_v2.py. The regress-out function now uses the metadata in .obs the same way sc.pp.regress_out does: you pass a list of .obs column names (keys) as the argument and it regresses those out. I also wrapped Leiden, PCA, k-means, and t-SNE so that they work with AnnData objects and create or update their result entries in adata on their own. PCA now also provides the variance data in .uns, so sc.pl.pca_variance_ratio can be used.

A lot of people use scatter and violin plots to check the quality of the data, so I created two basic plotting functions that work with the cunnData class.

I created a new notebook (hlca_lung_gpu_analysis_v2.ipynb) based on the HLCA lung one. It shows off all the changes I made and the new features I implemented.

All these functions should run at a similar speed to your original implementation: I didn't see any speed difference between my implementation and the original beyond run-to-run variance. Please run hlca_lung_gpu_analysis_v2.ipynb to check whether my assumption of equal speed is correct. I ran my tests on a Quadro RTX 6000.
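The regress-out and PCA changes can be sketched in NumPy as follows, assuming numeric covariates only (scanpy treats a single categorical key differently). The function names `regress_out` and `pca_with_variance` are hypothetical, not the ones in rapids_scanpy_function_v2.py. The first fits `X ~ 1 + obs[keys]` per gene by ordinary least squares and keeps the residuals; the second returns the variance data in the layout scanpy keeps under `adata.uns['pca']`, which is what `sc.pl.pca_variance_ratio` reads.

```python
import numpy as np
import pandas as pd


def regress_out(X, obs, keys):
    """Sketch of regress-out driven by .obs column names (as in
    sc.pp.regress_out): fit X ~ 1 + obs[keys] per gene by ordinary
    least squares and return the residuals. Numeric keys only."""
    covariates = obs[list(keys)].to_numpy(dtype=np.float64)
    design = np.column_stack([np.ones(len(obs)), covariates])
    # One lstsq call solves every gene at once (each column of X is a target).
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ coef


def pca_with_variance(X, n_comps):
    """PCA via SVD of the centered matrix. The variance entries mirror
    the keys scanpy stores in adata.uns['pca']."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance explained by each component (sample variance, ddof=1).
    variance = S**2 / (X.shape[0] - 1)
    total_variance = Xc.var(axis=0, ddof=1).sum()
    return {
        "X_pca": U[:, :n_comps] * S[:n_comps],
        "PCs": Vt[:n_comps].T,
        "uns_pca": {
            "variance": variance[:n_comps],
            "variance_ratio": variance[:n_comps] / total_variance,
        },
    }
```

In an AnnData-style wrapper, these results would be written to `adata.obsm['X_pca']`, `adata.varm['PCs']`, and `adata.uns['pca']` respectively, so the wrapped function updates the object in place the way scanpy's own PCA does.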

In my opinion, these changes improve the usability of single-cell analysis on the GPU. I hope this can be a somewhat adequate stopgap measure until the Theis lab releases scanpy with full GPU support.

Yours, Severin

rilango commented 3 years ago

@Intron7 we previously tried to add GPU support to AnnData, and we think you may be interested in this. Please find our work in progress at https://github.com/rilango/anndata/tree/rilango/initial-with-test.

Here, our intention was to add GPU support to existing libraries.

Please review some of the plotting features of the interactive tool in the notebook https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/hlca_lung_gpu_analysis-visualization.ipynb. It includes violin plots.

Intron7 commented 3 years ago

@rilango I understand that your idea was to create a proof of concept with this repo. I really love the speedup these functions have given my day-to-day work. With this PR, my idea was to take your initial design and streamline it into something I can use for my own analysis on a day-to-day basis, without much trouble and with a familiar architecture. I know that your visualization library has lots of plotting functions for post-processing. My plotting functions are for preprocessing the data: these scatter and violin plots are mainly used to find suitable cutoffs and to look for unwanted noise, such as mitochondrial (mt) counts, within the data. I'll take a closer look at your AnnData implementation and see if I can contribute.
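The preprocessing QC plots described here can be sketched with matplotlib. The function `qc_plots` and its `max_pct_mt` parameter are hypothetical, not the plotting functions from the PR. It takes host (NumPy) arrays, so CuPy data would first need to be copied to the host, e.g. with `.get()`. The dashed line marks a candidate cutoff to judge against the distribution.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def qc_plots(total_counts, pct_counts_mt, max_pct_mt=None):
    """Scatter of total counts vs. % mitochondrial counts, plus a violin
    of % mitochondrial counts. An optional dashed line marks a cutoff."""
    total_counts = np.asarray(total_counts, dtype=float)
    pct_counts_mt = np.asarray(pct_counts_mt, dtype=float)
    fig, (ax_scatter, ax_violin) = plt.subplots(1, 2, figsize=(9, 4))

    # Scatter: high-mt cells with low total counts often indicate damaged cells.
    ax_scatter.scatter(total_counts, pct_counts_mt, s=4, alpha=0.5)
    ax_scatter.set_xlabel("total_counts")
    ax_scatter.set_ylabel("pct_counts_mt")

    # Violin: the shape of the distribution helps pick where to cut.
    ax_violin.violinplot(pct_counts_mt, showmedians=True)
    ax_violin.set_ylabel("pct_counts_mt")
    ax_violin.set_xticks([1])
    ax_violin.set_xticklabels(["cells"])

    if max_pct_mt is not None:
        for ax in (ax_scatter, ax_violin):
            ax.axhline(max_pct_mt, color="red", linestyle="--")
    fig.tight_layout()
    return fig
```

A call such as `qc_plots(obs["total_counts"], obs["pct_counts_mt"], max_pct_mt=20)` followed by `fig.savefig("qc.png")` lets you compare a proposed cutoff against the data before filtering.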