Very slow in the R Seurat implementation

yuanlizhanshi commented 8 months ago

Recently, I tried this method to infer gene co-expression in single-cell RNA-seq data, but I found it too slow in R. Would it be possible to implement it in Python with AnnData as input?

Thank you very much

ChangSuBiostats commented 8 months ago

Hi Yuanlizhanshi,

Thanks for your interest. The repo you are commenting on actually provides an implementation of CS-CORE in Python. We will provide a tutorial that takes AnnData input in the next couple of days. Meanwhile, feel free to install it with

pip install git+https://github.com/ChangSuBiostats/CS-CORE_python.git

and try CSCORE_IRLS.py, where X is a cell-by-gene count matrix and seq depth is the sequencing depth of each cell. We recommend subsetting the full cell-by-gene matrix X from AnnData to only the genes of interest (for example, the most highly expressed genes): computation on the full matrix with 20k genes will be slower, and co-expression is notoriously hard to estimate for extremely sparse genes.
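As a rough illustration of the subsetting step described above, here is a minimal NumPy sketch. It is not taken from the CS-CORE code: the helper name `subset_top_expressed`, the choice of 50 genes, and the simulated counts are all hypothetical, and it assumes sequencing depth means the total count per cell over all genes (computed before subsetting). With AnnData, X would presumably come from the object's count matrix, converted to a dense array if it is stored sparse.

```python
import numpy as np


def subset_top_expressed(X, n_top):
    """Hypothetical helper: keep the n_top genes with the highest mean count.

    X is a cell-by-gene count matrix (rows = cells, columns = genes).
    Returns the subsetted matrix, the per-cell sequencing depth, and the
    column indices of the retained genes.
    """
    X = np.asarray(X)
    # Assumption: sequencing depth = total counts per cell over ALL genes,
    # so it is computed on the full matrix before any subsetting.
    seq_depth = X.sum(axis=1)
    # Rank genes by mean expression, highest first.
    mean_expr = X.mean(axis=0)
    gene_idx = np.argsort(-mean_expr, kind="stable")[:n_top]
    return X[:, gene_idx], seq_depth, gene_idx


# Simulated counts for illustration only: 500 cells, 200 genes.
rng = np.random.default_rng(0)
X = rng.poisson(lam=np.linspace(0.05, 5.0, 200), size=(500, 200))

X_sub, seq_depth, gene_idx = subset_top_expressed(X, n_top=50)
# X_sub and seq_depth are the kind of inputs the comment above describes
# (a cell-by-gene count matrix and the sequencing depth of each cell).
```

Keeping `gene_idx` makes it possible to map rows and columns of the resulting co-expression estimates back to gene names afterwards.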

A tutorial / vignette for this Python version is coming soon! We will update here when it is available.

Zethson commented 7 months ago

@ChangSuBiostats is there an update on the vignette?

ChangSuBiostats commented 7 months ago

Hi @yuanlizhanshi and @Zethson,

Thank you both for your interest in our work!

Here is a vignette for running CS-CORE with AnnData input in Python: notebook. We have also benchmarked the time and memory usage of this implementation in this notebook.

Feel free to leave a message if you have any questions!

Best, Chang
