YMa-lab / CARD

Can I use normalized expression data as sc_count #44

Closed kangjiajinlong closed 1 year ago

kangjiajinlong commented 1 year ago

Hi CARD team,

I went through the CARD tutorial and noticed that the example sc_count input is in raw count format (i.e. integers). I wonder if I can use normalized expression data for sc_count (for example, the typical log-transformed data provided in the Seurat @data slot). I have on hand a few very good reference scRNA-seq datasets, but they only provide Seurat @data level log-transformed data. I tried using the log-transformed data as sc_count and the CARD pipeline did run through. But I am not sure whether this negatively impacts the deconvolution quality.

Could you kindly clarify whether integer raw counts are absolutely required?

Thanks, Jack

YingMa0107 commented 1 year ago

Hi @kangjiajinlong,

Thank you for your interest in CARD!

For your question, I understand that some Seurat objects only provide normalized data. Ideally, both the single-cell data and the spatial data should be count data, since we construct the reference basis matrix utilizing the relationship between mean gene expression (reference basis matrix) and the mixed gene expression (spatial data). In practice, I have tried using normalized data before and the performance looked good to me, but I did not try it on many datasets. If no public scRNA-seq count data are available and you want to use the normalized data, make sure that both the scRNA-seq data and the spatial data are always non-negative, since CARD is an NMF-based model!
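As a rough illustration of the non-negativity point, here is a minimal sketch in plain Python (not part of CARD, which is an R package). The function names `find_negative_entries` and `undo_log1p` and the toy matrices are hypothetical, made up for this example. It assumes the normalized values were produced by a log1p-style transform, as in Seurat's default log-normalization, so they are already non-negative, whereas centered or z-scored values can be negative. The `expm1` step is an optional idea for moving back to a linear scale, not something the CARD authors recommend in this thread.

```python
import math


def find_negative_entries(matrix):
    """Return (row, col, value) for every negative entry in a genes x cells matrix."""
    return [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value < 0
    ]


def undo_log1p(matrix):
    """Map log1p-transformed values back to the linear scale with expm1.

    This only undoes the log; it does not recover the original integer
    counts, because the per-cell library size is not known here.
    """
    return [[math.expm1(value) for value in row] for row in matrix]


# Toy genes x cells matrix of log1p-normalized values (made-up numbers).
log_normalized = [
    [0.0, 1.2, 2.5],
    [0.7, 0.0, 3.1],
]

# Toy z-scored matrix: centering introduces negative values.
scaled = [
    [-0.5, 0.3, 1.1],
    [0.2, -1.4, 0.9],
]

print(find_negative_entries(log_normalized))  # no negative entries in this toy input
print(find_negative_entries(scaled))          # lists the negative entries
linear = undo_log1p(log_normalized)           # non-negative, linear scale
```

A check like this on both the single-cell and the spatial matrix before running the deconvolution would catch accidentally passing centered or scaled data.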

Another option is to use the reference-free version of CARD, CARD_free. CARD_free only requires a marker gene list and spatial transcriptomics count data as input. But CARD_free does not come with cell type labels, so post hoc annotation might be needed to assign cell types.

Hope this helps!

Best, Ying