YMa-lab / CARD

GNU General Public License v3.0

Question about inputs: raw vs. normalized counts #14

Closed kguion1 closed 2 years ago

kguion1 commented 2 years ago

Hi, thanks for creating a great tool! I am trying to deconvolute some Visium data using publicly available single-cell data (eventually we will have single-nucleus data). I noticed that the tutorial specifies raw data for both the spatial and single-cell datasets. Would the algorithm be affected by using normalized counts from SCTransform and log2CPM gene expression values for the public single-cell data? Right now, I only have access to normalized counts for the single-cell data.

Do you have any suggestions on which data to use?

Any help is appreciated!

YingMa0107 commented 2 years ago

Hi @kguion1,

Thanks for your interest in our package!

Ideally, both the single-cell data and the spatial data should be count data, since we construct the reference basis matrix using the relationship between the mean gene expression (reference basis matrix) and the mixed gene expression (spatial data). In practice, I have tried using normalized data before and the performance looked good to me, but I did not test it extensively. If no public scRNA-seq count data are available and you have to use normalized data, make sure that both the scRNA-seq data and the spatial data are non-negative, since CARD is an NMF-based model!
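As a rough illustration of the non-negativity point, here is a minimal Python sketch of a sanity check that could be run on an expression matrix before handing it to a deconvolution tool. It is not part of CARD; the helper names (`log2_cpm`, `is_non_negative`) and the toy values are hypothetical. It assumes log2CPM is computed with a pseudocount of 1, in which case the values cannot go below zero, whereas centered or scaled values can.

```python
import math

def log2_cpm(counts, pseudocount=1.0):
    """Convert one cell's raw counts to log2(CPM + pseudocount).

    Assumption: a pseudocount of 1 is used, so a raw count of 0 maps to 0.0.
    """
    total = sum(counts)
    if total == 0:
        # No reads at all: every gene gets log2(pseudocount).
        return [math.log2(pseudocount) for _ in counts]
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]

def is_non_negative(matrix):
    """Return True if every entry of a nested list (rows of values) is >= 0."""
    return all(value >= 0 for row in matrix for value in row)

# Hypothetical toy data: 2 cells x 3 genes of raw counts.
raw_counts = [[0, 5, 95], [10, 0, 90]]
log_normalized = [log2_cpm(cell) for cell in raw_counts]

# Hypothetical centered/scaled values; these can be negative and would
# violate the non-negativity requirement.
scaled = [[-1.2, 0.3, 0.9], [0.8, -0.3, -0.5]]

print(is_non_negative(raw_counts))      # True
print(is_non_negative(log_normalized))  # True
print(is_non_negative(scaled))          # False
```

If the normalized values were computed without a pseudocount, or come from a centered or scaled representation, a check like this would flag the negative entries before they reach the model.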

Hope this helps!