Nanostring-Biostats / SpatialDecon

The SpatialDecon library implements the SpatialDecon algorithm for mixed cell deconvolution in spatial gene expression datasets. (This algorithm also works in bulk expression profiling data.)
MIT License

Which count matrix to use for creating custom cell profile matrix? #32

Closed aelhossiny closed 2 years ago

aelhossiny commented 2 years ago

Hi, thank you for such a wonderful tool. I am trying to create a custom cell profile matrix from paired single-cell samples, but I am confused about which count matrix I should use. The options are:

1. Using the "raw" counts while setting the argument "normalize = TRUE"
2. Using RNA@data, which holds the normalized data
3. Using the "integrated" count matrix, which is the batch-corrected integrated count matrix

All of them give me different results, and for biological interpretation they all seem valid (i.e., even when a different cell cluster is enriched, that cluster belongs to the same cell type).

Another question: which normalized GeoMx count matrix should I be using? I tried the log_normalized count matrix, and it doesn't work as well as the Q3_normalized count matrix that hasn't been log-transformed.

I would really appreciate your help! Thank you!

maddygriz commented 2 years ago

Hi @aelhossiny,

We have not tested the best normalization method for the single-cell data in profile matrix generation, but I think it is safe to say that whatever method is best for the standalone single-cell data is best for creating a profile matrix. In your case, I would assume it is the batch corrected integrated count matrix.

As for the normalization method on the GeoMx side, SpatialDecon expects linear-scale cell profile matrices so don't use log normalization.
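To illustrate why the linear scale matters here, below is a minimal sketch in plain Python (not SpatialDecon code). It assumes two hypothetical cell types with made-up expression values for three genes, and compares mixing them on the linear scale with mixing their log values. All names and numbers are assumptions for illustration only.

```python
import math

# Hypothetical linear-scale expression of 3 genes in two cell types.
profile_a = [100.0, 10.0, 1.0]
profile_b = [1.0, 10.0, 100.0]

# A hypothetical region made up of 50% type A and 50% type B.
weights = (0.5, 0.5)

# Linear scale: the mixed signal is the weighted sum of the profiles.
mixed_linear = [weights[0] * a + weights[1] * b
                for a, b in zip(profile_a, profile_b)]

# Log scale: the same weighted sum applied to log values, then
# transformed back. This is a geometric mean, not a physical mixture.
mixed_from_logs = [math.exp(weights[0] * math.log(a) + weights[1] * math.log(b))
                   for a, b in zip(profile_a, profile_b)]

print("mixed on linear scale:", mixed_linear)
print("mixed on log scale:   ", mixed_from_logs)
```

The two results disagree for the genes that differ between the cell types, so a model that adds up cell-type contributions needs its inputs on the linear scale.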

Maddy

aelhossiny commented 2 years ago

Just a follow-up: I noted from the paper describing the algorithm that, when optimizing

$$\beta_{\cdot i} = \arg\min_{\beta_{\cdot i}} \lVert \log(Y_{\cdot i}) - \log(B_{\cdot i} + X\beta_{\cdot i}) \rVert$$

$X_{p \times K}$ is defined as the cell profile matrix giving the linear-scale expression of $p$ genes over $K$ cell types.

But my scRNA-seq data is already log-normalized. Does that mean I should use raw counts?
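The objective quoted above can be sketched in a few lines of plain Python to show where the linear-scale X enters. This is only an illustration, not the package's implementation: it assumes a Euclidean norm (the quoted formula does not say which norm is used), and the function name `log_residual_norm` and all values are hypothetical.

```python
import math

def log_residual_norm(y, b, x, beta):
    """Euclidean norm of log(y) - log(b + x @ beta) for one sample.

    y: observed expression per gene, b: background per gene,
    x: p-by-K cell profile matrix (list of rows), beta: K abundances.
    """
    total = 0.0
    for g in range(len(y)):
        fitted = b[g] + sum(x[g][k] * beta[k] for k in range(len(beta)))
        total += (math.log(y[g]) - math.log(fitted)) ** 2
    return math.sqrt(total)

# Hypothetical linear-scale profiles: p = 3 genes, K = 2 cell types.
x_linear = [[100.0, 1.0], [10.0, 10.0], [1.0, 100.0]]
background = [1.0, 1.0, 1.0]
true_beta = [2.0, 3.0]

# Simulate one sample exactly from the additive model y = b + X beta.
y = [background[g] + sum(x_linear[g][k] * true_beta[k] for k in range(2))
     for g in range(3)]

# Same profiles after a log1p transform, as a log-normalized matrix would be.
x_logged = [[math.log1p(v) for v in row] for row in x_linear]

print(log_residual_norm(y, background, x_linear, true_beta))
print(log_residual_norm(y, background, x_logged, true_beta))
```

With the linear-scale profiles the residual at the true abundances is zero, while the log-transformed profiles no longer fit the same data, because the log is applied to the whole sum B + Xβ rather than to X itself.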

maddygriz commented 2 years ago

If you log-normalized your scRNA-seq data, then yes, you should use the raw counts and normalize in SpatialDecon by setting the argument "normalize = TRUE". But if you normalized the scRNA-seq data with a method other than log normalization, I would use that normalized matrix.
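As a rough sketch of the difference between the two cases, the plain-Python example below builds one cell-type profile from raw counts using a generic total-count normalization, and compares it with averaging log-normalized values. It assumes the log normalization has the form log1p(count / total * 10,000); what "normalize = TRUE" does internally is not stated in this thread, so this is not a description of SpatialDecon's own procedure. All function names and counts are hypothetical.

```python
import math

SCALE = 10_000  # assumed scale factor for the illustration

def normalize_cell(counts, scale=SCALE):
    """Total-count normalization: linear-scale values for one cell."""
    total = sum(counts)
    return [c / total * scale for c in counts]

def log_normalize_cell(counts, scale=SCALE):
    """Assumed log normalization: log1p of the total-count-normalized values."""
    return [math.log1p(v) for v in normalize_cell(counts, scale)]

def mean_profile(cells):
    """Average expression per gene across the cells of one cell type."""
    n = len(cells)
    return [sum(cell[g] for cell in cells) / n for g in range(len(cells[0]))]

# Two hypothetical cells of the same type, raw counts for 3 genes.
raw_cells = [[90, 10, 0], [40, 5, 5]]

linear_cells = [normalize_cell(c) for c in raw_cells]
logged_cells = [log_normalize_cell(c) for c in raw_cells]

# Linear-scale profile built from raw counts.
profile_linear = mean_profile(linear_cells)

# Averaging log-normalized values directly gives a log-scale profile.
profile_log_scale = mean_profile(logged_cells)

# Undoing log1p with expm1 before averaging recovers the linear-scale profile.
profile_back_transformed = mean_profile(
    [[math.expm1(v) for v in cell] for cell in logged_cells]
)

print(profile_linear)
print(profile_log_scale)
print(profile_back_transformed)
```

The log-scale profile is on a completely different scale from the linear one, which is why log-normalized values should not be passed in where a linear-scale profile matrix is expected.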