SingleR-inc / celldex

Collection of cell type reference datasets.
https://bioconductor.org/packages/devel/data/experiment/html/celldex.html

What method is used for log-normalization in celldex? #16

shangguandong1996 closed this issue 11 months ago

shangguandong1996 commented 2 years ago

Hi, dear developers. I noticed that, according to the celldex manual, the values in celldex are log-normalized or something like TPM.

Each dataset contains a log-normalized expression matrix that is intended to be comparable to log-UMI counts from common single-cell protocols (Aran et al. 2019) or gene length-adjusted values from bulk datasets.

But I am wondering whether you can tell me what method is behind the log-normalized counts, because I also want to build a similar database for Arabidopsis thaliana.

Best wishes

Guandong Shang

j-andrews7 commented 11 months ago

The expression values vary between datasets. Many of them were simple log2(counts + 1) transformations; some were pulled from databases that provided them on a log-transformed scale. You can see the various scripts that were used to download and process the data here.

Regardless, to make your own, a simple log2 transformation of raw counts is likely appropriate.
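
As a minimal sketch of that transformation (written in Python purely for illustration, not taken from the celldex processing scripts), assuming a genes-by-samples matrix of raw counts and the +1 pseudocount mentioned above. The function name `log2_plus_one` and the toy matrix are hypothetical:

```python
import math


def log2_plus_one(counts):
    """Apply log2(count + 1) to every entry of a genes-by-samples matrix.

    `counts` is a list of rows (one per gene), each a list of non-negative
    raw counts (one per sample). The +1 pseudocount keeps zeros finite.
    """
    transformed = []
    for row in counts:
        if any(c < 0 for c in row):
            raise ValueError("raw counts must be non-negative")
        transformed.append([math.log2(c + 1) for c in row])
    return transformed


# Hypothetical toy data: 2 genes (rows) x 3 samples (columns).
raw_counts = [
    [0, 3, 7],
    [15, 1, 0],
]
log_counts = log2_plus_one(raw_counts)
print(log_counts)
```

This applies no library-size or gene-length adjustment. Whether one is needed depends on how the raw counts were generated, so treat it as a starting point and not as the procedure behind any particular celldex dataset.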