LTLA / scuttle

Clone of the Bioconductor repository for the scuttle package.
https://bioconductor.org/packages/devel/bioc/html/scuttle.html

logNormCounts - log transformation with different log base? #12

Open JPGranizo opened 3 years ago

JPGranizo commented 3 years ago

Hi,

I am using scran/scuttle to normalize scRNA-seq data:

clusters <- scran::quickCluster(sce)
sce <- scran::computeSumFactors(sce, clusters = clusters)
sce <- scuttle::logNormCounts(sce)

As far as I can tell, this code log2-transforms the data. Is there an option to use a different log base, in particular the natural log?

Thank you very much in advance. Kind regards, JPG

PeteHaitch commented 3 years ago

Remember that you can perform a change of base for logarithms.
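For reference, this is the standard change-of-base identity the snippet relies on, written with the pseudo-count of 1 used there (nothing here is specific to scuttle):

```latex
\log_b(x + 1) = \frac{\log_2(x + 1)}{\log_2 b},
\qquad
\ln(x + 1) = \frac{\log_2(x + 1)}{\log_2 e} = \log_2(x + 1)\,\ln 2
```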

counts <- c(1, 2, 2, 3, 3, 4, 3, 2, 5, 1)  # one rpois(10, 4) draw, fixed so the output below is reproducible
log_norm_counts <- log2(counts + 1)
log_norm_counts / log2(exp(1)) # Change of base
#>  [1] 0.6931472 1.0986123 1.0986123 1.3862944 1.3862944 1.6094379 1.3862944
#>  [8] 1.0986123 1.7917595 0.6931472
log(counts + 1) # Same result as above
#>  [1] 0.6931472 1.0986123 1.0986123 1.3862944 1.3862944 1.6094379 1.3862944
#>  [8] 1.0986123 1.7917595 0.6931472

That is, you can take the "logcounts" assay returned by logNormCounts() and convert it to whatever log base you want with the change-of-base trick.
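A minimal sketch of that conversion as a reusable function. `change_log_base()` is a hypothetical helper, not part of scuttle; it multiplies by `log(from) / log(to)`, which is the same change of base written without `log2(exp(1))`:

```r
# Hypothetical helper (not part of scuttle): convert log values from one base to another.
# Uses log_to(y) = log_from(y) * log(from) / log(to).
change_log_base <- function(x, from = 2, to = exp(1)) {
  x * (log(from) / log(to))
}

counts <- c(1, 2, 2, 3, 3, 4, 3, 2, 5, 1)
lc2  <- log2(counts + 1)               # stand-in for a log2 "logcounts" assay
lcn  <- change_log_base(lc2)           # natural log
lc10 <- change_log_base(lc2, to = 10)  # log10

all.equal(lcn, log(counts + 1))
#> [1] TRUE
```

Because this is plain scalar arithmetic, the same call should also work on a whole assay matrix, e.g. `logcounts(sce) <- change_log_base(logcounts(sce))` with the SingleCellExperiment accessor; that usage is an untested assumption here.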

ATpoint commented 3 years ago

Or just set log=FALSE and then apply whatever transformation you want. Can someone confirm that my answer here is correct? https://www.biostars.org/p/9477960/#9477964
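A base-R sketch of the "normalize without logging, then transform it yourself" route. It assumes genes in rows and cells in columns, and uses library sizes scaled to a mean of 1 as the size factors. This mimics the idea and is not scuttle's implementation:

```r
# Base-R sketch: normalize counts, then apply a transformation of your choice.
# Assumptions: genes x cells matrix; library-size factors scaled to mean 1.
set.seed(1)
counts <- matrix(rpois(200, lambda = 4), nrow = 20, ncol = 10)

sf <- colSums(counts)
sf <- sf / mean(sf)                      # center size factors at 1

normcounts <- sweep(counts, 2, sf, "/")  # divide each cell (column) by its size factor

ln_counts    <- log1p(normcounts)        # natural log with a pseudo-count of 1
asinh_counts <- asinh(normcounts)        # or any other transformation
```

Centering the size factors keeps the normalized values on roughly the same scale as the raw counts, so the pseudo-count of 1 has a comparable effect in every cell.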

LTLA commented 3 years ago

Internally, if pseudo.count=1, the logNormCounts() function will use log1p. This has the advantage of preserving sparsity compared to the naive log(x + 1) method, which goes through a non-sparse intermediate x + 1. For all other settings of pseudo.count, we go through the usual log(x + pseudo.count), because we're going to lose sparsity anyway.
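A small illustration of the sparsity argument using the Matrix package (a recommended package shipped with R). Because `log1p(0) == 0`, the function can be applied to the stored non-zero values only; this demonstrates the idea and is not scuttle's internal code:

```r
# Why log1p() can keep a sparse matrix sparse while log(x + 1) cannot.
library(Matrix)

set.seed(42)
x <- rsparsematrix(1000, 200, density = 0.05,
                   rand.x = function(n) rpois(n, 3) + 1)  # positive "counts"

# Apply log1p() to the stored non-zero values only; zeros stay implicit.
y_sparse <- x
y_sparse@x <- log1p(y_sparse@x)

# The naive route builds x + 1 first, in which every entry is non-zero.
y_dense <- log(as.matrix(x) + 1)

nnzero(y_sparse) == nnzero(x)                 # no new non-zeros were created
object.size(y_sparse) < object.size(y_dense)  # far smaller in memory
all.equal(as.matrix(y_sparse), y_dense)       # same values either way
```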

Now, log(x + 1) is a natural log, so we divide the result by log(2) to obtain log2-transformed values. One could just as easily undo that division outside of the function, or change the base as desired. It's just a division so it's pretty cheap.
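A paraphrase of the log step as described in this comment, with the division by `log(2)` undone afterwards. `log2_transform()` is a hypothetical name for illustration and is not scuttle's source:

```r
# Paraphrase of the described log step: natural log first, then divide by log(2).
log2_transform <- function(x, pseudo.count = 1) {
  if (pseudo.count == 1) {
    log1p(x) / log(2)                # sparsity-friendly path
  } else {
    log(x + pseudo.count) / log(2)   # sparsity is lost anyway
  }
}

normcounts <- c(0, 0.5, 1, 3, 7)
lc <- log2_transform(normcounts)

ln_values <- lc * log(2)             # undo the division: natural log of (x + 1)
```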

@ATpoint your answer is mostly correct, though the deconvolution size factors are only used if you preloaded them into the SCE with computeSumFactors(). (Or whatever it was renamed to, can't remember.) If there are no size factors in the SCE, it will just compute library size-derived factors via librarySizeFactors(), which is cheaper and probably good enough in most applications.
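A sketch of the fallback logic described here, with plain vectors standing in for the SCE. `choose_size_factors()` is hypothetical. In scuttle the stored factors would come from `sizeFactors(sce)` and the fallback from `librarySizeFactors()`; scaling library sizes to a mean of 1 is my understanding of that function's default, not something confirmed in this thread:

```r
# Hypothetical helper: use pre-stored size factors (e.g. deconvolution factors)
# if present, otherwise fall back to library-size factors scaled to mean 1.
choose_size_factors <- function(counts, stored = NULL) {
  if (!is.null(stored)) {
    return(stored)
  }
  lib_sizes <- colSums(counts)
  lib_sizes / mean(lib_sizes)
}

counts <- matrix(c(10, 0, 5,  20, 5, 15,  0, 40, 10), nrow = 3)  # 3 genes x 3 cells

choose_size_factors(counts)                            # library sizes 15, 40, 50, divided by 35
choose_size_factors(counts, stored = c(0.8, 1, 1.2))   # stored factors win
```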