MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html
40 stars 22 forks source link

NaNs in mnnCorrect output matrix #38

Closed danshu closed 5 years ago

danshu commented 5 years ago

Hi, I found that in corrected batch2 data, most of values are NaN. Here are my codes for running mnnCorrect. In logcounts(sce1) and logcounts(sce2), there are no NAs or infinite values.

fit1 <- trendVar(sce1, parametric=TRUE,use.spikes=FALSE) dec1 <- decomposeVar(sce1, fit1) fit2 <- trendVar(sce2, parametric=TRUE,use.spikes=FALSE) dec2 <- decomposeVar(sce2, fit2) universe <- intersect(rownames(dec1), rownames(dec2)) mean.bio <- (dec1[universe,"bio"] + dec2[universe,"bio"])/2 chosen <- universe[mean.bio > 0] original <- list(batch1=logcounts(sce1)[chosen,],batch2=logcounts(sce2)[chosen,]) mnn.out <- do.call(mnnCorrect, c(original, list(k=20,cos.norm.in=FALSE,cos.norm.out=FALSE))) rmat <- do.call(cbind, list(counts1=counts(sce1)[chosen,],counts2=counts(sce2)[chosen,])) mnndat <- do.call(cbind, list(mnn1=mnn.out$corrected[[1]],mnn2=mnn.out$corrected[[2]])) sce <- SingleCellExperiment(list(counts=rmat,logcounts=mnndat))

Best, danshu

danshu commented 5 years ago

I noticed that there are less expressed genes in Batch 2 than Batch 1. sum(nexprs(sce1, byrow=TRUE) > 0)=27746 sum(nexprs(sce2, byrow=TRUE)>0)=21184 sum(nexprs(sce1, byrow=TRUE)>3)=22003 sum(nexprs(sce2, byrow=TRUE)>3)=16851

Is this this causing the problem? Should I only use variable genes expressed in both batches for mnnCorrect?

LTLA commented 5 years ago

Hmm, everything looks fine to me. The only cause for NAs would be if you were unfortunate enough to have a cell that was all-zero in the chosen subset; this would result in a L2 norm of zero and NAs after cosine normalization. But this doesn't seem to be the case, as you've turned off cosine norm.

I am a bit bemused, but perhaps you could see if the problem persists in the batchelor version of this function? The scran version will be deprecated in the next release as all batch-correlation-related functions are moving over to batchelor. (Otherwise scran gets too bloated.)

LTLA commented 5 years ago

This may have been fixed by the patch described in MarioniLab/FurtherMNN2018#11.

In any case, I am closing this unless further information is provided.