MarioniLab / FurtherMNN2018

Code for further development of the mutual nearest neighbours batch correction method, as implemented in the batchelor package.

Add check to avoid bad behaviour with no batch effect #9

Closed: LTLA closed this issue 5 years ago

LTLA commented 5 years ago

Consider the following scenario:

set.seed(1000)
centers <- cbind(0, diag(5))
type <- rep(seq_len(ncol(centers)), 50)
expanded <- centers[,type]

# No batch effect between the two batches.
batch1 <- expanded + rnorm(length(expanded), sd=0.1)
batch2 <- expanded + rnorm(length(expanded), sd=0.1)

With correction, we start losing a lot of variance:

library(batchelor) # provides fastMNN()

# Don't try to cosine-normalize the zero cluster!
corr <- fastMNN(batch1, batch2, cos.norm=FALSE)
metadata(corr)$lost.var
## [1] 0.2242840 0.2262515

This is due to the orthogonalization step. In the absence of a strong batch effect, the computed batch vector is dominated by noise and points in an essentially random direction. Removing variation along that direction can then remove genuine biological variation, as observed in the lost.var values above. We should probably add a check that skips orthogonalization when the magnitude of the batch vector is small relative to the spread of the data.
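
As a rough illustration of what such a check might look like, here is a minimal base-R sketch. It is not the batchelor implementation: `should_orthogonalize()` and its `threshold` argument are hypothetical names, the default of 0.5 is an arbitrary choice, and the difference between batch centroids is used as a crude stand-in for the average correction vector across MNN pairs.

```r
# Hypothetical helper, NOT part of batchelor.
# 'x' and 'y' are numeric matrices with features in rows and cells in columns.
should_orthogonalize <- function(x, y, threshold = 0.5) {
    # Assumption: approximate the batch vector by the difference in centroids,
    # rather than averaging over MNN pairs as fastMNN() would.
    batch.vec <- rowMeans(y) - rowMeans(x)
    magnitude <- sqrt(sum(batch.vec^2))

    # Within-batch spread: root mean squared distance of cells to their own centroid.
    spread <- function(m) sqrt(mean(colSums((m - rowMeans(m))^2)))
    scale <- mean(c(spread(x), spread(y)))

    # Only orthogonalize if the batch vector is large relative to the spread.
    list(magnitude = magnitude, scale = scale,
         orthogonalize = magnitude > threshold * scale)
}

# Same simulation as in the scenario above: no batch effect.
set.seed(1000)
centers <- cbind(0, diag(5))
type <- rep(seq_len(ncol(centers)), 50)
expanded <- centers[, type]
batch1 <- expanded + rnorm(length(expanded), sd = 0.1)
batch2 <- expanded + rnorm(length(expanded), sd = 0.1)

no.effect <- should_orthogonalize(batch1, batch2)
no.effect$orthogonalize   # FALSE: the batch vector is tiny, so skip orthogonalization

# Adding a constant shift of 2 to every feature creates a clear batch effect.
with.effect <- should_orthogonalize(batch1, batch2 + 2)
with.effect$orthogonalize # TRUE
```

A relative threshold is used here so that the decision does not depend on the overall scale of the expression values; how to pick a sensible cutoff in practice is left open.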