Would this also be applicable to bulk data?

Thapeachydude commented 9 months ago

Hi,

Great package. I was wondering whether this form of batch integration is also applicable to bulk RNA-seq data. Sure, the data are less sparse, but would that be an issue?

Any feedback is appreciated! Best, M

LTLA commented 9 months ago

Seems pretty reasonable to me. The sparsity wouldn't even matter in fastMNN, which typically operates in PC space anyway. The only thing to keep in mind is that bulk datasets generally have fewer samples per batch, so the default choice of k (the number of neighbors used to find MNNs) may be too large for the number of samples available.
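To make the point about k concrete, here is a minimal sketch of mutual-nearest-neighbor (MNN) pair detection between two batches in a shared low-dimensional space such as the PCs. It is written in Python with NumPy purely for illustration; it is not batchelor's implementation (which also does things like cosine normalization and the actual correction step), and the helper names `knn_indices` and `find_mnn_pairs`, as well as the clamping of k to the batch size, are assumptions of this sketch.

```python
import numpy as np


def knn_indices(query: np.ndarray, ref: np.ndarray, k: int) -> np.ndarray:
    """For each row of `query`, return the indices of its k nearest rows in `ref` (Euclidean)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_ref).
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)
    # Sort each row by distance and keep the k closest reference indices.
    return np.argsort(d2, axis=1)[:, :k]


def find_mnn_pairs(a: np.ndarray, b: np.ndarray, k: int) -> set[tuple[int, int]]:
    """Hypothetical helper: MNN pairs (i in batch a, j in batch b) in a shared PC space."""
    # k cannot exceed the number of samples in the other batch; clamping it
    # here is an assumption of this sketch, not batchelor's behavior.
    k_ab = min(k, b.shape[0])
    k_ba = min(k, a.shape[0])
    nn_ab = knn_indices(a, b, k_ab)  # neighbors of each a-sample among b
    nn_ba = knn_indices(b, a, k_ba)  # neighbors of each b-sample among a
    pairs = set()
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            # Keep (i, j) only if i is also among j's k nearest neighbors in a.
            if i in nn_ba[j]:
                pairs.add((i, int(j)))
    return pairs


# Toy example: four samples per batch in 2-D "PC space"; batch b is a shifted copy of a.
a = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
b = a + 1.0
print(sorted(find_mnn_pairs(a, b, k=1)))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(len(find_mnn_pairs(a, b, k=20)))    # 16: every cross-batch pair counts as "mutual"
```

With k=1 each sample pairs only with its shifted counterpart. Once k reaches the batch size, every cross-batch pair becomes an MNN and the pairing carries no information, which is why a k tuned for thousands of cells may need lowering for a handful of bulk samples.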

I suppose the other reason that we don't use this class of batch correction methods for bulk data is that the output is not fit for DE analyses. (In fact, you could say that about any batch correction method.) So it's fine for exploratory analysis, e.g., a bit of clustering and visualization, but if you plan on doing some DE, you'd want to go back to the raw counts and account for the batch in the DE model itself.

Thapeachydude commented 9 months ago

Hi, thanks a lot for the quick reply and the feedback!

LTLA/batchelor, issue #48