Closed RunyuXia closed 1 month ago
Hi @RunyuXia, I think in theory it would work to set the number of highly variable genes equal to the total number of genes in your dataset. Note that the data is variance-normalized as part of batch correction, so this won't work if you want to keep the data in units of counts rather than variance-normalized counts. Overall, though, I don't think this is recommended. If you variance-normalize low-variance genes, they contribute the same amount of signal to the principal components as the high-variance genes, so they can swamp out the real signal in the data. If you are looking for a general batch correction approach that operates directly on counts without normalization, mutual nearest neighbors (MNN) or ComBat may be what you are looking for.
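To illustrate the point about low-variance genes, here is a minimal NumPy sketch on made-up data. It is not the package's preprocessing code: the toy matrix, the gene counts, and the `variance_share` helper are all assumptions for illustration. It compares how much of the total variance (which is what PCA decomposes) each gene accounts for before and after scaling every gene to unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 500
n_noise_genes = 50

# Hypothetical toy data: one high-variance "signal" gene
# followed by many low-variance "noise" genes.
signal = rng.normal(0.0, 10.0, size=(n_cells, 1))
noise = rng.normal(0.0, 0.1, size=(n_cells, n_noise_genes))
X = np.hstack([signal, noise])


def variance_share(matrix):
    """Fraction of the total variance contributed by each gene (column)."""
    per_gene_var = matrix.var(axis=0)
    return per_gene_var / per_gene_var.sum()


# Before scaling, the signal gene accounts for nearly all of the variance.
raw_share = variance_share(X)

# Variance-normalize: divide each gene by its standard deviation.
X_scaled = X / X.std(axis=0)

# After scaling, every gene contributes the same share, 1 / n_genes,
# so the 50 noise genes together outweigh the single signal gene.
scaled_share = variance_share(X_scaled)

print(f"signal gene share before scaling: {raw_share[0]:.3f}")
print(f"signal gene share after scaling:  {scaled_share[0]:.3f}")
```

With all genes included, the more low-variance genes there are, the smaller the share left for the genuinely variable ones, which is the reason restricting to highly variable genes is usually preferred.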
Hi!
I've noticed that the preprocessing step of the updated batch correction method only outputs a counts h5ad file containing the highly variable genes. For my NMF analysis, I want to include as many genes as possible. Could you advise on whether it is possible to retain batch-corrected counts for all genes?
Thanks!