SydneyBioX / scMerge

Statistical approach for removing unwanted variation from multiple single-cell datasets
https://sydneybiox.github.io/scMerge/

Integration with DE analysis #15

Closed crotoc closed 5 years ago

crotoc commented 5 years ago

Thanks for this great package, which lets me combine data sets from completely different sources. I have a question: after correcting for batch effects, how can I do the DE analysis? Or is DE analysis after batch-effect correction fundamentally unsuitable? Please give me some advice! Thanks!

crotoc commented 5 years ago

Many DE analysis methods require count data, and I noticed that the corrected results contain many negative numbers. It seems this data doesn't fit the input format those DE methods require.

crotoc commented 5 years ago

If I understand correctly, the results after correction are still on the log scale, so can they be used in limma for the DE analysis? Please let me know if I am wrong. Thanks!

YingxinLin commented 5 years ago

Hi,

Thank you for your interest in scMerge! The output of scMerge can in practice be treated as log-transformed expression data, with a very small percentage of negative values caused by a scaling step within the algorithm. You could replace the negative values with zeros before performing DE. DE methods for log-scale data, such as limma, are suitable for the scMerge output.

Yingxin
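The advice above (zero out the negative values, then use a log-scale DE method such as limma) can be sketched roughly as follows. This is only an illustration under stated assumptions: the expression matrix is simulated rather than real scMerge output, the group labels are made up, and the limma step runs only if limma is installed. In a real analysis the matrix would come from whichever assay of the returned object holds the corrected values.

```r
set.seed(1)

# Simulated stand-in for batch-corrected, log-scale expression (genes x cells).
# ASSUMPTION: real input would be the corrected assay from the scMerge result.
n_genes <- 50
n_cells <- 40
expr <- matrix(
  rnorm(n_genes * n_cells, mean = 1, sd = 1),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Hypothetical two-group comparison; the first five genes get a shift.
group <- factor(rep(c("prenatal", "adult"), each = n_cells / 2))
expr[1:5, group == "adult"] <- expr[1:5, group == "adult"] + 3

# Replace the negative values with zeros, as suggested in the reply.
expr_nonneg <- expr
expr_nonneg[expr_nonneg < 0] <- 0

# Fit a linear model per gene with limma (skipped if limma is not installed).
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ group)
  fit <- limma::eBayes(limma::lmFit(expr_nonneg, design))
  # coef = 2 is the second design column, i.e. the group contrast.
  print(limma::topTable(fit, coef = 2, number = 10))
}
```

Because the data are already on the log scale, no count-based normalisation or transformation is applied before the linear model.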

crotoc commented 5 years ago

Thanks very much! Will try it right now! I have another question: I am using prenatal brain scRNAseq and adult brain scRNAseq data for the analysis. These two data sets come from two different labs and are very different. After reading your paper, I understand that scMerge will identify pseudo-replicates between the two data sets. I don't think any cell types are exactly the same between the two data sets, but some of them may be similar. Can scMerge handle this case correctly? When running scMerge, a window with a plot comes up and shows two pairs between the two data sets. Does this mean scMerge makes sense for my application? Thanks!

YingxinLin commented 5 years ago

Hi,

The network plot with two pairs connected indicates that scMerge has identified two pairs of mutual nearest clusters as pseudo-replicates. The current scMerge algorithm is based on the assumption that the clusters in each pair share some similar biological signals.

Yingxin
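For intuition, the general idea of "mutual nearest clusters" can be sketched on toy data. This is not scMerge's actual implementation: the centroids, the Euclidean distance, and the helper `cross_dist` are all assumptions made for illustration. Two clusters from different batches form a pair when each is the other's nearest cluster.

```r
# Toy cluster centroids (rows = clusters, columns = two made-up features).
batch_a <- rbind(A1 = c(0, 0), A2 = c(5, 5), A3 = c(10, 0))
batch_b <- rbind(B1 = c(0.5, 0.2), B2 = c(5.2, 4.9), B3 = c(-4, 10))

# Hypothetical helper: Euclidean distance from every cluster in `a`
# to every cluster in `b`.
cross_dist <- function(a, b) {
  d <- as.matrix(dist(rbind(a, b)))
  d[rownames(a), rownames(b), drop = FALSE]
}

d <- cross_dist(batch_a, batch_b)

# Nearest cluster in the other batch, in each direction.
nearest_b <- unname(apply(d, 1, which.min))  # for each A cluster
nearest_a <- unname(apply(d, 2, which.min))  # for each B cluster

# An A cluster is in a mutual pair if its nearest B cluster points back to it.
is_mutual <- nearest_a[nearest_b] == seq_len(nrow(d))

pairs <- data.frame(
  a = rownames(d)[is_mutual],
  b = colnames(d)[nearest_b[is_mutual]],
  stringsAsFactors = FALSE
)
print(pairs)
```

In this toy example A3 and B3 remain unpaired: each has a nearest cluster in the other batch, but that cluster's nearest neighbour is a different one.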

crotoc commented 5 years ago

Great to know! I have labels for every cluster; is it possible to find out which clusters are paired?

YingxinLin commented 5 years ago

Hi,

I am wondering whether you supplied cell_type information when running scMerge (that is, ran semi-supervised scMerge II, https://sydneybiox.github.io/scMerge/articles/scMerge.html#semi-supervised-scmerge-ii) or used the default settings.

Currently, we do not have a very convenient way to check this output, but we will implement it as one of the outputs soon.

For now, one way to check this is to look at the replicate matrix that is stored in the metadata after running scMerge:

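# For each cell (row), take the name of the column with the largest value
scRep <- apply(sce@metadata$scRep_res, 1, 
               function(x) colnames(sce@metadata$scRep_res)[which.max(x)])

# Cross-tabulate replicate labels against cell types,
# keeping only the cells whose label contains "Replicate"
table(scRep[grep("Replicate", scRep)], 
      sce$cellTypes[grep("Replicate", scRep)])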

The cell types that fall in the same replicate correspond to a pair of mutual nearest clusters. Please let me know if this code works for you!

Cheers, Yingxin
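To see what the snippet above produces without an scMerge object at hand, here is a toy version. The layout of the replicate matrix (one row per cell, one column per group, with a non-replicate column named `Unpaired`) and the cell-type labels are assumptions for illustration only; the real `scRep_res` may be organised differently.

```r
# Hypothetical stand-in for sce@metadata$scRep_res.
rep_mat <- matrix(
  0, nrow = 6, ncol = 3,
  dimnames = list(paste0("cell", 1:6),
                  c("Replicate1", "Replicate2", "Unpaired"))
)
# Mark one group per cell (row index, column index).
rep_mat[cbind(1:6, c(1, 1, 2, 2, 3, 2))] <- 1

# Hypothetical stand-in for sce$cellTypes.
cell_types <- c("radial_glia", "astrocyte", "fetal_neuron",
                "adult_neuron", "microglia", "adult_neuron")

# Same logic as the snippet above: label each cell by its max column ...
scRep <- apply(rep_mat, 1, function(x) colnames(rep_mat)[which.max(x)])

# ... then tabulate replicate labels against cell types,
# keeping only cells assigned to a "Replicate" column.
in_rep <- grepl("Replicate", scRep)
tab <- table(scRep[in_rep], cell_types[in_rep])
print(tab)
```

Each row of the resulting table is one replicate group, and the cell types with non-zero counts in that row are the ones that were paired together.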