What would be the best strategy to integrate multiple datasets that only have limited overlapping population

mzhibo commented 4 years ago

Hi, I am interested in integrating multiple datasets that do not share one common population across all of them but may share some populations in pairs. For example, I have 4 datasets. The 1st has cell types A, B, C, and D. The 2nd has A', B', C', and D, where A' means the population may have changed from A. The 3rd may have only A cells, and the 4th may have cell types C', D', E, and F. Is this kind of integration suitable for your tool? Should the merge be done step by step in pairs, or all at once? Also, what if I want to combine datasets collected from different developmental stages? One of the fetal stages is so early that it is transcriptomically distinct from the rest. Would this still be suitable for integration with scMerge? Thanks for your input. Best, Zhibo

YingxinLin commented 4 years ago

Hi Zhibo,

Thank you for your interest in scMerge. scMerge can merge multiple datasets in one go. For datasets from known, different developmental time points, if you would like to retain that information during integration, I would recommend running semi-supervised scMerge with the stages set as wanted variation (parameter WV in scMerge()). If you also know marker genes for the cell types that vary during development, you can supply them through WV_marker (this is optional).

Examples of semi-supervised scMerge are in our AMSI BioinfoSummer 2019 material: https://sydneybiox.github.io/BIS2019_SC/scMerge.html#61_semi-supervised_scmerge
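For readers following along, here is a minimal sketch of the semi-supervised call described above. It rests on several assumptions: all datasets are already combined into one SingleCellExperiment with a logcounts assay, the colData columns are named `batch` and `stage` (the `stage` name is hypothetical), and the wrapper `merge_with_stage` with its argument names is illustrative only, not part of scMerge. The `ctl` and `kmeansK` arguments are existing scMerge() arguments that the thread does not discuss, so check the package documentation for suitable values.

```r
library(scMerge)
library(SingleCellExperiment)

# Hypothetical helper: one-pass scMerge with developmental stage as wanted variation.
#   sce_combine   : SingleCellExperiment holding every dataset, with colData
#                   columns `batch` (dataset of origin) and `stage` (assumed name).
#   ctl_genes     : negative control genes, e.g. stably expressed genes.
#   kmeansK       : number of clusters to look for in each batch, one value per batch.
#   stage_markers : optional marker genes that vary across stages (WV_marker).
merge_with_stage <- function(sce_combine, ctl_genes, kmeansK, stage_markers = NULL) {
  scMerge(
    sce_combine = sce_combine,
    ctl         = ctl_genes,
    kmeansK     = kmeansK,
    batch_name  = "batch",
    WV          = sce_combine$stage,  # one stage label per cell
    WV_marker   = stage_markers,      # optional; NULL leaves it unset
    assay_name  = "scMerge_semisupervised"
  )
}
```

The merged values would then be stored as a new assay named `scMerge_semisupervised` in the returned object, alongside the original assays.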

Best wishes, Yingxin

mzhibo commented 3 years ago

Thank you, Yingxin!

SydneyBioX / scMerge, issue #25