chanwkimlab / MarcoPolo

MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data
https://chanwkimlab.github.io/MarcoPolo/HumanLiver/index.html

Resolve sample batch #2

Open millersan opened 2 years ago

millersan commented 2 years ago

Dear authors,

Please accept my sincere thanks for providing such a useful tool. How should I handle batch effects between samples in the input counts, and can I use normalized data as input?

Best, Miller

chanwkimlab commented 2 years ago

Hi Miller,

Thank you so much for using our software. Because MarcoPolo internally uses a Poisson distribution, it cannot take normalized data as input; it needs raw counts. Instead, MarcoPolo can handle sample batches by modeling them directly as covariates, denoted as β in the paper.
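If you are unsure whether a matrix still holds raw counts, a quick sanity check along these lines may help. This is only a sketch: the helper name `looks_like_raw_counts` and the toy matrices are hypothetical and not part of MarcoPolo, and the check only tests for non-negative integer values.

```python
import numpy as np

def looks_like_raw_counts(x, atol=1e-8):
    """Heuristic: raw counts should be non-negative and integer-valued."""
    x = np.asarray(x, dtype=float)
    is_non_negative = np.all(x >= 0)
    is_integer_valued = np.allclose(x, np.round(x), rtol=0.0, atol=atol)
    return bool(is_non_negative and is_integer_valued)

# Toy example (made-up numbers): 2 cells x 2 genes of raw counts.
raw = np.array([[0, 3], [12, 1]])

# A typical library-size normalization followed by log1p, for comparison.
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

raw_ok = looks_like_raw_counts(raw)
normalized_ok = looks_like_raw_counts(normalized)
```

Here `raw_ok` is true and `normalized_ok` is false, so the second matrix would not be suitable as input to a Poisson-based model.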

To use this feature, pass the covariate matrix for the batches to the Covar parameter of the save_QQscore function: https://github.com/chanwkimlab/MarcoPolo/blob/master/MarcoPolo/QQscore.py#L113. As you can see in the code, when the Covar parameter is not set, the model includes only an intercept, which means that the same baseline expression is assumed for all cells. For sample batches, you can set the Covar parameter to a matrix in which the batch information is one-hot encoded, so that a different baseline expression is used for each group of cells.
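As a rough illustration of the one-hot matrix described above, here is a minimal NumPy sketch. The batch labels and the helper name `one_hot_batches` are hypothetical, and the layout (one row per cell, one column per batch) is an assumption; please check QQscore.py for the exact shape Covar expects and for whether a separate intercept column is needed.

```python
import numpy as np

def one_hot_batches(batch_labels):
    """Build a (cells x batches) one-hot matrix from per-cell batch labels.

    Returns the matrix and the batch names in column order.
    """
    labels = np.asarray(batch_labels)
    # `codes` maps each cell to the column index of its batch.
    categories, codes = np.unique(labels, return_inverse=True)
    codes = codes.reshape(-1)
    matrix = np.zeros((labels.shape[0], categories.shape[0]), dtype=float)
    matrix[np.arange(labels.shape[0]), codes] = 1.0
    return matrix, categories

# Hypothetical batch label for each of five cells.
batch_labels = ["donor1", "donor1", "donor2", "donor3", "donor2"]
covar, batch_names = one_hot_batches(batch_labels)

# `covar` is the kind of matrix that would be passed as the Covar parameter
# (assumed layout: rows are cells, columns are batches).
```

Each row of `covar` contains a single 1 in the column of that cell's batch, so cells from the same batch share one baseline term.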

Please let me know if there are any other issues.

Best, Chanwoo