HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

MUSCAT Mixed Effects Models and the Cellular Detection Rate #114

Open ismailelshimy opened 1 year ago

ismailelshimy commented 1 year ago

Hello MUSCAT people,

I hope my message finds you well. I have a question about the mixed-effects models implemented in MUSCAT's mmDS function. As I understand from your paper and the package vignette, these models are defined as gene expression ~ 1 + group_id + (1 | sample_id), where group_id is modelled as a fixed-effect variable and sample_id as a random-effect variable.

I have seen single-cell data analysts recommend also modelling the effect of the cellular detection rate (CDR, the fraction of genes expressed in each cell) by adding it as a variable in the [model](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5). For example, MAST has previously been implemented as a mixed-effects model as follows: gene expression ~ CDR + group_id + (1 | sample_id).
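For readers who prefer equations, the two formulas above can be written out as below. This is only a sketch: it assumes a Gaussian linear mixed model on normalized expression, and the symbols are chosen here for illustration rather than taken from the muscat paper or from MAST (MAST itself fits a two-part hurdle model, and the count-based mmDS methods use a different likelihood).

```latex
% y_{gc}: expression of gene g in cell c;  s(c): the sample that cell c comes from
% x_{s}: group indicator of sample s;      b_{g,s}: random intercept for sample s
% Without CDR:  gene expression ~ 1 + group_id + (1 | sample_id)
y_{gc} = \beta_{g0} + \beta_{g1}\, x_{s(c)} + b_{g,s(c)} + \varepsilon_{gc},
\qquad b_{g,s} \sim \mathcal{N}(0, \sigma_{b,g}^{2}),
\quad \varepsilon_{gc} \sim \mathcal{N}(0, \sigma_{g}^{2})

% With CDR as an extra fixed effect:  gene expression ~ CDR + group_id + (1 | sample_id)
y_{gc} = \beta_{g0} + \beta_{g1}\, x_{s(c)} + \beta_{g2}\, \mathrm{CDR}_{c}
       + b_{g,s(c)} + \varepsilon_{gc}
```

The only difference between the two is the per-cell term \beta_{g2} CDR_c, which absorbs expression differences that track how many genes were detected in a cell.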

So what do you think about this kind of implementation? Should analysts try to correct for the effect of CDR on gene expression by adding it to the model? And is there an easy way to do this in mmDS?

Thank you very much.

Ismail

plger commented 1 year ago

Hi,

You can simply pass the name of the colData column (e.g. "CDR") to the covs argument of muscat::mmDS. Note that it's better to use something scaled like the CDR (rather than, say, the number of detected genes), so that it's in the same range as other predictors.
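To make the scaling point concrete, here is a minimal, language-agnostic sketch in plain Python of how the CDR relates to the raw number of detected genes. The toy counts matrix and both function names are hypothetical, and treating "expressed" as "count greater than zero" is an assumption. It is not muscat code.

```python
# Toy counts matrix (hypothetical values): rows are genes, columns are cells.
counts = [
    [0, 3, 0, 1],
    [2, 0, 0, 4],
    [0, 1, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 1, 2],
]


def detected_genes(counts):
    """Number of genes with a nonzero count in each cell (column)."""
    n_cells = len(counts[0])
    return [sum(1 for row in counts if row[j] > 0) for j in range(n_cells)]


def cellular_detection_rate(counts):
    """Fraction of genes detected in each cell; always between 0 and 1."""
    n_genes = len(counts)
    return [n / n_genes for n in detected_genes(counts)]


# Raw counts of detected genes grow with the size of the gene panel,
# while the CDR stays in [0, 1] regardless of how many genes are measured.
n_detected = detected_genes(counts)
cdr = cellular_detection_rate(counts)

print("detected genes per cell:", n_detected)
print("CDR per cell:", cdr)
```

In a real dataset the number of detected genes is typically in the hundreds or thousands, whereas the CDR stays between 0 and 1. In a muscat workflow the equivalent per-cell value would be computed in R and stored as a colData column, whose name is then passed to covs as described above.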

As far as I know we haven't benchmarked the use of such covariates, but in principle I guess it can't hurt (given the large number of cells) and can indeed help when the normalization doesn't do a good job.

ismailelshimy commented 1 year ago

Thank you very much for your kind reply. I will try that 👍