HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

Multiple Random Effects #134

Open jgockley62 opened 2 months ago

jgockley62 commented 2 months ago

Hi,

I have a single-cell RNA-seq data set from 167 individuals, with individuals spread across several batches. I want to fit a model across all cells such as ~ (1|Indv_ID) + (1|Batch) + cov_1 + cov_2. From what I understand, I could rename the individual ID column to sample_id and run:

mmDS(sce,
     covs = c("cov_1", "cov_2"),
     method = "dream2",
     n_threads = 32 )

But this would only specify a linear mixed model of ~ (1|Indv_ID) + cov_1 + cov_2, correct? How could I add Batch as a second random effect? Would renaming the batch column to (1|Batch) work? Thanks

plger commented 2 months ago

Hi, no, renaming the column won't help; right now it's not possible to specify two random-effect variables. How are the individuals spread across the batches? If you have large batches that each contain the different experimental groups, then you should be fine modeling batch as a fixed-effect variable (i.e., a normal covariate). If the batches are small, there's probably not much added value on top of the individual effect, unless of course your aim is to understand individual vs. batch variability (as opposed to identifying differences between experimental groups). Note also that in our benchmarks we did not find an advantage of cell-level mixed models over pseudobulk analysis, and with that many samples I would definitely opt for pseudobulk...
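
For anyone wanting to check how samples are spread across batches before treating batch as a fixed effect, here is a minimal base-R sketch. The sample table is made up for illustration (the `sample_id`, `batch`, and `group_id` values are assumptions, not data from this issue); it only shows the cross-tabulation and a rank check on a `~ batch + group_id` design.

```r
# Toy sample-level metadata; all names and values are illustrative assumptions.
samples <- data.frame(
  sample_id = paste0("indv", 1:12),
  batch     = factor(rep(c("b1", "b2", "b3"), each = 4)),
  group_id  = factor(rep(c("ctrl", "disease"), times = 6))
)

# How are the experimental groups spread across the batches?
batch_by_group <- table(samples$batch, samples$group_id)
print(batch_by_group)

# Batch enters as an ordinary fixed-effect covariate next to the group effect.
design <- model.matrix(~ batch + group_id, data = samples)

# The group effect is only estimable if the design has full column rank,
# i.e. batch and group are not confounded with each other.
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)
```

If every batch contains samples from every group, the design stays full rank and the group coefficient can be estimated alongside the batch terms; a batch that holds a single group only is a warning sign.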

jgockley62 commented 2 months ago

Using batch as a fixed effect is possible, albeit not optimal. It's not possible to use a mixed model and then pseudobulk the corrected expression, is it? I.e.: 1) correct at the cell level: ~ (1|batch) + mt_Percent + logUMI; 2) pseudobulk by individual; 3) DE analysis: ~ (1|IndvID) + sex + age + disease

plger commented 2 months ago

Hi, of course you can do that, but you lose the uncertainty related to the effect of the covariates you correct for, which is dangerous. This is getting into an area where we don't have very clear facts on which to base decisions, but I'm pretty confident that this is considerably worse than treating your batches as fixed effects. If you really insist on fitting the model you want to fit (and again I'm not sure you should), you can do it by splitting your clusters and manually running dream on each (instructions for dream are available in this vignette).
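
A rough sketch of what "splitting your clusters and manually running dream on each" could look like, assuming a SingleCellExperiment with a `counts` assay and a `cluster_id` column in its colData. `run_dream_per_cluster` is a hypothetical helper, not a muscat function, and the formula simply reuses the column names from this thread. It relies on `edgeR::DGEList`, `edgeR::calcNormFactors`, `variancePartition::voomWithDreamWeights`, and `variancePartition::dream`; the vignette remains the authoritative reference.

```r
# Hypothetical helper (not part of muscat): fit dream separately in each cluster.
# Packages are referenced with `::` so they are only needed when it is called.
run_dream_per_cluster <- function(sce, form, cluster_col = "cluster_id") {
  cluster_ids <- SummarizedExperiment::colData(sce)[[cluster_col]]
  # Cell indices per cluster; drop = TRUE skips unused factor levels.
  clusters <- split(seq_len(ncol(sce)), cluster_ids, drop = TRUE)
  lapply(clusters, function(cells) {
    sub  <- sce[, cells]
    meta <- as.data.frame(SummarizedExperiment::colData(sub))
    # Dense conversion keeps the sketch simple; in practice, filter lowly
    # expressed genes first to limit memory use and run time.
    y   <- as.matrix(SummarizedExperiment::assay(sub, "counts"))
    dge <- edgeR::calcNormFactors(edgeR::DGEList(y))
    # Precision weights, then the mixed model with both random effects.
    v <- variancePartition::voomWithDreamWeights(dge, form, meta)
    variancePartition::dream(v, form, meta)
  })
}

# Formula with two random intercepts, using the column names from this thread.
form <- ~ (1 | Indv_ID) + (1 | Batch) + cov_1 + cov_2

# Usage (assumes `sce` exists and carries the columns named in `form`):
# fits <- run_dream_per_cluster(sce, form)
```

Recent variancePartition versions recommend calling `eBayes()` on each fit before extracting results with `topTable()`; which coefficient to test depends on how the covariates are coded, so check the coefficient names of a fit first.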

jgockley62 commented 2 months ago

I'll poke around the options and see how it pans out, thanks!