GabrielHoffman / dreamlet

Perform differential expression analysis on multi-sample single cell datasets using linear mixed models
https://gabrielhoffman.github.io/dreamlet

Error when using fitVarPart() with a mixed model #4

Closed aidarripoll closed 1 year ago

aidarripoll commented 1 year ago

Dear all,

I understand that dreamlet::fitVarPart() internally uses the variancePartition package, which fits linear and linear mixed models to quantify the contribution of multiple sources of expression variation at the gene level.

When calling dreamlet::fitVarPart() using a mixed-model formula like this ~Sex + Age + (1|Batch), where Sex and Batch are categorical and Age is continuous, I encountered the following error:

    Error in run_model_check_mixed(fit, showWarnings, dream, colinearityCutoff, :
      Categorical variables modeled as fixed effect: Gender
      Must model either all or no categorical variables as random effects here

From this, I understand that the function requires all categorical variables in a mixed-model formula to be treated either as fixed effects or as random effects, not a mixture of the two. Indeed, the following formulas no longer give an error: ~Sex+Age+Batch or ~(1|Sex)+Age+(1|Batch).

Here you can find the cross-table showing the number of cells per Batch (columns, which are the sequencing dates) and Sex (rows):

    180925 180926 181003 181022 181023 181107 181108 181213 181218
  M    454    660    642    196    751    408    245    688    512
  F    640    566    533    624    725    705    694    345    612
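As a quick sanity check on the table above, the counts can be re-entered by hand in base R to confirm that every batch contains cells from both sexes, so Sex is not nested within Batch. This is only a sketch: the object names (`tab`, `frac_male`, and so on) are assumptions, and the numbers are copied from the cross-table.

```r
# Cells per Sex (rows) and Batch (columns), copied from the cross-table above
batches <- c("180925", "180926", "181003", "181022", "181023",
             "181107", "181108", "181213", "181218")
tab <- rbind(
  M = c(454, 660, 642, 196, 751, 408, 245, 688, 512),
  F = c(640, 566, 533, 624, 725, 705, 694, 345, 612)
)
colnames(tab) <- batches

# TRUE if both sexes are observed in every batch (no empty cells)
both_sexes_in_every_batch <- all(tab > 0)

# Total cells per sex, and the fraction of male cells within each batch
cells_per_sex <- rowSums(tab)
frac_male <- tab["M", ] / colSums(tab)
```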

In the differential expression analysis, we treat Batch as a random effect and Sex as a fixed effect. We're only interested in the Sex and Age effects, but we still want to control for Batch. Hence, I'm wondering which option would be better (or correct) for the variance partition analysis, so that it is as similar as possible to the differential expression analysis setup:

  1. ~Sex+Age+Batch --> all categorical variables as fixed factors
  2. ~(1|Sex)+Age+(1|Batch) --> all categorical variables as random factors
  3. ~Sex+Age --> in this case, the Batch contribution would be part of the residuals

Thanks a lot! Aida

aidarripoll commented 1 year ago

Since we're not interested in the contribution of the Batch variable (which is why we control for it in the DEA model), I guess we could first regress out the Batch effect from the gene expression and then perform the variance partition analysis on the residuals using the model ~Sex+Age. However, some of the residuals could be negative, and I'm not sure whether dreamlet::fitVarPart() can handle that.

Thanks again, Aida

GabrielHoffman commented 1 year ago

1) Use ~(1|Sex)+Age+(1|Batch). variancePartition works best when categorical variables are modeled as random effects. It's not an issue that this formula isn't identical to the differential expression formula.

2) You could do that, but you'd have to use variancePartition::fitExtractVarPartModel() directly.
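A minimal sketch of what option (2) might look like, following the residual-based pattern shown in the variancePartition vignette. The names `expr` (a genes-by-samples expression matrix for one cell type), `info` (sample metadata with Sex, Age and Batch columns) and `varpart_after_batch_removal` are assumptions for illustration, not objects from this thread. Negative residuals should not be a problem here, since the second step fits ordinary linear models to real-valued data.

```r
# Sketch only: `expr` and `info` are hypothetical inputs.
# Namespaced calls are used so the function can be defined without attaching
# variancePartition; the package is needed when the function is called.
varpart_after_batch_removal <- function(expr, info) {
  # Step 1: fit Batch as a random effect for each gene and keep the residuals
  resid_list <- variancePartition::fitVarPartModel(
    expr, ~ (1 | Batch), info, fxn = residuals
  )
  # Stack the per-gene residual vectors: one row per gene, one column per sample
  resid_mat <- do.call(rbind, resid_list)

  # Step 2: variance partition on the residuals. No random effects remain,
  # so the only categorical variable (Sex) is modeled as a fixed effect.
  variancePartition::fitExtractVarPartModel(resid_mat, ~ Sex + Age, info)
}
```

Called as `vp <- varpart_after_batch_removal(expr, info)`, this would return the fraction of residual variance attributed to Sex and Age for each gene.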

But using (1) is easier, and that is the workflow I designed dreamlet for.
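For illustration, option (1) could be sketched as follows, with one formula for variance partitioning and a separate one for differential expression. The object `res.proc` is assumed to be the output of dreamlet::processAssays() on a pseudobulk object, and `run_both` is a hypothetical helper name; neither appears in this thread.

```r
# Sketch only: `res.proc` is assumed to come from dreamlet::processAssays().

# Variance partitioning: all categorical variables as random effects
form_vp <- ~ (1 | Sex) + Age + (1 | Batch)

# Differential expression: Sex as a fixed effect, Batch as a random effect
form_de <- ~ Sex + Age + (1 | Batch)

# Namespaced calls keep the definition independent of attached packages;
# dreamlet must be installed when the function is called.
run_both <- function(res.proc) {
  list(
    vp = dreamlet::fitVarPart(res.proc, form_vp),
    de = dreamlet::dreamlet(res.proc, form_de)
  )
}
```

The two formulas differ only in how Sex is modeled, which matches the point above that the variance partition formula does not need to be identical to the differential expression formula.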

Gabriel