Open LuckyMD opened 6 years ago
I don't understand your design from your description. Can you clarify, please? Perhaps post your design matrix?
Conceptually I set up my design matrix using the variables:
Condition Sample Chip ngeneson TRUE. A C1. 1030 FALSE. B C1 2025 FALSE. B. C2. 2583 TRUE. A. C2. 2353 TRUE. C. C3. 1843 FALSE. D. C3. 1283 ......
I assume this would have been transformed into 0s and 1s automatically and an offset would have been added. What I mean to say is that in this toy example Condition==TRUE
includes samples A
and C
and Condition==FALSE
includes samples B
and D
. Thus, if I include both condition
and sample
in the design matrix this should not be full rank as condition
can be inferred from sample
.
However, I did run this setup... and I recovered differentially expressed genes over the condition
covariate. This puzzles me a little, as I would assume the variance in condition
would have already been accounted for in the sample
covariate.
Now, when I run without the sample
covariate, I recover very different DEGs over condition
.
Is there any way to account for sample
variance within condition
in the model?
Maybe use a random effect model with method='glmer'
but with only three samples and three chips you're not going to get great estimates of the variability.
Something like: ~Condition+ngeneson+(1|Sample)+(1|Chip)
.
And center your ngeneson variable.
@amcdavid any thoughts?
Also, as a warning there's issue #98 that's related to estimation with random effects that we need to look into still.
I just used three in the example above. I actually have 8 chips and 8 samples split over 2 conditions.
Does centering the ngeneson variable make a big difference? That would be something I could alter.
I'm not very familiar with mixed effect models. What exactly would I be modeling using (1|Sample) + (1|Chip)
?
Do you have any idea why I would get an output at all with the above design matrix (including sample
and condition
)? How can MAST fit a model where it could assign the variance to either the sample
or condition
covariate?
ngeneson
is 0, which is meaningless and outside any data you actually observe. So centering ngeneson gives the interpretation of coefficients as the expected value when ngeneson
is set to the mean. Hi again!
I recently tried your suggested solution of using the following zlm
command:
zlmCond_astro <- zlm(formula = ~condition + (1|sample) + (1|chip) + ngeneson, method='glmer', ebayes=FALSE, sca=sca_astro_filt, parallel=FALSE)
However, I am running into an error regarding a fixef()
function. Could you decipher the following error message for me?
Error in fixef(object@fitC) : could not find function "fixef"
Maybe load the lme4 package first?
Greg Finak
On Wed, Dec 5, 2018, 09:52 MalteDLuecken <notifications@github.com wrote:
Hi again! I recently tried your suggested solution of using the following zlm command: zlmCond_astro <- zlm(formula = ~condition + (1|sample) + (1|chip) + ngeneson, method='glmer', ebayes=FALSE, sca=sca_astro_filt, parallel=FALSE)
However, I am running into an error regarding a fixef() function. Could you decipher the following error message for me?
Error in fixef(object@fitC) : could not find function "fixef"
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/RGLab/MAST/issues/97#issuecomment-444579297, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUkeQmB4GTJ4nEpX6cQSeR68YQW_NCjks5u2AfAgaJpZM4VqjPH .
Ah, thanks. I wasn't aware of other dependencies. I assumed I had accidentally renamed a fixed()
function and was looking for my error ;). Mea culpa.
I think we have good reasons to have lme4 be only Suggested
. Can we import its methods anyways? Putting a call to if(require(lme4))
seems like bad behavior since it modifies the search path (and I think is worthy of a warning from R CMD check
Hi all, I post here since the topic is the same and this issue is still open, but I can open a new issue if necessary. I would like to compare two conditions with nested batches, but with a simpler design matrix (I have no chip covariate). The composition of each condition is similar to:
condition batch
Case Sample1
Case Sample2
Case Sample3
Case Sample4
Control Sample5
Control Sample6
Control Sample7
Control Sample8
In such case, with no chip covariate, does it make sense to also model the batch or would it lead to losing the condition signal, i.e.:
~ condition + ngeneson + batch
? At the moment I'm using MAST without the batch covariate
~ condition + ngeneson
since I thought introducing the batch covariate would lead to the wrong results in this case. Is this justified, or I should be able to use it instead?
Thanks
I assume you have multiple cells per sample? If so I think you could include a random batch effect.
Thanks @gfinak, yes I have multiple cells per sample. I'll try with the batch covariate. I would like to learn more about why the batch covariate doesn't remove the condition. In a previous message, you indicate that "Sample and condition are not entirely aliased (you have more samples than conditions)." Do you have at hand some documentation I can read to understand better how the MAST's model works in this case? Thanks!
I tested using the batch covariate. I noticed that, although some genes related to the condition (positive controls) are recovered, most of the others are not. And the (log-fold-change, FDR) of the recovered genes is smaller (LFC) and less significant (FDR) with respect to the analysis without the batch covariate.
Is it correct to state that in this case, due to conditions partially confounded with batch (4 unique batches per condition), the statistical power of the test is reduced when adding the batch covariate? Keeping however in mind that adding the batch covariate is the more appropriate approach... Thanks.
Hi @fbrundu, I had the same problem when using MAST with the batch covariate. Has the problem with batch covariate been solved, somehow?
Hi!
I was wondering if you might be able to help me with an issue I've been having. I am trying to run MAST with a nested batch effect.
I have data from an experiment with 6 mice in 2 conditions (3 mice each), spread over 6 plates (condition & control spread half-half per plate). Now I'd like to run MAST on normalized, but not batch-corrected data and include the batch as a covariate.
First I ran
~condition + sample + chip + ngeneson
, wheresample
is just the mouse. I got expected genes out of this, although down-regulated when I expected them to be up-regulated. Then, I noticed thatsample
is just a higher resolution ofcondition
(each mouse is either condition or control). So that would lead to a non-full-rank design matrix and should have output wrong results I assume.Running just
~condition + chip + ngeneson
however gives me far fewer DE genes, and not the ones I would expect. I assume this has to do with the variability between mice making the background noise quite high.I was wondering if I can add the mouse covariate so that it is fit separately per condition to account for variation between mice without interfering with the condition covariate.
Thanks for any help you can provide!