Open ImNotaGit opened 2 years ago
While you are welcome to submit a PR, it's not going to be a priority for me to extend summary in this way.
summary
, so don't want to invest time in extending it. At some point, I envision rewriting it to resemble the DESeq2::results
method, which puts everything in a "wider" format, maybe using nested tibbles to handle the discrete/continous components.Workaround is to just call lrTest with your coefficients and combine by hand.
@amcdavid Thanks for the quick reply!
I do realize that this can be complicated as I further explored the manual combine approach. I would like to ask a little more in this line -- just want to make sure I get things correct when I implement my own workaround. I noticed that logFC
in MAST
is kind of unusual in that it requires two contrasts and will return the comparison of contrast1
to contrast0
, while for lrTest
I would pass something that corresponds to contrast1-contrast0
. Generally in a linear model, such as in limma
, we can also obtain a desired logFC with just a single contrast. I believe for MAST
it's different due to the fact that logFC is defined as u(contrast1)v(contrast1) - u(contrast0)v(contrast0)
, however, it seems that this formula doesn't make sense for cases e.g. the desired logFC is a mean of multiple logFC's from a series of replicated pairwise comparisons. I wonder whether an alternative definition would make more sense for these cases, e.g. for intercept-free models where raw coefficients correspond to group means, u(coefficients)v(coefficients) %*% contrast
for a single contrast
(which corresponds to what I would pass to lrTest
). Sorry I have not provided a concrete example, hopefully the point is clear. Could you kindly comment further wrt this?
Thanks again for providing and wonderfully maintaining this useful package. If I will ever manage to get a confident understanding of the inner workings of the package I'll be glad to try a PR:)
However, it seems that this formula doesn't make sense for cases e.g. the desired logFC is a mean of multiple logFC's from a series of replicated pairwise comparisons.
I probably need an example here to understand what you are referring to.
I wonder whether an alternative definition would make more sense for these cases, e.g. for intercept-free models where raw coefficients correspond to group means, u(coefficients)v(coefficients) %*% contrast for a single contrast (which corresponds to what I would pass to lrTest).
What do you mean by coefficients
vs contrast
? You can select a coefficient with a contrast that is just an indicator vector.
I'll try to explain with a toy example. For a "real" single-cell RNA-seq dataset there will be of course a lot more samples (cells).
First consider a simple one-way design of a factor named Group
with the two levels "Control"
and "Treated"
, the aim is to compare the "Treated"
group to the "Control"
group.
#> Group
#> 1 Control
#> 2 Control
#> 3 Control
#> 4 Treated
#> 5 Treated
#> 6 Treated
model.matrix
with formula ~Group
would give:
#> (Intercept) GroupTreated
#> 1 1 0
#> 2 1 0
#> 3 1 0
#> 4 1 1
#> 5 1 1
#> 6 1 1
And in MAST::logFC
contrast0
would be the following, corresponding to the "Control"
group:
#> (Intercept) GroupTreated
#> (Intercept) 1 0
contrast1
would be the following, corresponding to the "Treated"
group:
#> (Intercept) GroupTreated
#> GroupTreated 1 1
u(contrast0)v(contrast0)
would be interpreted as the expected expression level (on the log scale) of the "Control"
group, similarly u(contrast1)v(contrast1)
would be that for the "Treated"
group, thus u(contrast1)v(contrast1)-u(contrast0)v(contrast0)
would correspond to the logFC between "Treated"
and "Control"
. For this case everything is fine.
But then consider, a repeated measure design involving paired "Baseline"
(i.e. control) and "Treated"
groups of the same individual (i.e. paired data for each individual). The aim is to compare the "Treated"
group to the "Baseline"
group across the individuals. One may be interested in a effect size measure something like ( (Treated1 - Baseline1) + (Treated2 - Baseline2) + (Treated3 - Baseline3) )/3
, in general this would be a (possibly weighted) average of logFC values for paired comparisons.
#> Individual Group
#> 1 1 Baseline
#> 2 1 Treated
#> 3 2 Baseline
#> 4 2 Treated
#> 5 3 Baseline
#> 6 3 Treated
The issue with this case is that for MAST::logFC
we have to specify two contrasts corresponding to two groups to be compared. In the above case one could specify contrast1
that corresponds to (T1 + T2 + T3)/3
(T
for Treated
), and contrast0
that corresponds to (B1 + B2 + B3)/3
(B
for Baseline
). Normally this should not be an issue, as (T1 + T2 + T3)/3 - (B1 + B2 + B3)/3
would be equivalent to what we want. But in MAST
, there are the continuous and discrete components that are being multiplied, and the results should be not equivalent in general. In other words, u((T1+T2+T3)/3)v((T1+T2+T3)/3) - u((B1+B2+B3)/3)v((B1+B2+B3)/3)
is not the same as ( (u(T1)v(T1) - u(B1)v(B1)) + (u(T2)v(T2) - u(B2)v(B2)) + (u(T3)v(T3) - u(B3)v(B3)) )/3
. Naively, it appears to me that the latter has a more meaningful interpretation considering the meanings of u(.) and v(.) and u(.)v(.)
Given this context, when I said u(coefficients)v(coefficients) %*% contrast
, I actually meant ( coefC * 1/(1+exp(-coefD)) ) %*% contrast
(sorry my previous notation was wrong). coefC and coefD would be the coefficients obtained when using an intercept-free model, such that they correspond to the estimated value for each "actually-defined" group (T1, T2, T3, B1, B2, B3), and contrast
in the above example would be something like
T1 T2 T3 B1 B2 B3
1/3 1/3 1/3 -1/3 -1/3 -1/3
Hopefully I am not missing anything and this makes sense. Thanks again for your time.
In other words, u((T1+T2+T3)/3)v((T1+T2+T3)/3) - u((B1+B2+B3)/3)v((B1+B2+B3)/3) is not the same as ( (u(T1)v(T1) - u(B1)v(B1)) + (u(T2)v(T2) - u(B2)v(B2)) + (u(T3)v(T3) - u(B3)v(B3)) )/3
Yes -- that is for better or worse intrinsic to the model. The former quantity (log fold change of the average effect) is going to be different than the average of the log fold change. A workaround might be to stratify into 3 groups, calculate the log-fold change in each group, then average those -- you could propagate error by just summing the variances and dividing by the number of summands squared (eg 9 in your example) because the estimates will be independent (in the statistical sense) from each other.
From a "multifactorial ANOVA" standpoint, if you are fitting an interaction model (between individual and treatment), does it make sense to interpret an average effect? Some people say not.
@amcdavid Thanks for the suggestions. I can see that it's tricky to accommodate for a general case in the computation of logFC. I guess this issue can now be closed.
From a "multifactorial ANOVA" standpoint, if you are fitting an interaction model (between individual and treatment), does it make sense to interpret an average effect? Some people say not.
Yep sure I agree that if the interaction effect is non-trivial then averaging the main effect is probably not meaningful or informative. This is an independent data analysis consideration though, but points well taken and thanks for mentioning this.
This is a (hopefully) minor feature request.
I am using the development version of the package.
lrTest
supports passing numerous different types of objects for the argumenthypothesis
, according to the document: "hypothesis
can be one of a character giving complete factors or terms to be dropped from the model,CoefficientHypothesis
giving names of coefficients to be dropped,Hypothesis
giving contrasts using the symbolically, or a contrast matrix." However, when called viasummary(zlmfit, doLRT)
, thedoLRT
argument can only be a character vector of coefficients to drop in the reduced model. It would be very helpful to also support the multiple argument types fordoLRT
insummary
as those forhypothesis
inlrTest
.My use case is as follows: I was exploring a series of somewhat complex nested design models, and I was not confident that I got the design matrix correct every time. After considering the experimental designs and questions of interests, I decided to instead create a new factor for combinations of all levels of the factors under consideration, provide an intercept-free model including only this new factor to
zlm
, and then obtain log fold-change values for a customized comparison viagetLogFC(zlmfit, contrast0, contrast1)
specifying both thecontrast0
andcontrast1
arguments. I was able to include such customized logFC insummary
viasummary(zlmfit, logFC=getLogFC(zlmfit, contrast0, contrast1))
, and I also wanted to perform LR test to get P values exactly corresponding to the customized comparison. But with the currentdoLRT
argument only accepting a character vector of coefficients I was not able to specify the test I needed.Thanks for your help.