Closed amcdavid closed 7 years ago
This seems to occur when the original model was rank deficient, so the reduced model has the same column space as the original. So we need to add a test in line 52 of ZlmFit.R for this condition.
Hi David,
I am using MAST for ssRNA-Seq analysis. I got stuck while making specific contrasts using lrTest + Hypothesis and trying to get the logFC values of the DE genes of this comparison. I tried to extract the respective values using getLogFC, but this function seems quite confusing to me and I do not really get the logic behind it. It also gives out all possible contrasts of the data set, although contrast0 and contrast1 should be there in order to specify the contrast of interest, isn´t it?
Is there any way how to get logFC values directly from the lrTest generated product? Or alternatively, could you explain the logics behind the getLogFC? I would be very thankful since I cannot find such examples in the vignettes or anywhere else
Thanks a lot! Cheers! Zane
@kliesis
Could you please make your question more precise? In what way is the getLogFC()
function not performing the expected behavior? Do you have some code and output that you can post to make your question more precise? Did you read the documentation? Did you look at the example in the documentation? What are you actually trying to achieve?
In the Details
of ?getLogFC
, it's made clear what the function is doing.
Rephrasing: getLogFC
computes the expected log fold change between two model contrasts (i.e. difference between model coefficients) from the discrete and continuous components of the model. The variance of the estimate is calculated by the delta method. The output can only be interpreted as a log fold change if the data are modeled on the log scale. By default the function will return all contrasts compared against the intercept term.
The log fold changes are not based on the output of lrTest but on the model coefficients. The statistics output by lrTest are chi-square statistics for the likelihood ratio test. They reflect the goodness of fit of nested models. So, no it is not possible to compute the log fold change from those statistics, since they don't reflect the expression level of a gene. That's what the getLogFC
function is for.
I can't be more helpful becauseI don't know anything about the model you are fitting or what your expectations are. You haven't provided any details beyond what I can only interpret as "I don't understand how this works".
Greg
Hi Zane,
More specifically:
Can you
1) provide the model that you are fitting (call to zlm
), and
2) the Hypothesis
that you are testing?
Note that you can provide contrasts1
and contrasts0
using Hypothesis
:
example(getLogFC)
## Only Test PopulationVbetaResponsive
getLogFC(zz, contrast1=Hypothesis("`PopulationVbetaResponsive`+`(Intercept)`"))
Hi guys,
first of all, thanks for the fast replies. And I am sorry for not providing any details, this is the first time that I have ever written a comment here myself. But these first two answers already helped me at least to some degree, so thanks!
I am testing whether 2 reagents influence gene expression in a particular cell line at 3 different time frames.
The fitted model is simply ~condition + cngeneson, where condition consists of 6 different groups (each of reagent at each of the time points) and the control group, which I set to be the intercept.
Currently, I am trying to compare the reagents to each other at each time point. So for the likelihood ratio test I used this command:
r1_vs_r2_tp4<-lrTest(zlm, Hypothesis('r1_tp4
- r2_tp4
'))
Thanks, Greg, I know that logFC is not calculated using lrTest statistics, but I was looking for a command that would basically do both for the same comparison, since in other packages this is usually possible.
I guess, what confuses me is the fact that one has to provide the contrast of interest and the baseline contrast, and I don´t quite get these following commands that were given as an example:
coefnames <- colnames(coef(zlm, 'D')) contrast0 <- setNames(rep(0, length(coefnames)), coefnames) contrast0[c("r1_tp4", "r2_tp4")] <- 1 contrast1 <- diag(length(coefnames)) rownames(contrast1)<-colnames(contrast1)<-coefnames contrast1["r1_tp4"]<-1 contrast0 contrast1 logFC_tp4<-getLogFC(zlm, contrast0,contrast1)
Cause then I get all possible contrasts. I really do not get the meaning of the baseline contrast, so I guess that is my biggest issue.
@ amcdavid: thanks, if I am using something like
logFC_tp4<-getLogFC(zlm,contrast1=Hypothesis('r1_tp4 - r2_tp4'))
the results seem to make more sense. Is this then the right way how I should use this function in my case? Letting then the contrast0 just be the intercept?
Thank you a lot! Zane
The contrast0 vs contrast1 business is because of the two-part model. I agree it's a bit convoluted, but there's not any way around it (at least with the current way the model is parametrized..I am in the process of experimenting with easier parameterizations.) . Note in your case, you can get all of this with a one liner:
summary(zlmfit, doLRT=TRUE)
See ?summary,ZlmFit-method for details. Otherwise, if you want to do it manually, you should just run the defaults with getLogFC, which as the help indicates:
contrast1: matrix of coefficients giving comparison contrasts, or a ‘Hypothesis’. If missing, then all non-(Intercept) coefficients are compared.
but how can one do a direct comparison between 2 groups of which NONE is the intercept only using summary function? if I extract the lrt$datatable then under the column "contrast" there are single comparisons of the single groups vs. intercept, but not between the different non-intercept groups of interest. so can I really get all possible comparisons only with this one liner?
I think I understand. You want all pairwise comparison of groups. In this case, it will be simpler to drop the intercept so that you estimate the means per group.
library(MAST)
data(vbetaFA)
#The special characters in the labels can cause problems with the following
fct = factor(colData(vbetaFA)$Population)
levels(fct) = LETTERS[1:6]
colData(vbetaFA)$Population = fct
#take CDR z-scores
colData(vbetaFA)$cdr = scale(colSums(assay(vbetaFA)>0))
#Or set these manually as a matrix
#No intercept, coefficients give the per group estimates
zfit = zlm( ~Population + cdr+0, data=vbetaFA)
#Everything except CDR
coefLevels = colnames(coef(zfit, 'D'))
popsToCompare = setdiff(coefLevels, 'cdr')
popPairs = combn(popsToCompare, 2)
lr_out = lfc_out = list()
for(j in 1:ncol(popPairs)){
h_test = Hypothesis(paste0(popPairs[1,j], '-', popPairs[2,j]))
lr_out[[j]] = lrTest(zfit, h_test)
#Workaround for a bug with single-entry contrasts
cnt0 = t(Hypothesis(popPairs[1,j], terms=coefLevels)@contrastMatrix)
cnt1 = Hypothesis(popPairs[2,j], terms=coefLevels)@contrastMatrix
lfc_out[[j]] = getLogFC(zfit, contrast0=cnt0, contrast1=cnt1) #you will get NAs for subsets where either one or the other group had no detectable cells
}
Thank you, I couldn't figure out how to do this either.
THANK YOU SO MUCH, THIS IS EXACTLY WHAT I NEEDED! awesome!
and sorry, I guess the way how I expressed myself was not the best one, I am still quite new to this field. Cheers!
zz <- zlm(~run+PatientID+CDR, data=SET, method='bayesglm', ebayes=TRUE, hook=deviance_residuals_hook)
lrRun <- lrTest(zz, 'run') ##looks whack