RGLab / MAST

Tools and methods for analysis of single cell assay data in R
224 stars 57 forks source link

lrTest(ZlmFIt, character) seems borked #57

Closed amcdavid closed 7 years ago

amcdavid commented 8 years ago

zz <- zlm(~run+PatientID+CDR, data=SET, method='bayesglm', ebayes=TRUE, hook=deviance_residuals_hook)

lrRun <- lrTest(zz, 'run') ##looks whack

amcdavid commented 8 years ago

This seems to occur when the original model was rank deficient, so the reduced model has the same column space as the original. So we need to add a test in line 52 of ZlmFit.R for this condition.

kliesis commented 7 years ago

Hi David,

I am using MAST for ssRNA-Seq analysis. I got stuck while making specific contrasts using lrTest + Hypothesis and trying to get the logFC values of the DE genes of this comparison. I tried to extract the respective values using getLogFC, but this function seems quite confusing to me and I do not really get the logic behind it. It also gives out all possible contrasts of the data set, although contrast0 and contrast1 should be there in order to specify the contrast of interest, isn´t it?

Is there any way how to get logFC values directly from the lrTest generated product? Or alternatively, could you explain the logics behind the getLogFC? I would be very thankful since I cannot find such examples in the vignettes or anywhere else

Thanks a lot! Cheers! Zane

gfinak commented 7 years ago

@kliesis Could you please make your question more precise? In what way is the getLogFC() function not performing the expected behavior? Do you have some code and output that you can post to make your question more precise? Did you read the documentation? Did you look at the example in the documentation? What are you actually trying to achieve?

In the Details of ?getLogFC, it's made clear what the function is doing. Rephrasing: getLogFC computes the expected log fold change between two model contrasts (i.e. difference between model coefficients) from the discrete and continuous components of the model. The variance of the estimate is calculated by the delta method. The output can only be interpreted as a log fold change if the data are modeled on the log scale. By default the function will return all contrasts compared against the intercept term.

The log fold changes are not based on the output of lrTest but on the model coefficients. The statistics output by lrTest are chi-square statistics for the likelihood ratio test. They reflect the goodness of fit of nested models. So, no it is not possible to compute the log fold change from those statistics, since they don't reflect the expression level of a gene. That's what the getLogFC function is for.

I can't be more helpful becauseI don't know anything about the model you are fitting or what your expectations are. You haven't provided any details beyond what I can only interpret as "I don't understand how this works".

Greg

amcdavid commented 7 years ago

Hi Zane, More specifically: Can you 1) provide the model that you are fitting (call to zlm), and 2) the Hypothesis that you are testing?

Note that you can provide contrasts1 and contrasts0 using Hypothesis:

example(getLogFC)
## Only Test PopulationVbetaResponsive
getLogFC(zz, contrast1=Hypothesis("`PopulationVbetaResponsive`+`(Intercept)`"))
kliesis commented 7 years ago

Hi guys,

first of all, thanks for the fast replies. And I am sorry for not providing any details, this is the first time that I have ever written a comment here myself. But these first two answers already helped me at least to some degree, so thanks!

I am testing whether 2 reagents influence gene expression in a particular cell line at 3 different time frames. The fitted model is simply ~condition + cngeneson, where condition consists of 6 different groups (each of reagent at each of the time points) and the control group, which I set to be the intercept. Currently, I am trying to compare the reagents to each other at each time point. So for the likelihood ratio test I used this command: r1_vs_r2_tp4<-lrTest(zlm, Hypothesis('r1_tp4 - r2_tp4'))

Thanks, Greg, I know that logFC is not calculated using lrTest statistics, but I was looking for a command that would basically do both for the same comparison, since in other packages this is usually possible.

I guess, what confuses me is the fact that one has to provide the contrast of interest and the baseline contrast, and I don´t quite get these following commands that were given as an example:

coefnames <- colnames(coef(zlm, 'D')) contrast0 <- setNames(rep(0, length(coefnames)), coefnames) contrast0[c("r1_tp4", "r2_tp4")] <- 1 contrast1 <- diag(length(coefnames)) rownames(contrast1)<-colnames(contrast1)<-coefnames contrast1["r1_tp4"]<-1 contrast0 contrast1 logFC_tp4<-getLogFC(zlm, contrast0,contrast1)

Cause then I get all possible contrasts. I really do not get the meaning of the baseline contrast, so I guess that is my biggest issue.

@ amcdavid: thanks, if I am using something like

logFC_tp4<-getLogFC(zlm,contrast1=Hypothesis('r1_tp4 - r2_tp4'))

the results seem to make more sense. Is this then the right way how I should use this function in my case? Letting then the contrast0 just be the intercept?

Thank you a lot! Zane

amcdavid commented 7 years ago

The contrast0 vs contrast1 business is because of the two-part model. I agree it's a bit convoluted, but there's not any way around it (at least with the current way the model is parametrized..I am in the process of experimenting with easier parameterizations.) . Note in your case, you can get all of this with a one liner:

summary(zlmfit, doLRT=TRUE)

See ?summary,ZlmFit-method for details. Otherwise, if you want to do it manually, you should just run the defaults with getLogFC, which as the help indicates:

contrast1: matrix of coefficients giving comparison contrasts, or a ‘Hypothesis’. If missing, then all non-(Intercept) coefficients are compared.

kliesis commented 7 years ago

but how can one do a direct comparison between 2 groups of which NONE is the intercept only using summary function? if I extract the lrt$datatable then under the column "contrast" there are single comparisons of the single groups vs. intercept, but not between the different non-intercept groups of interest. so can I really get all possible comparisons only with this one liner?

amcdavid commented 7 years ago

I think I understand. You want all pairwise comparison of groups. In this case, it will be simpler to drop the intercept so that you estimate the means per group.

library(MAST)
data(vbetaFA)
#The special characters in the labels can cause problems with the following
fct = factor(colData(vbetaFA)$Population)
levels(fct) = LETTERS[1:6]
colData(vbetaFA)$Population  = fct
#take CDR z-scores
colData(vbetaFA)$cdr = scale(colSums(assay(vbetaFA)>0))
#Or set these manually as a matrix
 #No intercept, coefficients give the per group estimates
zfit = zlm( ~Population + cdr+0, data=vbetaFA)
#Everything except CDR
coefLevels = colnames(coef(zfit, 'D'))
popsToCompare = setdiff(coefLevels, 'cdr')
popPairs = combn(popsToCompare, 2)
lr_out = lfc_out = list()
for(j in 1:ncol(popPairs)){
h_test = Hypothesis(paste0(popPairs[1,j], '-', popPairs[2,j]))
lr_out[[j]] = lrTest(zfit, h_test)
#Workaround for a bug with single-entry contrasts
cnt0 = t(Hypothesis(popPairs[1,j], terms=coefLevels)@contrastMatrix)
cnt1 = Hypothesis(popPairs[2,j], terms=coefLevels)@contrastMatrix
lfc_out[[j]] = getLogFC(zfit, contrast0=cnt0, contrast1=cnt1) #you will get NAs for subsets where either one or the other group had no detectable cells
}
roryk commented 7 years ago

Thank you, I couldn't figure out how to do this either.

kliesis commented 7 years ago

THANK YOU SO MUCH, THIS IS EXACTLY WHAT I NEEDED! awesome!

and sorry, I guess the way how I expressed myself was not the best one, I am still quite new to this field. Cheers!