RGLab / MAST

Tools and methods for analysis of single cell assay data in R
224 stars 57 forks source link

lrTest's with CoefficientHypothesis and contrast matrix give inconsistent results #178

Open ImNotaGit opened 1 year ago

ImNotaGit commented 1 year ago

This arises from the question asked here and I was asked to open a issue on GitHub so it can be investigated further. The issue is literally described in the title, below is a toy example.

library(MAST)

# I use an arbitrary subset of the shipped vbetaFA datasets only for testing purpose:
data(vbetaFA)
dat=subset(vbetaFA, ncells==1 & Population %in% c("VbetaResponsive","VbetaUnresponsive"))[1:10,]

# check the data:
table(colData(dat)$Stim.Condition, colData(dat)$Population)
#            VbetaResponsive VbetaUnresponsive
#  Stim(SEB)              43                43
#  Unstim                 42                43

# check the design matrix for the names of model coefficients:
colnames(model.matrix(~Stim.Condition*Population, colData(dat)))
# [1] "(Intercept)"       "Stim.ConditionUnstim"       "PopulationVbetaUnresponsive"
# [4] "Stim.ConditionUnstim:PopulationVbetaUnresponsive"

# fit model
fit=zlm(~Stim.Condition*Population, dat)

# test the interaction effect with `lrTest`, passing either a contrast matrix or a `CoefficientHpothesis`, here corresponding to just one coefficient. I'm expecting that both ways give the same results.

# 1. with a contrast matrix:
lrt1=lrTest(fit, as.matrix(c(0,0,0,1)))
head(lrt1[,,3])
#         test.type
# primerid      cont      disc    hurdle
#   B3GAT1 1.0000000 0.1349874 0.1349874
#   BAX    0.9235851 0.8054454 0.9656696
#   BCL2   1.0000000 0.4718240 0.4718240
#   CCL2   1.0000000 0.3383300 0.3383300
#   CCL3   1.0000000 1.0000000 1.0000000
#   CCL4   1.0000000 1.0000000 1.0000000

# 2. using `CoefficientHypothesis`:
lrt2=lrTest(fit, CoefficientHypothesis("Stim.ConditionUnstim:PopulationVbetaUnresponsive"))
head(lrt2[,,3])
#         test.type
# primerid      cont      disc    hurdle
#  B3GAT1 1.0000000 0.5045526 0.5045526
#  BAX    0.9235851 0.8058111 0.9657819
#  BCL2   1.0000000 0.4259263 0.4259263
#  CCL2   1.0000000 0.7140869 0.7140869
#  CCL3   1.0000000 1.0000000 1.0000000
#  CCL4   1.0000000 1.0000000 1.0000000

# so as seen above the two results are different for the discrete component and thus the Hurdle model, but the continuous component appears to be consistent.