GabrielHoffman / variancePartition

Quantify and interpret divers of variation in multilevel gene expression experiments
http://gabrielhoffman.github.io/variancePartition/
60 stars 14 forks source link

Questions about P.values #89

Closed Jayanth-Sreekanth closed 1 year ago

Jayanth-Sreekanth commented 1 year ago

Hi, thank you for the nice package.

I am running a mixed model similar to the one showed in the tutorial of variancePartition

  1. Though having multiple variables (Disease, Age and Sex), I am interested in looking at the individual P.values of each variable. For this, I am referring to p.value matrix in the output of dream function. My question is, are they adjusted.p.values or not?

  2. For all the genes, I notice the difference between P.Value in the topTable and F.p.value from the output of dream function. What's the reason behind it?

variancePartition version = 1.24.1

GabrielHoffman commented 1 year ago

If you fit a model with only flxed effects, then dream() should give you identical results to running lmFit() followed by `eBayes(). 1.24.1 is pretty old, so I might have fixed the issue since then.

If you send me a minimal reproducible example, I can see if there is a bug

Best, Gabriel

Jayanth-Sreekanth commented 1 year ago

Hi Gabriel,

I've shared a small reproducible example below.

As you can see, the p.value produced in the topTable and F.p.value in the output of dream function is slightly different. The differences between the p.value and F.p.value are sometimes large while working on other dataframes.

Why is this? Have you rectified this in the latest releases?

Also wanted to confirm that fit$p.value is not corrected for multiple testing right?

Regards, Jay

library(variancePartition)
#> Warning: package 'variancePartition' was built under R version 4.1.3
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Loading required package: limma
#> Warning: package 'limma' was built under R version 4.1.3
#> Loading required package: BiocParallel

data(varPartData)

form <- ~ Batch + (1|Individual) + (1|Tissue) 

fit = variancePartition::dream(geneExpr[1:100,], formula =  form, data = info)
#> Dividing work into 5 chunks...
#> 
#> Total:7 s
fit = variancePartition::eBayes(fit)

tt <- variancePartition::topTable(fit)
#> Removing intercept from test coefficients

p.val <- as.data.frame(fit$p.value)
p.val$F.p.value <- fit$F.p.value

tt$P.Value
#>  [1] 0.006912587 0.010287949 0.026494071 0.028667857 0.041205520 0.047782615
#>  [7] 0.068617212 0.084784797 0.086718583 0.099380955
p.val[rownames(tt),'F.p.value']
#>  [1] 0.006907668 0.010281684 0.026483626 0.028657013 0.041192802 0.047769136
#>  [7] 0.068601995 0.084768727 0.086702432 0.099364372
GabrielHoffman commented 1 year ago

When dream() is fit with only fixed effects, then it produces exactly the same result as limma::lmFit():

library(variancePartition)

data(varPartData)

form <- ~ Batch 
fit <- dream(geneExpr[1:100,], formula = form, data = info)
fit <- eBayes(fit)
topTable(fit, coef="Batch2", number=3)
#>            logFC    AveExpr         t    P.Value adj.P.Val         B
#> gene10  1.897490 -2.3673775  2.028912 0.04444815  0.973476 -4.580493
#> gene75 -1.692250 -0.4512581 -1.740355 0.08409242  0.973476 -4.585531
#> gene21  1.546069 -6.3790973  1.677117 0.09585092  0.973476 -4.586541

design = model.matrix(form, info)
fit2 <- lmFit(geneExpr[1:100,], design)
fit2 <- eBayes(fit2)
topTable(fit2, coef="Batch2", number=3)
#>            logFC    AveExpr         t    P.Value adj.P.Val         B
#> gene10  1.897490 -2.3673775  2.028912 0.04444815  0.973476 -4.580493
#> gene75 -1.692250 -0.4512581 -1.740355 0.08409242  0.973476 -4.585531
#> gene21  1.546069 -6.3790973  1.677117 0.09585092  0.973476 -4.586541

With a random effect is included (i.e. (1|Individual)), then dream() used a linear mixed model that is different. This is the target use case for dream(), the fixed-effect only case was included for convenience.

In your code above you are reading internal data from the fit object when you are using fit$F.p.value and fit$p.value. Avoid this. The only valid way to extract results from these tests is with topTable()

Gabriel

Jayanth-Sreekanth commented 1 year ago

Alright, thanks!