Gambrian closed this issue 3 weeks ago.
You may want to rephrase your question.
When the results are converted into a data.frame, filtering is applied: first on the (adjusted) p-values, and then on the q-values. This is explained in https://github.com/YuLab-SMU/clusterProfiler/issues/737.
This filtering happens automatically, because the filtering function get_enriched() is called whenever the results of an enrichment analysis are converted into a data.frame. See the code definitions in accessor.R (https://github.com/YuLab-SMU/DOSE/blob/4bdf261db7e2d088ca588a564f1db5a85ba522b7/R/accessor.R#L1).
In the first part of your code (go_res %>% as.data.frame()), filtering is thus based on the fdr only (i.e. fdr smaller than or equal to 0.1; fdr only, because qvalueCutoff = 1). Note that the fdr values are stored in the column p.adjust (see https://github.com/YuLab-SMU/DOSE/blob/4bdf261db7e2d088ca588a564f1db5a85ba522b7/R/enricher_internal.R#L147).
By using the @ accessor, you directly pull out the raw, i.e. all, unfiltered results.
Note that you then filtered on pvalue, and not on p.adjust (the latter holds the fdr)! Had you filtered on p.adjust, you would have seen that the same 2 sets 'survive' as when extracting the results as a data.frame (because all other sets have p.adjust > 0.1, e.g. 0.181, 0.26, etc.).
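To illustrate the difference, here is a minimal base-R sketch on toy data: the data.frame raw merely stands in for go_res@result, and its p-values are made up, not taken from your results.

```r
# Toy stand-in for go_res@result (all tested sets, unfiltered); p-values are made up
raw <- data.frame(
  ID     = paste0("SET", 1:8),
  pvalue = c(0.0004, 0.0030, 0.0400, 0.0600, 0.0800, 0.0950, 0.4000, 0.8000)
)
# fdr (= BH) adjustment, i.e. what the p.adjust column holds here
raw$p.adjust <- p.adjust(raw$pvalue, method = "fdr")

cutoff <- 0.1
by_pvalue  <- raw[raw$pvalue <= cutoff, ]                         # filter on raw p-values
by_padjust <- raw[raw$p.adjust <= cutoff, ]                       # filter on the fdr
by_both    <- raw[raw$pvalue <= cutoff & raw$p.adjust <= cutoff, ]

nrow(by_pvalue)   # 6 sets pass the raw p-value filter
nrow(by_padjust)  # only 2 sets pass the fdr filter
# BH-adjusted values are never smaller than the raw p-values, so adding the
# pvalue filter on top of the p.adjust filter (same cutoff) changes nothing
identical(by_padjust$ID, by_both$ID)
```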
Finally, the help page for p.adjust (type ?p.adjust) states:
".... Benjamini & Hochberg (1995) ("BH" or its alias "fdr") ...".
Thus, setting the method to fdr means that the Benjamini-Hochberg method for multiple-testing adjustment is used: fdr = BH.
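This is easy to verify in base R. The sketch below uses made-up p-values and also includes bh_manual(), a hypothetical helper written only for illustration, which applies the BH step-up rule (p_(i) * m / i on the sorted p-values, made monotone and capped at 1):

```r
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216)  # toy p-values

adj_fdr <- p.adjust(p, method = "fdr")
adj_bh  <- p.adjust(p, method = "BH")
identical(adj_fdr, adj_bh)   # "fdr" is just an alias of "BH"

# Hypothetical manual BH step-up adjustment, for illustration only
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)      # largest p-value first
  i <- m:1                              # ascending rank of each of those p-values
  adj <- pmin(1, cummin(p[o] * m / i))  # p * m / rank, made monotone, capped at 1
  adj[order(o)]                         # back to the original order
}
all.equal(bh_manual(p), adj_bh)
```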
Thanks for your reply. I think I made a mistake because I missed the description of pvalueCutoff:
pvalueCutoff | adjusted pvalue cutoff on enrichment tests to report
So if I get some results and want to filter by pvalue, I need to set pAdjustMethod to "none", right? I also have another question about qvalue:
> go_res = enricher(gene = target_id,
+ TERM2GENE = go_df[,c("go_id","gene")],
+ TERM2NAME = go_df[,c("go_id","go_description")],
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "none",
+ qvalueCutoff = 1,
+ minGSSize = 10,
+ maxGSSize = 500)
> qvalue(go_res@result$pvalue)$qvalue %>% head()
[1] 0.0001686506 0.0096398217 0.0231162575 0.0231162575 0.0231162575 0.0343386871
> go_res@result$qvalue %>% head()
[1] 0.001281166 0.073229585 0.175604280 0.175604280 0.175604280 0.260856257
> packageVersion("qvalue")
[1] ‘2.36.0’
I would like to know what causes this difference.
Yes, if pAdjustMethod = "none", no p-value adjustment for multiple testing is applied, and the values in the columns pvalue and p.adjust will be identical.
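A quick base-R check with made-up p-values. Keep in mind that the qvalueCutoff is still applied on top of this, so qvalueCutoff = 1 is also needed if you want the raw p-values to be the only criterion.

```r
p <- c(0.003, 0.020, 0.045, 0.300)   # toy p-values

# with method = "none", p.adjust() returns the p-values unchanged,
# so the pvalue and p.adjust columns hold the same numbers
all.equal(p.adjust(p, method = "none"), p)

# consequently, a cutoff on p.adjust selects exactly the same sets as on pvalue
identical(p <= 0.05, p.adjust(p, method = "none") <= 0.05)
```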
Regarding gene set filtering: first, the values in the column pvalue are used to select gene sets; the 'surviving' sets are then filtered on p.adjust (using the same cutoff value!); and finally the remaining sets are filtered on the qvalue cutoff (default setting: qvalueCutoff = 0.2).
On the help page of enrichGO:
qvalueCutoff - qvalue cutoff on enrichment tests to report as significant. Tests must pass i) pvalueCutoff on unadjusted pvalues, ii) pvalueCutoff on adjusted pvalues and iii) qvalueCutoff on qvalues to be reported.
The code that does this is the internal function get_enriched: https://github.com/YuLab-SMU/DOSE/blob/4bdf261db7e2d088ca588a564f1db5a85ba522b7/R/enricher_internal.R#L213-L234
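The three-step logic described above can be mimicked in base R. This is only a sketch: filter_enriched() is a hypothetical function, not part of DOSE, and the toy table uses made-up p-values with q-values faked as 0.6 * p.adjust (i.e. an assumed pi0 of 0.6). Skipping the q-value step when no q-values are available is also an assumption; check get_enriched itself for the exact behaviour.

```r
# Hypothetical re-implementation of the cutoff logic described above (not DOSE code)
filter_enriched <- function(res, pvalueCutoff = 0.05, qvalueCutoff = 0.2) {
  res <- res[res$pvalue <= pvalueCutoff, , drop = FALSE]    # i) unadjusted p-values
  res <- res[res$p.adjust <= pvalueCutoff, , drop = FALSE]  # ii) adjusted p-values, same cutoff
  # iii) q-values; skipping this step when all q-values are NA is an assumption
  if (!all(is.na(res$qvalue))) {
    res <- res[!is.na(res$qvalue) & res$qvalue <= qvalueCutoff, , drop = FALSE]
  }
  res
}

# Toy result table: made-up p-values, q-values faked as 0.6 * p.adjust
res <- data.frame(
  ID     = paste0("SET", 1:5),
  pvalue = c(0.001, 0.008, 0.020, 0.045, 0.300)
)
res$p.adjust <- p.adjust(res$pvalue, method = "BH")
res$qvalue   <- 0.6 * res$p.adjust

# SET5 fails step i), SET4 fails step ii), SET3 fails step iii)
filter_enriched(res, pvalueCutoff = 0.05, qvalueCutoff = 0.015)$ID
```

With qvalueCutoff = 1 the last step removes nothing, and SET1 to SET3 are kept.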
Regarding the calculation of the qvalues: this is done in the following line of the function enricher_internal: https://github.com/YuLab-SMU/DOSE/blob/4bdf261db7e2d088ca588a564f1db5a85ba522b7/R/enricher_internal.R#L148
Values can be fully reproduced by manual calculation. See below.
> library(clusterProfiler)
>
> ## load example data
> data(geneList, package = "DOSE")
> de <- names(geneList)[1:100]
>
> ## default analysis
> y1 <- enrichGO(de, 'org.Hs.eg.db',
+ ont = "CC",
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "BH",
+ qvalueCutoff = 0.2)
>
>
> ## Manual calculation of qvalues
> ## note that both pvalueCutoff and qvalueCutoff are set to 1,
> ## because ALL pvalues have to be used for qvalue calculations!
> library(qvalue)
>
> y2 <- enrichGO(de, 'org.Hs.eg.db',
+ ont = "CC",
+ pvalueCutoff = 1,
+ pAdjustMethod = "none",
+ qvalueCutoff = 1)
>
> qvalues <- qvalue(p=y2@result$pvalue, lambda=0.05, pi0.method="bootstrap")
>
> ## check and compare
> ## qvalues are identical!
>
> as.data.frame(y1)[1:6, "qvalue"]
[1] 8.448492e-17 8.448492e-17 8.448492e-17 9.585565e-17 2.477193e-16
[6] 8.026847e-16
>
> head(qvalues$qvalues)
[1] 8.448492e-17 8.448492e-17 8.448492e-17 9.585565e-17 2.477193e-16
[6] 8.026847e-16
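As for why a plain qvalue(go_res@result$pvalue) call gives different numbers: DOSE calls qvalue() with a single lambda (0.05) instead of the default lambda grid (seq(0.05, 0.95, 0.05)), so the estimated proportion of true null hypotheses (pi0) differs. This fits your output, where the two sets of q-values differ by a constant factor of about 7.6. As far as I can tell from the qvalue source, with a single lambda the estimate reduces to pi0 = min(1, mean(p >= lambda) / (1 - lambda)), so pi0.method has no effect, and the q-values are simply pi0 times the BH-adjusted p-values. A base-R sketch under that assumption (qvalue_single_lambda() is a hypothetical helper; the p-values are made up):

```r
# Hypothetical helper: q-values for a single lambda, ASSUMING qvalue's
# single-lambda rule pi0 = min(1, mean(p >= lambda) / (1 - lambda))
qvalue_single_lambda <- function(p, lambda = 0.05) {
  pi0 <- min(1, mean(p >= lambda) / (1 - lambda))
  list(pi0 = pi0, qvalues = pi0 * p.adjust(p, method = "BH"))
}

p <- c(0.0004, 0.0030, 0.0400, 0.0600, 0.0800, 0.0950, 0.4000, 0.8000)  # made-up p-values
q <- qvalue_single_lambda(p, lambda = 0.05)
q$pi0                                    # 5 of 8 p-values are >= 0.05, so (5/8) / 0.95
q$qvalues / p.adjust(p, method = "BH")   # the same factor (pi0) for every set
```

Comparing q$qvalues with qvalue(p, lambda = 0.05, pi0.method = "bootstrap")$qvalues is a quick way to verify the assumption.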
Thank you for your patient answer.
I'm wondering how enricher filters the results by these cutoffs. I also saw issue #737, but I think I'm encountering a bug rather than a feature.