NKI-CCB / DISCOVER

DISCOVER co-occurrence and mutual exclusivity analysis for cancer genomics data
Apache License 2.0

p-value and q-values are the same for top hit #23

Open kw10 opened 1 year ago

kw10 commented 1 year ago

I am using the latest R version and I have only one mutually exclusive pair, for which the p-value and q-value are the same. I am using the FDR method DBH. Why are the p-value and q-value identical? For the next best result, the p-value and q-value differ (but neither is significant). For example:

number of pairs tested: 190
proportion of true null hypotheses: 1
number of significant pairs at a maximum FDR of 1 : 190
     gene1   gene2    p.value    q.value
39 GENEXXX GENEYYY 0.08714019 0.08714019
42 GENEZZZ GENEYYY 0.30103083 0.91313014

Thanks,
Kim

scanisius commented 1 year ago

The cases in which I have seen this were situations with limited statistical power due to low mutation frequencies. If you look at the number of mutated tumours for each gene, do you see that all genes except for GENEXXX and GENEYYY have low mutation frequencies?
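A minimal sketch of that check, assuming the mutation data is available as a binary gene-by-tumour matrix (1 = mutated). The gene names and values below are hypothetical and only illustrate the counting step; they are not taken from the data in this thread.

```python
# Hypothetical binary mutation matrix: one row per gene, one column per tumour.
mutations = {
    "GENE_A": [1, 1, 0, 1, 0, 1, 0, 0],
    "GENE_B": [0, 0, 1, 0, 0, 0, 0, 0],
    "GENE_C": [0, 0, 0, 0, 1, 0, 1, 0],
}

# Number of mutated tumours per gene.
n_mutated = {gene: sum(row) for gene, row in mutations.items()}

# Mutation frequency per gene (fraction of tumours with a mutation).
frequency = {gene: sum(row) / len(row) for gene, row in mutations.items()}

# List genes from most to least frequently mutated.
for gene in sorted(n_mutated, key=n_mutated.get, reverse=True):
    print(f"{gene}: {n_mutated[gene]} tumours ({frequency[gene]:.1%})")
```

Genes that appear near the bottom of such a listing contribute gene pairs with very little power to reach a small p-value.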

kw10 commented 1 year ago

There are 41 samples in total; one gene is mutated in 22/41 (53.7%) and the other gene is mutated in 4/41 (9.8%).

scanisius commented 1 year ago

If I understand you correctly, these are the mutation frequencies for GENEXXX and GENEYYY. Do all other genes have lower frequencies? What you are observing is most likely the result of low statistical power. The q.value estimate is correct. The intuitive explanation is that the multiple testing correction does not penalize your top gene pair, because no other gene pair can attain a p-value lower than 0.087, even if none of its mutations co-occur in the same tumours. The discrete Benjamini-Hochberg procedure takes this into account when estimating q-values.
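The intuition can be illustrated with a small sketch. It is not DISCOVER's implementation: it swaps in a one-sided hypergeometric (Fisher-type) mutual exclusivity test, so the numbers will not match the p-values shown in this thread, and it uses a simplified discrete FDR estimate without any monotonicity adjustment. The counts 22 and 4 out of 41 come from the thread; genes C and D and their counts are hypothetical.

```python
from math import comb

def attainable_pvalues(n, a, b):
    """All p-values P(overlap <= k) a one-sided hypergeometric mutual
    exclusivity test can produce for genes mutated in a and b of n tumours."""
    lo, hi = max(0, a + b - n), min(a, b)
    total = comb(n, b)
    pvals, cum = [], 0.0
    for k in range(lo, hi + 1):
        cum += comb(a, k) * comb(n - a, b - k) / total
        pvals.append(cum)
    return pvals  # sorted ascending; pvals[0] is the minimum attainable p-value

def null_cdf(pvals, t):
    """P(p-value <= t) under the null for a discrete test: the largest
    attainable p-value that does not exceed t, or 0 if there is none."""
    return max((p for p in pvals if p <= t), default=0.0)

n = 41
counts = {"A": 22, "B": 4, "C": 3, "D": 2}  # C and D are hypothetical
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]

support = {pair: attainable_pvalues(n, counts[pair[0]], counts[pair[1]]) for pair in pairs}
min_p = {pair: pvals[0] for pair, pvals in support.items()}

# Suppose A and B never co-occur, so the top pair attains its minimum p-value.
t = min_p[("A", "B")]

# Expected number of null p-values at or below t, summed over all tests.
expected_false = sum(null_cdf(pvals, t) for pvals in support.values())

# Only the top pair is rejected at threshold t, so the FDR estimate is:
q = expected_false / 1

for pair in pairs:
    print(pair, f"min attainable p = {min_p[pair]:.4f}")
print(f"top pair: p = {t:.4f}, q = {q:.4f}")
```

Because every other pair has a minimum attainable p-value above the top pair's p-value, those tests contribute nothing to the expected number of false discoveries at that threshold, and the estimated q-value equals the p-value.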

kw10 commented 1 year ago

Yes, all other genes have lower frequencies. Thanks for the explanation!