gabrielodom / pathwayPCA

integrative pathway analysis with modern PCA methodology and gene selection
https://gabrielodom.github.io/pathwayPCA/

Supervised PCA function - null density estimation for small number of pathways #8

Open gabrielodom opened 6 years ago

gabrielodom commented 6 years ago

The estimate of the null distribution's density (needed to calculate the pathway $p$-values) depends on the number of pathways considered. This is probably fine for 30 or more pathways, but the quality of the density estimate will degrade for very small pathway collections (fewer than 15?).

We should add a new function that permutes the response for each pathway and treats each permuted copy as a new pathway, to "fill out" the number of pathways. I don't know how many times we should do this, but I'd guess at least 100.
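
A minimal sketch of the permutation idea, written in Python for illustration rather than in the package's R. The function name `make_null_pathways`, the `_null` naming scheme, and the choice of `n_copies` copies per pathway are all assumptions; nothing here is part of pathwayPCA.

```python
import random

def make_null_pathways(pathway_names, response, n_copies=100, seed=1):
    """Pair each pathway with n_copies independently shuffled responses.

    Hypothetical helper: each (name, permuted_response) pair stands in for
    one extra "null" pathway used only to fill out the null distribution.
    """
    rng = random.Random(seed)
    null_pathways = []
    for name in pathway_names:
        for k in range(n_copies):
            permuted = list(response)   # copy so the original is untouched
            rng.shuffle(permuted)       # break the pathway/response link
            null_pathways.append((f"{name}_null{k + 1}", permuted))
    return null_pathways

response = [0.3, 1.2, 2.5, 0.9, 1.7]
nulls = make_null_pathways(["pathA", "pathB"], response, n_copies=3)
print(len(nulls))  # 2 pathways x 3 copies each
```

Each shuffled response keeps the same values as the original, so only the association between pathway and response is destroyed.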

gabrielodom commented 6 years ago

Create 5 copies of each pathway, each with a parametric bootstrap response, to help create the null distribution. How do we properly adjust for FDR when we have 15-20 original pathways plus 5 × 20 additional noise pathways? We should adjust only the original $p$-values and ignore the $p$-values from the new null (parametric bootstrap) pathways.
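
A rough sketch of adjusting only the original $p$-values, again in Python for illustration. The comment above does not say which FDR procedure to use; Benjamini-Hochberg is assumed here, and `bh_adjust`, `adjust_original_only`, and the `is_null` flags are hypothetical names, not package functions.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def adjust_original_only(p_values, is_null):
    """Adjust only the original pathways; null copies are dropped first,
    so the number of tests is the number of original pathways."""
    original_idx = [i for i, null in enumerate(is_null) if not null]
    adjusted = bh_adjust([p_values[i] for i in original_idx])
    return dict(zip(original_idx, adjusted))

p_values = [0.01, 0.04, 0.03, 0.5, 0.2, 0.7]
is_null = [False, False, False, False, True, True]
fdr = adjust_original_only(p_values, is_null)
print(fdr)
```

Because the null copies are removed before the adjustment, the multiplicity correction uses 4 tests in this toy example rather than 6.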

gabrielodom commented 6 years ago

Until we fix this, add a warning to the superPCA_pVals() function telling the user to avoid pathway collections with fewer than 30 pathways.
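
The guard itself could look something like the following, sketched in Python for illustration; the real check would live in the R function superPCA_pVals(). The helper name `check_pathway_count` and the warning text are assumptions; only the threshold of 30 comes from the comment above.

```python
import warnings

def check_pathway_count(n_pathways, min_pathways=30):
    """Warn when the collection is too small for a stable null density.

    Hypothetical helper. Returns True if a warning was issued.
    """
    if n_pathways < min_pathways:
        warnings.warn(
            f"Only {n_pathways} pathways supplied; the null density "
            f"estimate may be unreliable with fewer than {min_pathways}.",
            UserWarning,
        )
        return True
    return False

small_collection = check_pathway_count(12)   # issues a UserWarning
large_collection = check_pathway_count(45)   # no warning
```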