gabrielodom / pathwayPCA

integrative pathway analysis with modern PCA methodology and gene selection
https://gabrielodom.github.io/pathwayPCA/

AESPCA: all loadings are 0 #69

Closed gabrielodom closed 5 years ago

gabrielodom commented 5 years ago

Pathway 805 ("CTTTGT_LEF1_Q2") of the msigdb pathway collection (per Steven's test) has all loadings and sample-PC estimates equal to 0. This causes:

gabrielodom commented 5 years ago

These are the paths where this happens:

"CTTTGT_LEF1_Q2", "KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP", "NUYTTEN_EZH2_TARGETS_UP", "DANG_BOUND_BY_MYC", "PILON_KLF1_TARGETS_DN", "GO_CYTOSKELETON_ORGANIZATION", "GO_SMALL_MOLECULE_METABOLIC_PROCESS", "GO_TRANSCRIPTION_FROM_RNA_POLYMERASE_II_PROMOTER", "GO_POSITIVE_REGULATION_OF_MOLECULAR_FUNCTION", "GO_POSITIVE_REGULATION_OF_MULTICELLULAR_ORGANISMAL_PROCESS", "GO_OXIDATION_REDUCTION_PROCESS", "GO_POSITIVE_REGULATION_OF_PROTEIN_MODIFICATION_PROCESS", "GO_PROTEIN_COMPLEX_SUBUNIT_ORGANIZATION", "GO_PROTEOLYSIS", "GO_MICROTUBULE_CYTOSKELETON", "GO_NEURON_PART", "GO_NEURON_PROJECTION", "GO_KINASE_ACTIVITY", "GO_TRANSPORTER_ACTIVITY"

The number of genes in each of these paths (before filtering to match the data) is: 1972, 1278, 1037, 1103, 1972, 838, 1767, 724, 1791, 1395, 898, 1135, 1527, 1208, 1068, 1265, 942, 842, 1276

AESPCA may have an issue with very large pathways
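
As a rough illustration of how oversized pathways could be flagged before running AES-PCA, here is a minimal sketch. It assumes the pathway collection is available as a plain named list of gene-symbol vectors; `toy_paths` and the cutoff of 700 genes are illustrative stand-ins, not part of the pathwayPCA API.

```r
# Toy stand-in for a pathway collection: a named list of gene-symbol vectors.
# (Hypothetical data; a real collection would come from an msigdb .gmt file.)
toy_paths <- list(
  small_path = paste0("gene", 1:50),
  large_path = paste0("gene", 1:900)
)

# Number of genes in each pathway, before any filtering to match the data.
path_sizes <- lengths(toy_paths)

# Names of pathways above an illustrative size cutoff (an assumption).
size_cutoff <- 700
large_paths <- names(path_sizes)[path_sizes > size_cutoff]
print(large_paths)
```

Flagging does not fix anything by itself; it only identifies which pathways may deserve a closer look after the analysis.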

gabrielodom commented 5 years ago

We think that AES-PCA is more likely to get "hung up" on very large (> 700 genes) pathways. Obviously we can't just prohibit large pathways, so we need a fallback for when this happens. I can add a check in the PermTestSurv(), PermTestReg(), and PermTestCateg() functions to assign a $p$-value of 0 if the predictor matrix is all 0. What's the best way to do this?

sum(abs(PC)) < .Machine$double.eps

For even 20k pathways, this added serial computation takes 3 seconds.
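
The check above can be wrapped in a small helper to make the intent explicit. This is only a sketch: `is_all_zero_pc` is a hypothetical name, and `PC` is assumed to be a numeric vector or matrix of sample-PC estimates with no missing values.

```r
# Hypothetical helper: TRUE when every entry of PC is numerically zero.
# Assumes PC is numeric with no NA values (sum() would return NA otherwise).
is_all_zero_pc <- function(PC, tol = .Machine$double.eps) {
  sum(abs(PC)) < tol
}

# A degenerate PC (all zeros) and an ordinary one, for comparison.
zero_pc <- matrix(0, nrow = 10, ncol = 1)
set.seed(1)
ok_pc <- matrix(rnorm(10), nrow = 10, ncol = 1)

is_all_zero_pc(zero_pc)  # degenerate case
is_all_zero_pc(ok_pc)    # ordinary case
```

Summing absolute values avoids positive and negative entries cancelling out, which a plain `sum(PC)` would not guard against.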

gabrielodom commented 5 years ago

Added a check to return a $p$-value of 1 if all the PC-estimates are 0.
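
A minimal sketch of what such a guard might look like is given below. It is not the code in PermTestSurv(), PermTestReg(), or PermTestCateg(); `pvalue_or_one` and `toy_test` are hypothetical names, and the toy test uses a simple correlation test in place of the package's permutation tests.

```r
# Hypothetical guard: return a p-value of 1 when the PC estimates are all 0,
# otherwise defer to the supplied test function.
pvalue_or_one <- function(PC, test_fun, ...) {
  if (sum(abs(PC)) < .Machine$double.eps) {
    return(1)
  }
  test_fun(PC, ...)
}

# Toy stand-in for a pathway test: p-value from a Pearson correlation test.
toy_test <- function(PC, response) {
  stats::cor.test(as.numeric(PC), response)$p.value
}

set.seed(42)
response <- rnorm(20)

p_zero <- pvalue_or_one(rep(0, 20), toy_test, response = response)
p_real <- pvalue_or_one(rnorm(20), toy_test, response = response)
```

Returning 1 treats a pathway with no usable PC as showing no evidence of association, so it cannot appear significant by accident.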