Closed gabrielodom closed 5 years ago
These are the paths where this happens:
"CTTTGT_LEF1_Q2", "KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP", "NUYTTEN_EZH2_TARGETS_UP", "DANG_BOUND_BY_MYC", "PILON_KLF1_TARGETS_DN", "GO_CYTOSKELETON_ORGANIZATION", "GO_SMALL_MOLECULE_METABOLIC_PROCESS", "GO_TRANSCRIPTION_FROM_RNA_POLYMERASE_II_PROMOTER", "GO_POSITIVE_REGULATION_OF_MOLECULAR_FUNCTION", "GO_POSITIVE_REGULATION_OF_MULTICELLULAR_ORGANISMAL_PROCESS", "GO_OXIDATION_REDUCTION_PROCESS", "GO_POSITIVE_REGULATION_OF_PROTEIN_MODIFICATION_PROCESS", "GO_PROTEIN_COMPLEX_SUBUNIT_ORGANIZATION", "GO_PROTEOLYSIS", "GO_MICROTUBULE_CYTOSKELETON", "GO_NEURON_PART", "GO_NEURON_PROJECTION", "GO_KINASE_ACTIVITY", "GO_TRANSPORTER_ACTIVITY"
The number of genes in each of these paths (before filtering to match the data) is: 1972 1278 1037 1103 1972 838 1767 724 1791 1395 898 1135 1527 1208 1068 1265 942 842 1276
AESPCA may have an issue with very large pathways
We think that AES-PCA has a greater chance to get "hung-up" on very large (> 700) pathways. Obviously we can't just prohibit large pathways, so we need an answer for when this might happen. I can add a stop in the `PermTestSurv()`, `PermTestReg()`, and `PermTestCateg()` functions to assign a $p$-value of 0 if the predictor matrix is all 0. What's the best way to do this?
sum(abs(PC)) < .Machine$double.eps
For even 20k pathways, this added serial computation takes 3 seconds.
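A minimal sketch of running that check over a whole collection, assuming the PC estimates are stored as a list of numeric matrices. The names `pcList` and `IsAllZero()` and the simulated sizes are hypothetical, not taken from the package:

```r
# Hypothetical data: 20,000 pathways, each with a 100 x 2 matrix of PC estimates.
set.seed(1)
pcList <- lapply(seq_len(20000), function(i) matrix(rnorm(200), ncol = 2))
pcList[[805]] <- matrix(0, nrow = 100, ncol = 2)  # mimic the degenerate pathway

# The check from above: TRUE when every PC estimate is (numerically) zero.
IsAllZero <- function(PC) sum(abs(PC)) < .Machine$double.eps

# Serial pass over all pathways; returns one logical per pathway.
allZero_lgl <- vapply(pcList, IsAllZero, logical(1))
which(allZero_lgl)  # 805
```

Wrapping the `vapply()` call in `system.time()` would give the cost of this pass on a given machine.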
Added a check to return a $p$-value of 1 if all the PC-estimates are 0.
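Roughly, such a check might look like the sketch below. `GuardedPValue()` and its `testFun` argument are hypothetical stand-ins for illustration; they are not the actual signatures of the `PermTest*()` functions:

```r
# Hypothetical wrapper: short-circuit the permutation test for degenerate PCs.
GuardedPValue <- function(PC, testFun, ...) {
  # If every PC estimate is numerically zero, the pathway carries no signal,
  # so skip the permutation test and report a p-value of 1.
  if (sum(abs(PC)) < .Machine$double.eps) {
    return(1)
  }
  testFun(PC, ...)
}

# Toy stand-in for a permutation test (assumption: it returns a single p-value).
toyTest <- function(PC) 0.03

pZero <- GuardedPValue(matrix(0, nrow = 10, ncol = 2), toyTest)  # 1
pNonZero <- GuardedPValue(matrix(1, nrow = 10, ncol = 2), toyTest)  # 0.03
```

Returning 1 rather than 0 keeps an uninformative pathway from being reported as the most significant one.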
Pathway 805 ("CTTTGT_LEF1_Q2") of the msigdb pathway collection (per Steven's test) has all loadings and sample-PC estimates equal to 0. This causes a problem (`numReps = 0`) with the estimation of the quantiles of the F distribution (all values are `NA`s). It also means that `permAIC < trueAIC` will evaluate to `FALSE` for all replicates, so `mean(permAIC < trueAIC)` will be exactly 0.
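A small illustration of why the estimate collapses to 0. It assumes that, with an all-zero predictor, every permuted fit reproduces the same AIC as the original fit; the AIC value and the number of replicates are made up:

```r
trueAIC <- 250.7                       # hypothetical AIC of the unpermuted fit
permAIC <- rep(trueAIC, times = 1000)  # assumed: permutation changes nothing

# The strict inequality is FALSE for every replicate ...
anyLess <- any(permAIC < trueAIC)

# ... so the proportion of replicates with a smaller AIC is exactly 0.
pValue <- mean(permAIC < trueAIC)
pValue  # 0
```

A $p$-value of exactly 0 would rank this pathway as highly significant even though its predictor carries no information.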