gabrielodom closed this issue 5 years ago.
Three examples:
Notes from 25 September Meeting
Four examples:
Continued below
@jamesban2015, please help me find (and clean if necessary) the KIRP TCGA RNA-seq dataset and get the matching survival outcome and censoring info. Thanks.
Continued Examples:
Find the overlap between the significant pathways returned by the copy-number pathway PCA and the significant pathways from the ovarian PNNL pathway PCA (overall survival). Repeat this for the C2 CP, CP:KEGG, C5 GO, and WikiPathways collections.
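A minimal sketch of the overlap step, assuming each pathway PCA run can be reduced to a mapping from pathway name to adjusted p-value within each collection. The dictionary layout, the 0.05 cutoff, and the pathway names below are hypothetical placeholders, not results from either analysis.

```python
def significant(pvals, alpha=0.05):
    """Return the set of pathways whose adjusted p-value is below alpha (assumed cutoff)."""
    return {pathway for pathway, p in pvals.items() if p < alpha}


def overlap_by_collection(copy_number, ovarian_pnnl, alpha=0.05):
    """For each collection present in both runs, intersect the significant pathways."""
    shared = {}
    for collection in copy_number.keys() & ovarian_pnnl.keys():
        shared[collection] = (
            significant(copy_number[collection], alpha)
            & significant(ovarian_pnnl[collection], alpha)
        )
    return shared


# Hypothetical p-value tables: collection -> {pathway: adjusted p-value}.
copy_number = {
    "KEGG": {"KEGG_A": 0.01, "KEGG_B": 0.20, "KEGG_C": 0.03},
    "WikiPathways": {"WP_X": 0.001},
}
ovarian_pnnl = {
    "KEGG": {"KEGG_A": 0.04, "KEGG_B": 0.01, "KEGG_C": 0.50},
}

shared = overlap_by_collection(copy_number, ovarian_pnnl)
print(shared)  # {'KEGG': {'KEGG_A'}}
```

Collections missing from either run are skipped rather than reported as empty, so a missing collection is easy to spot.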
Prediction. Use the colorectal cancer gene expression data: https://xenabrowser.net/datapages/?dataset=TCGA.COADREAD.sampleMap%2FHiSeqV2&host=https%3A%2F%2Ftcga.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
Phenotype data: https://xenabrowser.net/datapages/?dataset=TCGA.COADREAD.sampleMap%2FCOADREAD_clinicalMatrix&host=https%3A%2F%2Ftcga.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
Perform the following steps:
i. Split the data 50/50 into training and testing sets.
ii. Perform AES-PCA or Supervised PCA on the training set, extracting one PC from each pathway.
iii. Multiply the pathway-specific testing design matrices by the loadings from the training data to yield the testing PCs.
iv. Use the PCs extracted from the training data to train and cross-validate an elastic-net model (glmnet: use the defaults for CV). Store this model.
v. Predict the testing survival using the PCs from the testing data.
vi. Compare the predicted test survival to the true survival with a survival ROC curve.
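Steps i through iii can be sketched as follows. This is only an illustration under stated assumptions: ordinary PCA (first right singular vector) stands in for AES-PCA or Supervised PCA, pathways are given as hypothetical lists of column indices, and the function names are made up for this example. Steps iv to vi (cv.glmnet and the survival ROC curve) are not shown.

```python
import numpy as np


def split_half(n_samples, seed=0):
    """Randomly assign half of the sample indices to training and the rest to testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = n_samples // 2
    return np.sort(idx[:cut]), np.sort(idx[cut:])


def first_pc_loadings(x_train):
    """Plain-PCA stand-in: training column means and the first loading vector (via SVD)."""
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[0]


def pathway_pcs(x, pathways, train_idx, test_idx):
    """One PC per pathway from the training data; the same loadings are applied to the test data."""
    train_pcs, test_pcs = [], []
    for genes in pathways.values():
        x_tr = x[np.ix_(train_idx, genes)]
        x_te = x[np.ix_(test_idx, genes)]
        mean, loading = first_pc_loadings(x_tr)
        train_pcs.append((x_tr - mean) @ loading)
        # Testing design matrix is centered with the *training* means, then
        # multiplied by the training loadings (step iii).
        test_pcs.append((x_te - mean) @ loading)
    return np.column_stack(train_pcs), np.column_stack(test_pcs)


rng = np.random.default_rng(1)
x = rng.normal(size=(20, 6))                            # toy expression matrix
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}  # hypothetical gene columns
train_idx, test_idx = split_half(x.shape[0])
pc_train, pc_test = pathway_pcs(x, pathways, train_idx, test_idx)
print(pc_train.shape, pc_test.shape)  # (10, 2) (10, 2)
```

The two PC matrices have one column per pathway, so the training matrix can be passed to the elastic-net fit in step iv and the testing matrix to its predictions in step v.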
Depends on Issue #35.
For example 4, try the following:
cv.glmnet()
SuperPCA_pVals()
Issue #37 is now closed. Moving forward with this revamped analysis.
Completing the C5 analysis requires Issue #43 to be closed.
I've tested the prediction results for the independently scaled training and test data: no performance increase. I've also tested the C5 pathway collection: no performance increase.
I've tried two sequences of alpha (0.1, 0.2, ..., 1; 0.01, 0.04, 0.09, 0.16, ..., 1). Smaller values of alpha yielded the "best" performance, but it was still abysmal.
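The two alpha grids can be written down as below. In glmnet, alpha = 1 is the lasso penalty and alpha = 0 is ridge, with the elastic net in between. The squared grid places more candidates near 0 (five of its ten values are below 0.3). Because cv.glmnet cross-validates lambda only for a fixed alpha, comparing alphas needs an outer loop. The glmnet documentation suggests reusing one precomputed foldid vector across those calls so that the CV errors are comparable. The best_alpha helper and the error values are hypothetical.

```python
# The two alpha grids described above.
linear_alphas = [round(0.1 * k, 2) for k in range(1, 11)]   # 0.1, 0.2, ..., 1.0
squared_alphas = [round(a * a, 4) for a in linear_alphas]    # 0.01, 0.04, ..., 1.0


def best_alpha(cv_error_by_alpha):
    """Return the alpha with the smallest cross-validated error (smallest alpha wins ties)."""
    return min(sorted(cv_error_by_alpha), key=lambda a: cv_error_by_alpha[a])


# Hypothetical CV errors, one per alpha, for illustration only.
example_errors = {0.01: 10.2, 0.25: 10.5, 1.0: 11.0}
print(best_alpha(example_errors))  # 0.01
```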
Gabriel, can you provide some details of the performance? Is there a figure or markdown for the performance evaluation?
Completing the SuperPCA analysis requires Issue #44 to be closed.
@jamesban2015 See the R Markdown and .html reports in the Example Data/Xena Prediction Colorectal directory.
Completing the SuperPCA analysis requires Issue #45 to be closed.
Results for Supervised PCA are in Example Data/Xena Prediction Colorectal/SuperPCA_Prediction3.html. It's not good.
Did you try predicting the training data instead of the testing data?
One issue was that I did not center and scale the test data before projecting it onto the PCs calculated from the training data. This is related to issue #37, which I've re-opened. Basically, even though I chose not to center and scale the training data, the internal PCA routine scaled the data anyway. After I fix that issue, I want to try the raw training and test data for both AESPCA and SuperPCA.
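A minimal sketch of the fix described here, assuming the goal is to (a) learn centering and scaling from the training data only and reuse those parameters on the test data, and (b) make "no centering, no scaling" actually leave the data unchanged. The TrainingScaler class is hypothetical and is not the package's internal PCA routine.

```python
import numpy as np


class TrainingScaler:
    """Centering/scaling parameters learned from the training data only.

    The center/scale flags are honored, so center=False, scale=False really
    leaves the data untouched. Assumes no constant columns when scale=True.
    """

    def __init__(self, center=True, scale=True):
        self.center = center
        self.scale = scale

    def fit(self, x_train):
        n_cols = x_train.shape[1]
        self.mean_ = x_train.mean(axis=0) if self.center else np.zeros(n_cols)
        # ddof=1 gives the sample standard deviation, as R's sd() does.
        self.sd_ = x_train.std(axis=0, ddof=1) if self.scale else np.ones(n_cols)
        return self

    def transform(self, x):
        return (x - self.mean_) / self.sd_


rng = np.random.default_rng(0)
x_train = rng.normal(loc=0.0, size=(50, 4))
x_test = rng.normal(loc=5.0, size=(30, 4))  # deliberately shifted test data

scaler = TrainingScaler().fit(x_train)
z_train = scaler.transform(x_train)
z_test = scaler.transform(x_test)  # uses training means/SDs, not the test set's own
raw = TrainingScaler(center=False, scale=False).fit(x_train)
```

Scaling the test set with its own statistics would erase the shift in this toy example. Reusing the training parameters keeps the test data on the scale the loadings were estimated on.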
The Cox model prediction does not return survival times directly: http://r.789695.n4.nabble.com/estimating-survival-times-with-glmnet-and-coxph-td4614225.html
Look at approaches 2 and 3 here: http://gaodoris.blogspot.com/2012/10/5-ways-to-estimate-concordance-index.html
That's because it doesn't make sense to measure how well a survival prediction is performing based on individual survival times. Predicting survival time is apparently difficult (if not impossible) in the CoxPH framework.
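For a Cox fit, glmnet's predict() returns a risk score: type = "link" gives the linear predictor and type = "response" gives the relative risk, exp(link). A rank-based measure such as Harrell's concordance index therefore fits better than a comparison of individual survival times. The version below is a from-scratch simplified sketch, not code from the linked post. It skips pairs with tied times, and the toy survival data are made up.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for a risk score (higher score = higher hazard).

    A pair (i, j) is comparable when subject i had an observed event and
    time[i] < time[j]. It is concordant when risk[i] > risk[j]; tied risks
    count 1/2. Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored.
time = [2, 4, 6, 8]
event = [1, 1, 0, 1]
print(harrell_c_index(time, event, [3.0, 2.0, 1.0, 0.5]))  # 1.0: perfect ranking
print(harrell_c_index(time, event, [3.0, 1.0, 2.0, 0.5]))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a poorly performing model shows up as a C-index near 0.5.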
For example 3, compare C2 under SuperPCA and AESPCA. Look at the genes internal to the shared significant pathways for these two techniques. Can we tell a story?
For example 3, the shared genes are shown in Xena Multi-Omics Ovarian/Reports/summary_ovarian_multiomics.html.
I think you're pulling out genes that exist in both the copy number and proteomics data. Could you pull out the genes with non-zero AES-PCA coefficients in both the copy number and proteomics data? These would be the genes that contribute to pathway significance, and they are the ones we are interested in.
For example 2, the significant pathways are shown in Xena Interaction Kidney/Reports/KIRP_Sex_PC_Interaction.html
For @lxw391's comment on example 3: I've updated the multi-omics report to include the overlap of the genes from significant pathways which also had non-zero loadings. This is in Xena Multi-Omics Ovarian/Reports/summary_ovarian_multiomics.html
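The gene filter requested above can be sketched as an intersection of non-zero loadings. The sketch assumes the AES-PCA loadings for each significant pathway are available as a gene-to-loading mapping for each data type. The pathway name, gene symbols, and loading values are invented for illustration.

```python
def nonzero_genes(loadings, tol=0.0):
    """Genes whose loading magnitude exceeds tol (tol=0 keeps exact non-zeros)."""
    return {gene for gene, w in loadings.items() if abs(w) > tol}


def shared_contributing_genes(cn_loadings, prot_loadings, shared_pathways):
    """For each pathway significant in both analyses, genes with non-zero loadings in both."""
    return {
        pathway: nonzero_genes(cn_loadings[pathway]) & nonzero_genes(prot_loadings[pathway])
        for pathway in shared_pathways
    }


# Hypothetical loadings: pathway -> {gene: AES-PCA loading}.
cn_loadings = {"PATHWAY_1": {"TP53": 0.4, "BRCA1": 0.0, "MYC": -0.2}}
prot_loadings = {"PATHWAY_1": {"TP53": 0.1, "BRCA1": 0.3, "MYC": 0.0}}

print(shared_contributing_genes(cn_loadings, prot_loadings, ["PATHWAY_1"]))
# {'PATHWAY_1': {'TP53'}}
```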
For the vignettes, include a section showing the user querying an online data repository for data. We don't want to include the KIRP, copy number, or ovarian PNNL data in the package itself unless we have to.
Move conversation to Issue #49.
Use data from Chen's original paper or placenta data. Get clarifying information from Steven.