Closed by dhimmel 8 years ago
Related to #8: General mutation-load does provide some ability to predict mutation status of TP53.
This isn't too surprising given TP53's role in controlling cell cycle checkpoints. Is this only true for genes in cell cycle checkpoint or DNA repair pathways, or is it also true for genes in proliferation pathways?
@cgreene, I updated the notebook to evaluate performance of covariate-only classifiers for the 8 interesting mutations we've previously considered. Here is performance of the models with all covariates included:
So actually, TP53 is among the hardest to predict using only covariates. VHL, which is highly disease-specific, achieves a near-perfect AUROC. Therefore, without the disease or organ covariate, expression classifiers of VHL are likely just classifying kidney clear cell carcinoma or kidney tissue.
See this dataframe to get a general idea of covariate importance. Mutation load and disease type both seem important.
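As a rough illustration of why a disease-specific mutation is easy to predict from covariates alone, here is a minimal sketch on a synthetic toy cohort (not the notebook's data or method). It scores each sample by the mutation frequency in its disease and computes AUROC with a pairwise rank statistic. The helper names `auroc` and `disease_rate_scores` are hypothetical, and the scores are computed in-sample without cross-validation, which is a simplification.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def disease_rate_scores(diseases, labels):
    """Score each sample by the mutation frequency observed in its disease."""
    total = defaultdict(int)
    mutated = defaultdict(int)
    for disease, label in zip(diseases, labels):
        total[disease] += 1
        mutated[disease] += label
    return [mutated[d] / total[d] for d in diseases]


# Synthetic toy cohort (NOT real data): three diseases, ten samples each.
diseases = ["kidney"] * 10 + ["breast"] * 10 + ["lung"] * 10

# A VHL-like, disease-specific mutation: 9 of 10 kidney samples, none elsewhere.
specific = [1] * 9 + [0] * 1 + [0] * 20

# A TP53-like, broadly distributed mutation: 5, 4 and 6 mutated samples per disease.
broad = [1] * 5 + [0] * 5 + [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4

auroc_specific = auroc(specific, disease_rate_scores(diseases, specific))
auroc_broad = auroc(broad, disease_rate_scores(diseases, broad))

print(f"disease-specific mutation AUROC: {auroc_specific:.3f}")
print(f"broadly distributed mutation AUROC: {auroc_broad:.3f}")
```

In this toy setup the disease label alone nearly separates the disease-specific mutation, while it carries little information about the broadly distributed one. That is the same pattern as the VHL versus TP53 contrast described above.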
@dhimmel: Interesting! I can imagine that gene expression would clearly capture disease/organ, so if a disease usually carries a mutation in a single, relatively specific gene (e.g. VHL), I'd expect that to create a strong signal.
Creates an explore directory and README for this type of exploratory notebook.
See how well covariates (non-expression features) predict TP53 mutation.
Related to https://github.com/cognoma/machine-learning/issues/8: General mutation-load does provide some ability to predict mutation status of TP53.
Partially addresses https://github.com/cognoma/machine-learning/issues/21: Covariates are extracted from `samples.tsv`.
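For readers unfamiliar with this kind of extraction, the sketch below shows one way a covariate matrix could be built from a tab-separated sample table. It is an assumption-laden illustration, not the notebook's code: the column names `sample_id`, `disease` and `n_mutations`, the toy rows, the log transform of mutation load, and the function `covariate_matrix` are all hypothetical, and the real `samples.tsv` may be laid out differently.

```python
import csv
import io
import math

# Hypothetical stand-in for samples.tsv; real column names and values may differ.
TOY_TSV = (
    "sample_id\tdisease\tn_mutations\n"
    "S1\tKIRC\t40\n"
    "S2\tBRCA\t120\n"
    "S3\tKIRC\t55\n"
)


def covariate_matrix(handle):
    """Build per-sample covariates: log mutation load plus one-hot disease."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    diseases = sorted({row["disease"] for row in rows})
    columns = ["log1p_n_mutations"] + [f"disease_{d}" for d in diseases]
    matrix = {}
    for row in rows:
        # Assumed transform: log1p dampens the long tail of mutation counts.
        features = [math.log1p(float(row["n_mutations"]))]
        # One indicator column per disease seen in the table.
        features += [1.0 if row["disease"] == d else 0.0 for d in diseases]
        matrix[row["sample_id"]] = features
    return columns, matrix


columns, matrix = covariate_matrix(io.StringIO(TOY_TSV))
print(columns)
for sample_id, features in matrix.items():
    print(sample_id, features)
```

To read an actual file instead of the toy string, the same function could be called with `open("samples.tsv", newline="")`, provided the columns match the assumed names.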