Evaluate performance of covariates at predicting various mutations

dhimmel commented 8 years ago

Creates an explore directory and README for this type of exploratory notebook.

See how well covariates (non-expression features) predict TP53 mutation.

Related to https://github.com/cognoma/machine-learning/issues/8: General mutation-load does provide some ability to predict mutation status of TP53.

Partially addresses https://github.com/cognoma/machine-learning/issues/21: Covariates are extracted from samples.tsv.

cgreene commented 8 years ago

> Related to #8: General mutation-load does provide some ability to predict mutation status of TP53.

This isn't too surprising given TP53's role in controlling cell cycle checkpoints. Is this only true for genes in cell cycle checkpoint or DNA repair pathways, or is it also true for genes in proliferation pathways?

dhimmel commented 8 years ago

> Is this only true for genes in cell cycle checkpoint or DNA repair pathways, or is it also true for genes in proliferation pathways?

@cgreene, I updated the notebook to evaluate performance of covariate-only classifiers for the 8 interesting mutations we've previously considered. Here is performance of the models with all covariates included:

[Figure: covariate-performance]

So actually, TP53 is among the hardest to predict using only covariates. VHL, which is highly disease-specific, achieves a near-perfect AUROC. Therefore, expression classifiers of VHL that lack a disease or organ covariate are likely just detecting kidney clear cell carcinoma or kidney tissue.

See this dataframe to get a general idea of covariate importance. Mutation load and disease type both seem important.
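For anyone who wants to reproduce the idea without the notebook, here is a minimal sketch of a covariate-only classifier. It is not the notebook's code: the data are synthetic, the column names (`disease`, `n_mutations_log1p`) are hypothetical, and `LogisticRegression` is only a stand-in for whatever model the notebook uses. The synthetic mutation is made almost entirely disease-specific, loosely mimicking the VHL situation, to show how disease covariates alone can give a high AUROC and dominate the coefficients.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for sample covariates (hypothetical column names).
samples = pd.DataFrame({
    "disease": rng.choice(["kidney_clear_cell", "breast", "lung"], size=n),
    "n_mutations_log1p": rng.normal(loc=4.0, scale=1.0, size=n),
})

# Synthetic mutation status: common in one disease, rare elsewhere.
is_kidney = (samples["disease"] == "kidney_clear_cell").to_numpy()
y = (rng.random(n) < np.where(is_kidney, 0.6, 0.01)).astype(int)

# One-hot encode the categorical covariate; no expression features are used.
X = pd.get_dummies(samples, columns=["disease"], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize so coefficient magnitudes are roughly comparable.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Held-out AUROC of the covariate-only model.
auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Coefficient table, sorted by absolute value, as a rough importance ranking.
coef_df = pd.DataFrame({
    "feature": X.columns,
    "coefficient": model[-1].coef_[0],
}).sort_values("coefficient", key=abs, ascending=False)

print(f"covariate-only AUROC: {auroc:.3f}")
print(coef_df.to_string(index=False))
```

Because the one-hot disease columns are collinear, individual coefficients should be read as a rough ranking rather than as precise effect sizes.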

cgreene commented 8 years ago

@dhimmel: Interesting! Gene expression would clearly capture disease/organ, so I can imagine that a gene whose mutations are relatively specific to one disease (e.g. VHL) would create a strong signal.

cognoma / machine-learning

Evaluate performance of covariates at predicting various mutations #47