TP53 mutation prediction from metadata

joshlevy89 commented 8 years ago

I'm new to the group so let me know if there is a better place to write this kind of thing...

I am working on assessing whether the gene expression data provide considerably more predictive information than the metadata (samples.tsv). I created a notebook to predict TP53 mutation from the metadata alone and achieved ~0.82 AUROC. This is substantially lower than the AUROC achieved using gene expression (~0.92). I have a few other ideas for what to do next, but am interested in any input. The new notebook can be found on my forked repo (4.TCGA-Metadata-MLexample). I have not submitted a pull request.

dhimmel commented 8 years ago

> I'm new to the group so let me know if there is a better place to write this kind of thing...

Nope, Issues are the right place. I'm going to tag a few related issues for convenience: https://github.com/cognoma/machine-learning/issues/8, https://github.com/cognoma/machine-learning/issues/21, https://github.com/cognoma/machine-learning/pull/47.

See this notebook from #47, which looks at performance for several mutations using only the covariates (metadata). So I think the next step, based on what currently exists, will be to find a way to fit two models:

1. using covariates only
2. using covariates and gene expression

Then seeing how much better model 2 performs will give us the marginal contribution of gene expression over sample metadata. @joshlevy89, do you want to tackle this analysis? You can make a new directory in explore and open a pull request (even if it's still a work in progress -- just put WIP in the pull request title).
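
For illustration, here is a minimal sketch of how that comparison could be scored once both models have produced predictions for the same held-out samples. Everything in it is an assumption rather than the project's code: the `auroc` helper is hypothetical, and the labels and predicted probabilities are made-up toy values, not TCGA results. It computes AUROC as the fraction of (mutated, non-mutated) sample pairs that a model ranks correctly, then takes the difference between the two models.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy held-out labels (1 = TP53 mutated) and made-up predicted probabilities.
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
scores_covariates = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7, 0.8, 0.1]  # model 1: covariates only
scores_full = [0.9, 0.6, 0.7, 0.5, 0.2, 0.65, 0.8, 0.1]  # model 2: covariates + expression

auroc_covariates = auroc(y_test, scores_covariates)
auroc_full = auroc(y_test, scores_full)

# Marginal contribution of gene expression over the sample metadata.
marginal_gain = auroc_full - auroc_covariates

print(f"covariates only:         {auroc_covariates:.4f}")
print(f"covariates + expression: {auroc_full:.4f}")
print(f"marginal gain:           {marginal_gain:.4f}")
```

Both models need to be evaluated on the same test split for the difference to be meaningful. In a real notebook, `sklearn.metrics.roc_auc_score` would likely replace the hand-written helper.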

Cheers!

joshlevy89 commented 8 years ago

@dhimmel Thanks for the reply. That sounds good. I can tackle it in the next couple of days.

rdvelazquez commented 7 years ago

I think this was closed by #67.

TP53 mutation prediction from metadata #66