Closed joshlevy89 closed 7 years ago
I'm new to the group so let me know if there is a better place to write this kind of thing...
Nope, Issues are the right place. I'm going to tag a few related issues for convenience: https://github.com/cognoma/machine-learning/issues/8, https://github.com/cognoma/machine-learning/issues/21, https://github.com/cognoma/machine-learning/pull/47.
See this notebook from #47, which looks at performance for several mutations using only the covariates (metadata). So I think the next step, based on what currently exists, will be to find a way to fit two models:
Then seeing how much better model 2 performs will give us the marginal contribution of gene expression over sample metadata. @joshlevy89, do you want to tackle this analysis? You can make a new directory in explore and open a pull request (even if it's still a work in progress -- just put WIP in the pull request title).
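A minimal sketch of the comparison described above, assuming that model 1 uses only the covariates and model 2 uses the covariates plus gene expression. That is an inference from the surrounding text, since the list of the two models is not preserved here. The scores and labels below are invented toy values, not results from the project, and `auroc` is a hypothetical helper written in plain Python rather than anything from the repository. It computes AUROC as the probability that a randomly chosen mutated sample is scored above a randomly chosen non-mutated one.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy held-out labels (1 = mutated) and predicted scores, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0]
scores_model_1 = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]  # assumed: covariates only
scores_model_2 = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]  # assumed: covariates + expression

auroc_1 = auroc(y_true, scores_model_1)
auroc_2 = auroc(y_true, scores_model_2)

# The difference on the same held-out samples estimates the marginal contribution.
marginal_gain = auroc_2 - auroc_1
print(f"model 1: {auroc_1:.3f}, model 2: {auroc_2:.3f}, gain: {marginal_gain:.3f}")
```

Both models should be scored on the same held-out samples so that the difference reflects the added features rather than a different test split. In practice a library routine such as scikit-learn's `roc_auc_score` would replace the hand-written helper.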
Cheers!
@dhimmel Thanks for the reply. That sounds good. I can tackle it in the next couple of days.
I think this was closed by #67.
I am working on assessing whether the gene expression data provides considerably more predictive information than the metadata (samples.tsv). I created a notebook to predict TP53 mutation from the metadata alone and achieved ~0.82 AUROC. This is substantially lower than the AUROC achieved using gene expression (~0.92). I have a few other ideas for what to do next, but am interested in any input. The new notebook can be found on my forked repo (4.TCGA-Metadata-MLexample). I have not submitted a pull request.