dhimmel closed this 8 years ago
BRAF should segregate with melanoma and subsets of lung cancer. BRAF V600E should be a good test for the machine learning group once we get the columns mentioned in #16.
You could also visualize BRCA1 and BRCA2, which will largely segregate with breast cancer and with subsets of ovarian, cervical, and uterine cancers.
Can you also add ALK? It should segregate with subsets of lung cancer. ALK is interesting because it is usually activated by chromosomal rearrangements, and I suspect a gene expression signature for ALK activation could be informative.
You could also look at MEN1 and RET, genes associated with many neuroendocrine tumors (pancreatic, pituitary, parathyroid, medullary thyroid, and pheochromocytoma).
Are you interested in genes associated with cancers in general, or genes where we might expect that the majority of cancers segregate with a single gene?
@linzho both. Since this is an exploratory analysis, I'm just looking to look!
@linzho & @gwaygenomics thanks for your suggestions. I added them to the heatmap in 29c926ab3de9e8a7b95b79ac582e295ffc5f41f3, which now looks like this:
I also scaled each gene's mutation rates by that gene's maximum mutation rate. Note that one issue is still outstanding: some diseases harbor more mutations overall (see the row-wise bands above and https://github.com/cognoma/machine-learning/issues/8).
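A minimal pandas sketch of the per-gene scaling described above, assuming a binary sample-by-gene mutation matrix and one disease label per sample, and assuming "mutation rate" means the fraction of a disease's samples with the gene mutated. The toy data, sample IDs, and variable names are hypothetical and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical toy data: one row per sample, 1 = gene mutated, 0 = not mutated.
mutations = pd.DataFrame(
    {"VHL": [1, 1, 0, 0, 0, 0], "TP53": [0, 1, 0, 1, 1, 0]},
    index=["s1", "s2", "s3", "s4", "s5", "s6"],
)
# Hypothetical disease label for each sample (aligned on the sample index).
diseases = pd.Series(["KIRC"] * 3 + ["LUAD"] * 3, index=mutations.index)

# Mutation rate: fraction of samples in each disease with the gene mutated.
rates = mutations.groupby(diseases).mean()

# Divide each gene (column) by its maximum rate across diseases,
# so every gene peaks at 1 regardless of how often it is mutated overall.
scaled = rates / rates.max()
print(scaled)
```

Because each column is divided by its own maximum, a rarely mutated gene such as VHL becomes as visible as TP53 in the heatmap. This scaling does not correct the row-wise effect of diseases that carry more mutations overall.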
Would it be useful to add functionality to the script? If the final output is the mutation-by-tissue heatmap, could you add an argparse argument? Then the above graph would be generated like:
python scripts/3.explore-mutations.py --gene-list "BRCA2,ALK,CD274,MEN1,VHL,RET,TP53,BRCA1"
Just a thought.
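A rough sketch of how the suggested `--gene-list` option could be parsed with the standard library's argparse. The parser description, the default gene, and the helper name `comma_separated_genes` are hypothetical; the script does not currently define any of them.

```python
import argparse


def comma_separated_genes(value):
    """Split a comma-separated string into gene symbols, dropping blanks and whitespace."""
    return [gene.strip() for gene in value.split(",") if gene.strip()]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Plot a mutation-by-disease heatmap for selected genes (hypothetical CLI)."
    )
    parser.add_argument(
        "--gene-list",
        type=comma_separated_genes,  # converts the raw string into a list of symbols
        default=["TP53"],  # hypothetical default
        help='comma-separated gene symbols, e.g. "BRCA2,ALK,TP53"',
    )
    return parser


# argparse stores "--gene-list" under the attribute name "gene_list".
args = build_parser().parse_args(["--gene-list", "BRCA2,ALK,CD274,MEN1,VHL,RET,TP53,BRCA1"])
print(args.gene_list)
```

Using a `type=` callable keeps the splitting logic in one place, so `args.gene_list` is already a list when the plotting code receives it.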
@gwaygenomics I have a slightly different philosophy here.
scripts/3.explore-mutations.py is an auto-exported script version of the notebook, for viewing diffs. So all code changes should be made in the notebook. Passing arguments to the notebook doesn't make sense, because notebooks are meant to be used interactively.
So one option is to create a Python module, e.g. heatmap.py, with a function that 3.explore-mutations.ipynb would call and a __main__ block that enables script execution. However, I don't see a major benefit that justifies the added complexity. If you want to add more genes, you can just open the notebook and add them to the dictionary.
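For comparison, a minimal sketch of the module option: one function that both the notebook and the command line call, plus a `__main__` guard. Only the name heatmap.py comes from the comment; `gene_heatmap_table`, `DEFAULT_GENES`, and the placeholder table inside `main` are hypothetical stand-ins for the notebook's real data and gene dictionary.

```python
"""heatmap.py (hypothetical): one function shared by the notebook and the command line."""
import sys

import pandas as pd

# Hypothetical default genes; the actual notebook keeps its genes in a dictionary.
DEFAULT_GENES = ["TP53", "VHL"]


def gene_heatmap_table(scaled_rates, genes):
    """Subset a disease-by-gene table to the requested genes, skipping unknown symbols."""
    present = [gene for gene in genes if gene in scaled_rates.columns]
    return scaled_rates[present]


def main(argv):
    # Hypothetical placeholder table so the sketch runs without the project's data files.
    rates = pd.DataFrame(
        {"TP53": [1.0, 0.4], "VHL": [0.1, 1.0], "BRAF": [0.2, 0.0]},
        index=["disease_a", "disease_b"],
    )
    genes = argv or DEFAULT_GENES
    print(gene_heatmap_table(rates, genes))


if __name__ == "__main__":
    # Script usage: python heatmap.py VHL RET
    main(sys.argv[1:])
```

The notebook would then run `from heatmap import gene_heatmap_table`. This illustrates the trade-off raised above: the logic now lives in two files instead of one notebook.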
IMO, notebooks are better than scripts with arguments for agile data science.
Got it, I agree for this script.
That said, I do think moving toward this philosophy will be important when we design how a user visualizes input genes and input tissues (i.e. the frontend/cancer data discussion yesterday; see cognoma/frontend#12).
LGTM :+1:
This pull request is based on a preliminary notebook we created at the 2016-08-23 Cognoma Meetup. Tagging @mike1906, @stephenshank, @drolejoel, and @linzho, who were part of this group (we'd love your feedback).
Specifically, I'd like feedback on cancer genes of interest where we expect mutation status to segregate with disease. For example, the present notebook shows the enrichment of VHL mutations in kidney clear cell carcinoma.