Open mellybelly opened 7 years ago
Hi everybody, I am starting to understand how Dipper works, thanks to your help. I think it would be good to discuss how we want to set up the model. There are a few issues with data ingest. For example, OncoKB provides only the gene symbol and the protein-level mutation (e.g., A423P); we would prefer to have the genomic coordinates as well (and perhaps a preferred transcript for the mutations). This is addressable but will require some finessing. I am hoping to extend the Python hgvs package to go from "p." to "c."; Reece said that going from "c." to "g." now works well. Another useful addition would be to attach the cancer type to the OncoKB data by pulling it from the abstracts. I am wondering whether it would be useful for us to contact the OncoKB group and propose that they improve/extend their data model.
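To make the ingest gap concrete, here is a minimal sketch of the very first step: turning a gene symbol plus a short protein change such as A423P into a structured record and a "p."-style string. It is plain Python and does not use the hgvs package. The names `parse_substitution` and `to_hgvs_p` are hypothetical, the sketch assumes simple single-residue substitutions only, and the `GENE:p....` form is an informal label rather than valid HGVS, which needs a reference sequence accession. Mapping onward to "c." and "g." is not attempted here.

```python
import re
from typing import NamedTuple, Optional

# One-letter to three-letter amino acid codes ("*" marks a stop codon).
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


class ProteinSubstitution(NamedTuple):
    gene: str  # gene symbol as given by the source
    ref: str   # one-letter reference residue
    pos: int   # 1-based protein position
    alt: str   # one-letter variant residue, or "*" for stop


# Matches simple substitutions only, e.g. "A423P" or "R213*".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_substitution(gene: str, alteration: str) -> Optional[ProteinSubstitution]:
    """Parse a short protein change; return None for anything else
    (fusions, amplifications, indels, unknown residue letters)."""
    match = _SUBSTITUTION.match(alteration.strip())
    if match is None:
        return None
    ref, pos, alt = match.groups()
    if ref not in AA3 or alt not in AA3:
        return None
    return ProteinSubstitution(gene, ref, int(pos), alt)


def to_hgvs_p(sub: ProteinSubstitution, accession: Optional[str] = None) -> str:
    """Format a 'p.' string. Without a protein accession the gene symbol is
    used as the prefix, which is an informal label, not valid HGVS."""
    prefix = accession if accession else sub.gene
    return f"{prefix}:p.{AA3[sub.ref]}{sub.pos}{AA3[sub.alt]}"
```

Records that fail to parse come back as `None`, so an ingest could route them to manual review instead of guessing. Choosing the preferred transcript or protein accession is the part that still needs a decision from us.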
For DGIdb, see the ticket here: https://github.com/monarch-initiative/dipper/issues/446. Direct communication with the DGIdb team is here: https://github.com/griffithlab/dgi-db/issues/141 and https://github.com/griffithlab/dgi-db/issues/142. As a side note, DGIdb includes CIViC, DoCM, and MyCancerGenome.
'Cancer variant databases' covers a broad and diverse domain. The summary below attempts to tease out some of the different datatypes to consider in this space, and begins a list of data sources to consider/prioritize. Others feel free to add/modify as needed - I think you can directly edit my comment here if you’d like.
Primary Variant 'Associations'
Additional Variant 'Metadata'
See this nice list: https://github.com/seandavi/awesome-cancer-variant-databases
Related to #15. @dnahotline, @mbrush, @stuppie, @pnrobinson: can you prioritize these and make a Dipper/Wikidata/BioThings plan? I would like most of these available for Monarch too, so this is a good candidate for joint ingest planning.