saliksyed closed this issue 6 years ago
Thanks for the information!
I found something interesting in GeneSummaries.tsv:
1 https://civic.genome.wustl.edu/links/genes/1 ALK 238 ALK amplifications, fusions and mutations have been shown to be driving events in non-small cell lung cancer. While crizontinib has demonstrated efficacy in treating the amplification, mutations in ALK have been shown to confer resistance to current tyrosine kinase inhibitors. Second-generation TKI's have seen varied success in treating these resistant cases, and the HSP90 inhibitor 17-AAG has been shown to be cytostatic in ALK-altered cell lines. 2017-03-06 00:00:15 UTC
Here the description column might be the only information we want to load. It contains several useful pieces, e.g. "driving events in non-small cell lung cancer", "confer resistance to current tyrosine kinase inhibitors", and "HSP90 inhibitor 17-AAG has been shown to be cytostatic in ALK-altered cell lines". Do we want to parse them into relations in our database?
Absolutely. I think the order of operations is a little complex: at some point we will want proteins and drugs as nodes in our system, and we are slowly moving towards that. Once we have this functionality, we can generate edges for these relationships.
I think we should start by just getting this CIVICDB metadata into the gene nodes. At some point we will want to create nodes with metadata for the transcripts (proteins).
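As a rough sketch of that first step, something like the following could read the TSV into per-gene metadata. The column names are an assumption inferred from the sample row above (the real GeneSummaries.tsv header may differ), and `load_gene_summaries` is a hypothetical helper, not part of our codebase.

```python
import csv
import io

# Assumed column order, inferred from the sample ALK row; the actual
# header in GeneSummaries.tsv may use different names or extra columns.
ASSUMED_COLUMNS = [
    "gene_id", "civic_url", "name",
    "entrez_id", "description", "last_review_date",
]


def load_gene_summaries(tsv_text, has_header=True):
    """Return {gene name: metadata dict} parsed from tab-separated text."""
    # QUOTE_NONE keeps quotes and apostrophes in descriptions as literal text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if has_header:
        rows = rows[1:]

    summaries = {}
    for row in rows:
        if len(row) < len(ASSUMED_COLUMNS):
            continue  # skip blank or malformed rows
        record = dict(zip(ASSUMED_COLUMNS, row))
        summaries[record["name"]] = {
            "entrez_id": record["entrez_id"],
            "civic_url": record["civic_url"],
            "description": record["description"],
            "last_review_date": record["last_review_date"],
        }
    return summaries
```

The resulting dict could then be merged into each gene node's metadata by gene name or Entrez ID, whichever our nodes already key on.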
What do you think? I think Yuen made a good point that having information about the proteins generated from a gene will be critical. Do you want to take charge of finding a good database of protein information?
Once we have protein names we can easily parse these descriptions and generate edge nodes.
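To make that concrete, here is a minimal sketch of what the parsing could look like once a name list exists. It only detects that a known name is mentioned in a gene's description; it does not work out the kind of relationship (e.g. "confers resistance"). The function name, the `"mentions"` edge label, and the name list are all hypothetical.

```python
import re


def candidate_edges(gene, description, known_names):
    """Return (gene, "mentions", name) for each known name in the description."""
    edges = []
    for name in known_names:
        # Lookarounds act as word boundaries that also work for names
        # containing hyphens or digits, such as "17-AAG".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, description, flags=re.IGNORECASE):
            edges.append((gene, "mentions", name))
    return edges


description = ("the HSP90 inhibitor 17-AAG has been shown to be "
               "cytostatic in ALK-altered cell lines")
print(candidate_edges("ALK", description, ["HSP90", "17-AAG", "EGFR"]))
```

Simple name matching like this will miss synonyms and misspellings in the source text, so it is probably best treated as a way to produce candidates for review.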
@saliksyed Sure. I agree that getting the CIVICDB description into our database is a good starting point. I will continue to look into how other browsers expose the protein information.
Precision Oncology database: https://civicdb.org/releases