VALIS-software / SIRIUS-backend

A graph analytics engine for genomic data
MIT License

Add CivicDB database #78

saliksyed closed this issue 6 years ago

saliksyed commented 6 years ago

Precision Oncology database: https://civicdb.org/releases

yudongqiu commented 6 years ago

Thanks for the information!

I found something interesting in GeneSummaries.tsv:

1 https://civic.genome.wustl.edu/links/genes/1 ALK 238 ALK amplifications, fusions and mutations have been shown to be driving events in non-small cell lung cancer. While crizontinib has demonstrated efficacy in treating the amplification, mutations in ALK have been shown to confer resistance to current tyrosine kinase inhibitors. Second-generation TKI's have seen varied success in treating these resistant cases, and the HSP90 inhibitor 17-AAG has been shown to be cytostatic in ALK-altered cell lines. 2017-03-06 00:00:15 UTC

The description column might be the only information we want to load from this file. It contains several useful pieces, e.g. "driving events in non-small cell lung cancer", "confer resistance to current tyrosine kinase inhibitors", and "HSP90 inhibitor 17-AAG has been shown to be cytostatic in ALK-altered cell lines". Do we want to parse them into relations in our database?
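For reference, a minimal sketch of how the description column could be pulled out of GeneSummaries.tsv with the standard library. The column names are assumptions inferred from the sample row above (the file's real header may differ), and `read_gene_summaries` is a hypothetical helper, not existing SIRIUS code.

```python
import csv
import io

# Assumed column names, inferred from the six fields of the sample row.
# Check them against the real header of GeneSummaries.tsv before relying on them.
ASSUMED_COLUMNS = [
    "gene_id", "civic_url", "name", "entrez_id", "description", "last_review_date",
]


def read_gene_summaries(handle, has_header=False):
    """Parse tab-separated gene summaries into a list of dicts (hypothetical helper)."""
    # QUOTE_NONE keeps apostrophes and quotes in the free-text description untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header line if the file has one
    records = []
    for row in reader:
        if len(row) != len(ASSUMED_COLUMNS):
            continue  # skip blank or malformed lines
        records.append(dict(zip(ASSUMED_COLUMNS, row)))
    return records


# Shortened version of the ALK row quoted above, with explicit tabs.
sample = (
    "1\thttps://civic.genome.wustl.edu/links/genes/1\tALK\t238\t"
    "ALK amplifications, fusions and mutations have been shown to be driving "
    "events in non-small cell lung cancer.\t2017-03-06 00:00:15 UTC\n"
)
records = read_gene_summaries(io.StringIO(sample))
print(records[0]["name"], "->", records[0]["description"])
```

For the real file, the same function could be called on `open(path, newline="")`, with `has_header=True` if the first line is a header.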

saliksyed commented 6 years ago

Absolutely. I think the order of operations is a little complex: at some point we will want proteins and drugs as nodes in our system, and we are slowly moving towards that. Once we have that functionality, we can generate edges for these relationships.

I think we should start by just getting this CivicDB metadata into the gene nodes. At some point we will also want to create nodes with metadata for the transcripts (proteins).
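A rough sketch of that first step, assuming gene nodes are plain dicts with a `name` field and a free-form `info` dict. This node layout, the record keys, and `attach_civic_info` are all hypothetical and only meant to illustrate the idea, not to describe the actual SIRIUS schema.

```python
def attach_civic_info(gene_nodes, civic_records):
    """Copy CivicDB descriptions onto matching gene nodes, in place.

    Returns the sorted gene names from civic_records that matched no node.
    The node and record layouts are assumptions made for this sketch.
    """
    by_name = {rec["name"]: rec for rec in civic_records}
    unmatched = set(by_name)
    for node in gene_nodes:
        rec = by_name.get(node.get("name"))
        if rec is None:
            continue  # no CivicDB entry for this gene
        # Keep the CivicDB fields under their own key so other sources are not overwritten.
        node.setdefault("info", {})["civic"] = {
            "description": rec["description"],
            "url": rec["civic_url"],
        }
        unmatched.discard(rec["name"])
    return sorted(unmatched)


# Hypothetical gene nodes and one CivicDB record (description shortened).
gene_nodes = [{"name": "ALK", "info": {}}, {"name": "TP53"}]
civic_records = [
    {
        "name": "ALK",
        "civic_url": "https://civic.genome.wustl.edu/links/genes/1",
        "description": "ALK amplifications, fusions and mutations have been shown "
                       "to be driving events in non-small cell lung cancer.",
    },
    {"name": "NOT_A_NODE", "civic_url": "", "description": ""},
]
missing = attach_civic_info(gene_nodes, civic_records)
print(missing)
```

Returning the unmatched names makes it easy to see which CivicDB genes have no node yet.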

What do you think? I think Yuen made a good point that having information about the proteins generated from a gene will be critical. Do you want to take charge of finding a good database of protein information?

Once we have protein names we can easily parse these descriptions and generate edge nodes.
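Something like the following could be a starting point: scan each description for names we already know and emit candidate edges. This is only a sketch. The edge layout, the `mentioned_in_description` type, and both helper functions are hypothetical, and plain name matching says nothing about the kind of relationship (resistance, sensitivity, etc.).

```python
import re


def find_mentions(description, known_names):
    """Return the known names that occur as whole tokens in the description."""
    found = []
    for name in known_names:
        # Lookarounds instead of \b so names such as "17-AAG" still match,
        # and "ALK" is found inside "ALK-altered" but not inside "WALKER".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, description):
            found.append(name)
    return found


def candidate_edges(gene, description, known_names):
    """Build hypothetical edge records from a gene to every other mentioned name."""
    return [
        {"from": gene, "to": name, "type": "mentioned_in_description"}
        for name in find_mentions(description, known_names)
        if name != gene
    ]


description = (
    "the HSP90 inhibitor 17-AAG has been shown to be cytostatic "
    "in ALK-altered cell lines."
)
edges = candidate_edges("ALK", description, ["ALK", "HSP90", "17-AAG", "EGFR"])
for edge in edges:
    print(edge["from"], "->", edge["to"])
```

Matching is case-sensitive on purpose, since gene symbols are usually upper case. Misspelled names in the descriptions would need fuzzy matching, which this sketch does not attempt.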

yudongqiu commented 6 years ago

@saliksyed Sure. I agree that getting the CivicDB description into our database is a good starting point. I will continue to look into how other browsers expose the protein information.