NCATS-Tangerine / cq-notebooks

Notebooks for answering competency questions

Drug repurposing: use APIs for DOID-SNOMED traversal and for hposim #57

Open cmungall opened 6 years ago

cmungall commented 6 years ago

https://github.com/NCATS-Tangerine/cq-notebooks/blob/master/Orange_QB2_Other_CQs/Drug_Repurpose_By_Pheno/BTExplorer-QB2.3.ipynb

Convert SNOMED ID to DOID using Drugcentral doid_xref file

We shouldn't be dependent on files checked into github.

I thought this would be possible in scigraph, but there is an annoying blocker: https://github.com/SciGraph/SciGraph/issues/248

Another option is wikidata, but snomed xrefs don't seem to be there: https://www.wikidata.org/wiki/Q206901

@stuppie / @putmantime is this a licensing issue? I would have thought it ok to put xrefs in, just not any further content. Maybe wd playing it safe?

Calculating phenotypic similarity

This can be done using the owlsim web API

TomConlin commented 6 years ago

CM: "We shouldn't be dependent on files checked into github."

Why not?
They can be fetched remotely or cloned locally
The cq notebooks already depend quite heavily on GitHub, and we can be sure the file is under 100M. We have used it for the FA gene sets since the last hackathon.

not sure I see the point of the directive

cmungall commented 6 years ago

We should limit this.

cmungall commented 6 years ago

To expand on the last point, this notebook demonstrates how you can ask the explorer "starting from a drug name, how do I get to the snomed ID of a disease that the drug denoted by that name treats?". This is good, but it would be more powerful if it could go even further: really what we want here is the phenotypes, so we'd like to just say "drug names to phenotypes of treated disease" and have it figure everything out. I assume that is what @kevinxin90 is aiming for here, but for pragmatic reasons (i.e. the fact that the only perceived way to do this is via files) the last part is done explicitly rather than automatically by the explorer.

As an aside, I think this also demonstrates the need for relationship types. When we ask the explorer "connect drug names to snomed IDs", how do we know it is connecting to diseases in snomed? There are also snomed IDs for drugs, and plenty of other things. It is only an accident that the APIs used happen to give the desired answers here. Curious as to @newgene and @kevinxin90's thoughts on this.

kevinxin90 commented 6 years ago

@cmungall For the first part, totally agreed. Ideally, we should connect directly from drug name to phenotype. However, the results from MyChem can't be linked directly to BioLink, given that MyChem only outputs 'SNOMED' while BioLink only takes 'DOID'. It would be great if there were an additional API doing the work.

For the second part, I think the question should be addressed by URI repos, e.g. 'identifiers.org' or 'prefixcommons'. A good example would be 'KEGG': it has information involving bio-entities like drugs, reactions, pathways, genes, etc., and 'identifiers.org' has a separate entry for each one, e.g. http://identifiers.org/kegg.compound/, http://identifiers.org/kegg.pathway/, http://identifiers.org/kegg.genes/. I think the same thing should be done with 'snomed', 'omim' or 'ensembl', where one database contains information on multiple bio-entities.
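A minimal sketch of the prefix idea described above, in Python. It assumes IDs are written as `prefix:local_id` and that each identifiers.org-style prefix maps to exactly one entity type. The `PREFIX_TO_TYPE` table, the type labels, and the example local IDs are hypothetical and only for illustration; they are not taken from identifiers.org or from the explorer.

```python
from typing import Optional, Tuple

# Hypothetical prefix -> entity type table, modelled on the KEGG entries
# mentioned above. A bare "snomed" prefix is deliberately absent.
PREFIX_TO_TYPE = {
    "kegg.compound": "compound",
    "kegg.pathway": "pathway",
    "kegg.genes": "gene",
}


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'prefix:local_id' into its two parts."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {curie!r}")
    return prefix, local_id


def entity_type(curie: str) -> Optional[str]:
    """Return the entity type implied by the prefix, or None if the prefix
    does not pin down a single type."""
    prefix, _ = split_curie(curie)
    return PREFIX_TO_TYPE.get(prefix.lower())


if __name__ == "__main__":
    for curie in ("kegg.compound:EXAMPLE1", "kegg.genes:EXAMPLE2", "snomed:EXAMPLE3"):
        print(curie, "->", entity_type(curie))
```

With a table like this, a type-specific prefix resolves to one entity type, while an overloaded prefix such as plain `snomed` resolves to nothing, which is the ambiguity raised earlier in the thread.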

stuppie commented 6 years ago

Convert SNOMED ID to DOID using Drugcentral doid_xref file

Another option is wikidata, but snomed xrefs don't seem to be there. @stuppie / @putmantime is this a licensing issue? I would have thought it ok to put xrefs in, just not any further content. Maybe wd playing it safe?

Unfortunately, I think this is a licensing issue, as SNOMED has a very strict licence covering everything, explicitly including identifiers. See: https://www.wikidata.org/wiki/Wikidata:Property_proposal/sctid. Also pinging @andrewsu.

stuppie commented 6 years ago

@cmungall A crude workaround using an API: Get the xrefs for a given DOID using OLS: example
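A rough sketch of what such a lookup could look like in Python. The original example link is not reproduced here, so everything below is an assumption: the base URL, the `obo_id` query parameter, and the `_embedded.terms[].obo_xref` response layout with `database` and `id` fields should all be checked against the current OLS documentation. The sample payload uses placeholder IDs, not real mappings.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for DOID terms in OLS; verify before relying on it.
OLS_TERMS_URL = "https://www.ebi.ac.uk/ols4/api/ontologies/doid/terms"


def ols_term_url(doid: str) -> str:
    """Build the lookup URL for one DOID term."""
    return f"{OLS_TERMS_URL}?{urlencode({'obo_id': doid})}"


def fetch_term(doid: str) -> dict:
    """Fetch the term JSON over the network (not exercised in the demo below)."""
    with urlopen(ols_term_url(doid), timeout=30) as response:
        return json.load(response)


def extract_xrefs(payload: dict, database: Optional[str] = None) -> List[str]:
    """Collect 'DATABASE:id' strings from an OLS-style term response.

    If `database` is given, keep only xrefs whose database name starts
    with it (case-insensitive), e.g. 'SNOMEDCT'.
    """
    xrefs = []
    for term in payload.get("_embedded", {}).get("terms", []):
        for xref in term.get("obo_xref") or []:
            db, ident = xref.get("database"), xref.get("id")
            if db is None or ident is None:
                continue
            if database is None or db.lower().startswith(database.lower()):
                xrefs.append(f"{db}:{ident}")
    return xrefs


# Placeholder payload in the assumed response shape; the IDs are made up.
SAMPLE = {
    "_embedded": {
        "terms": [
            {
                "obo_id": "DOID:0000000",
                "obo_xref": [
                    {"database": "MESH", "id": "D000000"},
                    {"database": "SNOMEDCT_US_2018_03_01", "id": "0000000"},
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    print(ols_term_url("DOID:0000000"))
    print(extract_xrefs(SAMPLE, database="SNOMEDCT"))
```

Keeping the parsing separate from the network call makes it easy to test the extraction against a saved response before chaining it to other APIs.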

cmungall commented 6 years ago

workaround

Thanks for the OLS example. Is it possible to go the opposite way? Or does that require OxO API?

Soon it should be possible to query for HPO phenotypes for a SNOMED ID from biolink directly (with the proviso that often the snomed term will be more general, e.g. less coverage of rare diseases). Of course, that takes some of the fun out of chaining APIs....

license

Thanks for the info.

cmungall commented 6 years ago

@kevinxin90 and @newgene - have you considered using a different vocabulary than SNOMED in MyChem? We can discuss some other options this week. You could use mondo as part of your MyChem ingest pipeline to map.

For the second part, I think the question should be addressed by URI repos

So I think what you're suggesting is IDs like snomed.disease:nnn, snomed.drug:nnnn. I'm not a big fan of this approach, we can discuss more this week.

stuppie commented 6 years ago

Is it possible to go the opposite way?

Not that I can see. Here are the docs. Closest I can get is through search.

cmungall commented 6 years ago

Yep, same for scigraph API: https://github.com/SciGraph/SciGraph/issues/248
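Since neither API appears to offer the reverse lookup, one possible client-side workaround is to collect forward xrefs (DOID to external IDs) for the terms of interest and invert them locally. This is only a sketch of that idea in Python: the `forward` dictionary and all IDs in it are placeholders, and nothing here is part of OLS or SciGraph.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def invert_xrefs(forward: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Turn {doid: [xref, ...]} into {xref: [doid, ...]}.

    One external ID may map to several DOIDs, so every value is a sorted list.
    """
    reverse = defaultdict(set)
    for doid, xrefs in forward.items():
        for xref in xrefs:
            reverse[xref].add(doid)
    return {xref: sorted(doids) for xref, doids in reverse.items()}


if __name__ == "__main__":
    # Placeholder forward mappings; not real DOID or SNOMED identifiers.
    forward = {
        "DOID:0000001": ["SNOMEDCT:1111", "MESH:D000001"],
        "DOID:0000002": ["SNOMEDCT:1111", "SNOMEDCT:2222"],
    }
    reverse = invert_xrefs(forward)
    print(reverse["SNOMEDCT:1111"])
```

The obvious cost is that the forward xrefs have to be harvested for every candidate DOID first, so this only makes sense for a bounded set of terms.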

cmungall commented 6 years ago

Following on from the discussion about overloading IDs, I see that there is a lot of really useful stuff in mychem, and that it's really well described by jsonld - nice work!

E.g. this is where the snomed IDs come from (via drugcentral)

https://github.com/NCATS-Tangerine/translator-api-registry/blob/master/mychem.info/jsonld_context/mychem_drug_1.1.json#L135-L147

But it looks like you can also get indications from aeolus (using meddra IDs): http://mychem.info/v1/query?q=drugbank.name%3Ariluzole

It would be good to expand this notebook to include this, but I guess we would need to expand the jsonld to describe it?

But here is my concern - if a database were to include contraindications, how would we annotate this in the jsonld? It seems the explorer approach just looks for some kind of path through the json from the query ID to any ID annotated with the relevant identifiers.org ID. Without a predicate or relationship type connecting the two, we don't know what the semantics are. To distinguish "drug -> equivalent snomed ID" from "drug -> snomed ID it is indicated for", we can use different ID prefixes for the two kinds of entities, but I'm not sure this approach can be extended to distinguish indications from contraindications.

(this concern also applies to things like negative annotations in GO in mygene)
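To make the concern concrete, here is a small Python sketch. It is not how the explorer or the jsonld context works; the `Edge` type, the predicate names, and all IDs are hypothetical. It only shows that an untyped "any linked ID" lookup returns indications and contraindications together, while a lookup that carries a predicate can separate them.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    subject: str
    predicate: str
    object_id: str


# Hypothetical annotations for one drug; both objects are disease IDs,
# so an ID prefix alone cannot tell the two relationships apart.
EDGES = [
    Edge("drug:X", "indicated_for", "disease:A"),
    Edge("drug:X", "contraindicated_for", "disease:B"),
]


def linked_ids(edges: List[Edge], subject: str) -> List[str]:
    """Untyped lookup: every ID reachable from the subject, semantics unknown."""
    return [e.object_id for e in edges if e.subject == subject]


def linked_ids_by_predicate(edges: List[Edge], subject: str, predicate: str) -> List[str]:
    """Typed lookup: only IDs connected by the requested relationship."""
    return [
        e.object_id
        for e in edges
        if e.subject == subject and e.predicate == predicate
    ]


if __name__ == "__main__":
    print(linked_ids(EDGES, "drug:X"))
    print(linked_ids_by_predicate(EDGES, "drug:X", "indicated_for"))
```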

Look forward to discussing these issues this week!