Open cmungall opened 6 years ago
CM: "We shouldn't be dependent on files checked into github."
Why not?
They can be fetched remotely or cloned locally
the cq notebooks already depend quite heavily on GitHub
we can be sure the file is under 100M
We have used it for the FA gene sets since the last hackathon.
not sure I see the point of the directive
We should limit this.
To expand on the last point, this notebook demonstrates how you can ask the explorer "starting from a drug name, how do I get to the snomed ID of a disease that the drug denoted by that name treats?". This is good, but it would be more powerful if it could go even further: really what we want here is the phenotypes, so we'd like to just say "drug names to phenotypes of treated disease" and have it figure everything out. I assume that is what @kevinxin90 is aiming for here, but for pragmatic reasons (i.e. the perception that the only way to do this is via files) the last part is done explicitly rather than automatically by the explorer.
As an aside, I think this also demonstrates the need for relationship types. When we ask the explorer "connect drug names to snomed IDs", how do we know it is connecting to diseases in snomed? There are also snomed IDs for drugs, and plenty of other things. It is only an accident that the APIs used happen to give the desired answers here. Curious as to @newgene and @kevinxin90's thoughts on this.
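To make the "have it figure everything out" idea concrete, here is a minimal sketch of chaining by ID type. It assumes a hypothetical registry in which each API declares one input and one output ID type. The API names and type labels are placeholders, not the explorer's actual metadata. Note that it matches on ID type alone, so it has exactly the ambiguity described above: nothing says whether a `snomed` output is a disease or a drug.

```python
from collections import deque

# Hypothetical registry: (api_name, input_id_type, output_id_type).
# These names are placeholders, not the explorer's real metadata.
ENDPOINTS = [
    ("mychem", "drug_name", "snomed"),
    ("xref_service", "snomed", "doid"),
    ("biolink", "doid", "hp"),
]


def find_chain(start, goal, endpoints=ENDPOINTS):
    """Return the shortest list of API names linking start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        id_type, path = queue.popleft()
        if id_type == goal:
            return path
        for name, src, dst in endpoints:
            # Follow every API that accepts the current ID type.
            if src == id_type and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None


print(find_chain("drug_name", "hp"))
```

With this registry, asking for `drug_name` to `hp` yields the three-step chain without the caller naming the intermediate ID types.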
@cmungall For the first part, totally agreed. Ideally, we should connect directly from drug name to phenotype. However, the results from MyChem can't be linked directly to BioLink, given that MyChem only outputs 'SNOMED' while BioLink only takes 'DOID'. It would be great if there were an additional API doing the work. For the second part, I think the question should be addressed by URI repos, e.g. 'identifiers.org' or 'prefixcommons'. A good example would be 'KEGG': it has information involving bio-entities like drugs, reactions, pathways, genes, etc., and 'identifiers.org' has a separate entry for each one, e.g. http://identifiers.org/kegg.compound/, http://identifiers.org/kegg.pathway/, http://identifiers.org/kegg.genes/. I think the same thing should be done with 'snomed' or 'omim' or 'ensembl', where one database contains information regarding multiple bio-entities.
Convert SNOMED ID to DOID using Drugcentral doid_xref file
Another option is wikidata, but snomed xrefs don't seem to be there. @stuppie / @putmantime is this a licensing issue? I would have thought it ok to put xrefs in, just not any further content. Maybe wd playing it safe?
Unfortunately, I think this is a licensing issue as SNOMED has a very strict licence regarding everything explicitly including identifiers. See: https://www.wikidata.org/wiki/Wikidata:Property_proposal/sctid Also, pinging @andrewsu
@cmungall A crude workaround using an API: Get the xrefs for a given DOID using OLS: example
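A rough sketch of what that workaround might look like in Python. The base URL, the `obo_id` query parameter, and the `_embedded.terms[].obo_xref` response shape are assumptions about the OLS REST API and should be checked against the live service. The sample payload is hand-written, with made-up identifiers, so the extraction step can be tried offline.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the OLS REST API; adjust to the current deployment.
OLS_TERMS_URL = "https://www.ebi.ac.uk/ols/api/ontologies/doid/terms"


def fetch_term(doid):
    """Fetch the OLS record for a DOID (needs network access)."""
    url = OLS_TERMS_URL + "?" + urllib.parse.urlencode({"obo_id": doid})
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def snomed_xrefs(payload):
    """Collect xref IDs whose database name starts with 'SNOMED'."""
    ids = []
    for term in payload.get("_embedded", {}).get("terms", []):
        for xref in term.get("obo_xref") or []:
            if (xref.get("database") or "").upper().startswith("SNOMED"):
                ids.append(xref["id"])
    return ids


# Hand-written payload in the assumed shape; the identifiers are made up.
SAMPLE = {"_embedded": {"terms": [{"obo_id": "DOID:0000000", "obo_xref": [
    {"database": "SNOMEDCT_US", "id": "111111"},
    {"database": "MESH", "id": "D000000"},
]}]}}

print(snomed_xrefs(SAMPLE))
```

This only goes from DOID to SNOMED. Going the other way would mean fetching xrefs for every DOID term and inverting the result, or using a service that indexes xrefs directly.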
workaround
Thanks for the OLS example. Is it possible to go the opposite way? Or does that require OxO API?
Soon it should be possible to query for HPO phenotypes for a SNOMED ID from biolink directly (with the proviso that often the snomed term will be more general, e.g. less coverage of rare diseases). Of course, that takes some of the fun out of chaining APIs....
license
Thanks for the info.
@kevinxin90 and @newgene - have you considered using a different vocabulary than SNOMED in MyChem? We can discuss some other options this week. You could use mondo as part of your MyChem ingest pipeline to map.
For the second part, I think the question should be addressed by URI repos
So I think what you're suggesting is IDs like snomed.disease:nnn, snomed.drug:nnnn. I'm not a big fan of this approach, we can discuss more this week.
Yep, same for scigraph API: https://github.com/SciGraph/SciGraph/issues/248
Following on from the discussion about overloading IDs, I see there is a lot of really useful stuff in mychem, and it's really well described by jsonld - nice work!
E.g. this is where the snomed IDs come from (via drugcentral)
But it looks like you can also get indications from aeolus (using meddra IDs): http://mychem.info/v1/query?q=drugbank.name%3Ariluzole
it would be good to expand this notebook to include this, but I guess we need to expand the jsonld to describe this?
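For reference, a small sketch of building that MyChem query URL programmatically. Only the endpoint and the `q=drugbank.name:riluzole` query come from the link above. The helper name is hypothetical, and the structure of the returned JSON (including where the aeolus indications live) is not assumed here.

```python
import urllib.parse

# Endpoint taken from the URL quoted in the thread.
MYCHEM_QUERY_URL = "http://mychem.info/v1/query"


def mychem_query_url(field, value):
    """Build a MyChem query URL for a field:value search (hypothetical helper)."""
    query = f"{field}:{value}"
    # urlencode percent-encodes the colon, matching the URL in the thread.
    return MYCHEM_QUERY_URL + "?" + urllib.parse.urlencode({"q": query})


print(mychem_query_url("drugbank.name", "riluzole"))
```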
But here is my concern - if a database were to include contraindications, how would we annotate this in the jsonld? It seems the explorer approach just looks for some kind of path through the json from the query id to any id annotated with the relevant identifiers.org id. Without a predicate or relationship type connecting the two, we don't know what the semantics are. For the case of distinguishing drug->snomed (equivalent) and drug->snomed (indicated for), we can use different id prefixes to distinguish the two kinds of entities, but I'm not sure this approach can be extended to distinguish indications from contraindications.
(this concern also applies to things like negative annotations in GO in mygene)
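To illustrate the concern, here is a minimal sketch of what an explicit relationship type buys you. The `Association` structure, the predicate names, and the IDs are all hypothetical placeholders, not anything currently in the mychem jsonld. The point is only that the three links below share the same subject and the same object ID space, so ID prefixes alone cannot tell them apart.

```python
from typing import NamedTuple


class Association(NamedTuple):
    subject: str
    predicate: str
    object: str
    negated: bool = False  # e.g. a negative ("NOT") annotation


# Placeholder IDs and predicate names, for illustration only.
ASSOCIATIONS = [
    Association("drug:X", "equivalent_to", "snomed:1"),
    Association("drug:X", "indicated_for", "snomed:2"),
    Association("drug:X", "contraindicated_for", "snomed:3"),
    Association("drug:X", "indicated_for", "snomed:4", negated=True),
]


def objects_for(subject, predicate, associations=ASSOCIATIONS):
    """Return object IDs linked to subject by predicate, skipping negated links."""
    return [
        a.object
        for a in associations
        if a.subject == subject and a.predicate == predicate and not a.negated
    ]


print(objects_for("drug:X", "indicated_for"))
print(objects_for("drug:X", "contraindicated_for"))
```

A path search that ignores `predicate` and `negated` would return all four objects for "drug to snomed", which is the accident-prone behaviour described above.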
Look forward to discussing these issues this week!
https://github.com/NCATS-Tangerine/cq-notebooks/blob/master/Orange_QB2_Other_CQs/Drug_Repurpose_By_Pheno/BTExplorer-QB2.3.ipynb
Convert SNOMED ID to DOID using Drugcentral doid_xref file
We shouldn't be dependent on files checked into github.
I thought this would be possible in scigraph, but there is an annoying blocker: https://github.com/SciGraph/SciGraph/issues/248
Another option is wikidata, but snomed xrefs don't seem to be there: https://www.wikidata.org/wiki/Q206901
@stuppie / @putmantime is this a licensing issue? I would have thought it ok to put xrefs in, just not any further content. Maybe wd playing it safe?
Calculating phenotypic similarity
This can be done using the owlsim web API
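As a stand-in until the owlsim call is wired up, here is a deliberately simplified sketch of phenotypic similarity as plain Jaccard overlap between two phenotype sets. This is not the owlsim API or its algorithm: owlsim uses the ontology structure, which this ignores entirely. The phenotype IDs are placeholders.

```python
def jaccard(phenotypes_a, phenotypes_b):
    """Jaccard similarity between two collections of phenotype IDs."""
    a, b = set(phenotypes_a), set(phenotypes_b)
    if not a and not b:
        return 0.0  # convention chosen here: two empty profiles score 0
    return len(a & b) / len(a | b)


# Placeholder phenotype IDs, not real HPO terms.
disease_1 = {"HP:a", "HP:b", "HP:c"}
disease_2 = {"HP:b", "HP:c", "HP:d"}

print(jaccard(disease_1, disease_2))
```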