A possible route is to use the RDF HDT format to create locally accessible graphs: https://www.rdfhdt.org/what-is-hdt/.
This can then be either used directly for queries on that particular graph using a query API such as:
or loaded into any graph database. An RDF DB would be easiest, but a property graph DB would also work, provided a schema can be supplied (this may be where Biolink can come in):
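To make the schema requirement concrete, here is a minimal sketch of how RDF triples could be turned into property-graph nodes and edges. The `SCHEMA` format, the `to_property_graph` helper, and the `ex:` identifiers are all hypothetical and only for illustration; a real pipeline would likely derive the mapping from Biolink rather than write it by hand.

```python
from collections import defaultdict

# Made-up triples for illustration; the "ex:" identifiers are placeholders.
TRIPLES = [
    ("ex:chem1", "rdf:type", "biolink:ChemicalEntity"),
    ("ex:chem1", "rdfs:label", "aspirin"),
    ("ex:chem1", "biolink:affects", "ex:gene1"),
    ("ex:gene1", "rdf:type", "biolink:Gene"),
]

# Hypothetical schema: says how each predicate maps into a property graph.
SCHEMA = {
    "rdf:type": "label",        # becomes a node label
    "rdfs:label": "property",   # becomes a node property
    "biolink:affects": "edge",  # becomes an edge between two nodes
}


def to_property_graph(triples, schema):
    """Split triples into nodes (labels + properties) and edges."""
    nodes = defaultdict(lambda: {"labels": set(), "properties": {}})
    edges = []
    for subj, pred, obj in triples:
        kind = schema.get(pred)
        if kind == "label":
            nodes[subj]["labels"].add(obj)
        elif kind == "property":
            nodes[subj]["properties"][pred] = obj
        elif kind == "edge":
            # Touch both endpoints so they exist even without other triples.
            nodes[subj]
            nodes[obj]
            edges.append((subj, pred, obj))
        else:
            # Without a schema entry we cannot tell a property from an edge.
            raise KeyError(f"predicate not covered by schema: {pred}")
    return dict(nodes), edges


nodes, edges = to_property_graph(TRIPLES, SCHEMA)
```

The point of the sketch is the `else` branch: an RDF store accepts any predicate, while a property graph needs a decision per predicate, which is the gap a shared model like Biolink could fill.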
Re: Biolink and connecting with #2 and #3,
Just wanted to note this here since it will come up when these graphs are merged:
While I was specifically looking at ctdbase, I noticed that the ctdbase$CTD_chemicals$ChemicalID column is supposed to be
- ChemicalID (MeSH identifier)
per https://ctdbase.org/downloads/#allchems, but as I was trying to characterise which MeSH terms are used, I noticed "MESH:D" in the data:
MeSH:D (if it existed in MeSH) would not be an instance of ChemicalEntity itself, but the concept/class. The same holds for many parent terms in the CTD Chemical Vocabulary. Again, I will need to look at how Biolink deals with this.
CTD's description of their Chemical Vocabulary indicates that this isn't exactly the MeSH vocabulary, but modified in some places (MESH:D is one example). I will have to see how other approaches, such as https://robokop.renci.org/api-docs/docs/automat/ctd, deal with this.
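A check along these lines could be sketched as follows, in Python for illustration. It assumes that a complete identifier has the form `MESH:` plus one capital letter plus digits (for example a `D` or `C` record); that pattern, the `summarize_ids` helper, and the sample values are assumptions, not something taken from the CTD documentation.

```python
import re
from collections import Counter

# Assumption: a complete identifier is "MESH:" + one capital letter + digits.
FULL_ID = re.compile(r"MESH:[A-Z]\d+")


def summarize_ids(ids):
    """Count well-formed IDs by their leading letter and collect the rest."""
    counts = Counter()
    malformed = []
    for chemical_id in ids:
        if FULL_ID.fullmatch(chemical_id):
            # The character right after the "MESH:" prefix, e.g. "D" or "C".
            counts[chemical_id[len("MESH:")]] += 1
        else:
            malformed.append(chemical_id)
    return counts, malformed


# Placeholder values for illustration, not taken from the CTD download.
sample = ["MESH:D000001", "MESH:C000002", "MESH:D"]
counts, malformed = summarize_ids(sample)
print(counts)
print(malformed)
```

Anything that ends up in `malformed`, such as the bare `MESH:D`, is a candidate CTD-specific term that would need separate handling when the graphs are merged.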
MeSH terms aren't a great choice of chemical identifier. If ctdbase doesn't have a mapping that is closer to 1-1, it might be better to start with a different source. When we do integrate ctdbase, we will need to associate chemicals with their MeSH term (or whatever identifier ctdbase is using).
PubChem annotations have this information, but crawling all of PubChem politely would take too long, and they don't offer a bulk download of annotations. I have been trying to reach out to PubChem about this for some time (https://twitter.com/pubchem/status/1686056545337917441), but I can increase my efforts. There is a PubChem brick right now, but it has bioassay and chemical SDF data, and only a small subset of the annotation data.
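For a rough sense of scale, here is a back-of-the-envelope sketch. Both inputs are assumptions for illustration only: roughly 100 million compound records and a limit of 5 requests per second, with one request per record. The real numbers should be checked against PubChem's current statistics and usage policy.

```python
def crawl_days(n_records: int, requests_per_second: float) -> float:
    """Days needed to fetch n_records at one request per record."""
    seconds = n_records / requests_per_second
    return seconds / 86_400  # seconds in a day


# Assumed inputs, not official figures: 100 million records, 5 requests/s.
estimate = crawl_days(100_000_000, 5)
print(f"{estimate:.0f} days")
```

Under those assumptions a single polite crawler would run for well over half a year, which is why a bulk download would be the more practical option.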
I'm open to any solutions you have, but just moving along to other assets might be best. ICE and ChEMBL are probably good choices.
@zmughal Toxicokinetics was a highlighted topic at EUROTOX. John Wambaugh talked about how the lack of data is a big concern in that space. The httk package currently distributes some toxicokinetics data, but I'm not sure where it comes from. Maybe we need an httk brick? Or a brick for the sources it pulls from? Adding toxicokinetics to OKG would probably be helpful for the tox community.
From the grant
Let's go a step farther and actually create those local knowledge graphs.