Open DnlRKorn opened 11 months ago
Review at DMOG 4/17
We don't currently have a real MONDO parser; everything we get from MONDO comes from Ubergraph.
The only thing we do get from MONDO is node properties. The idea was to include exactly the kind of thing Dan wanted, but it really needs to be completely refactored because:
The MONDOProps parser was written before we completely refactored the other Ubergraph parsers. It's very inefficient, and it doesn't have real Ubergraph versioning (which is available in the real Ubergraph parser/utils); it just uses a modify date.
We need to review the syntax of these properties. Right now it's a made-up format where we take the MONDO designation and turn it into something like this: {"MONDO_SUPERCLASS_rare": True}.
From a quick glance it looks like we only get one "rare" disease (MONDO:0021136). This seems odd and might be due to a bug in the current parsing technique.
We do actually have other SPARQL queries in the Ubergraph tools, and we could probably use the queries mentioned in this issue to do this much more efficiently. We should also consider whether the MONDO properties should just be a part of Ubergraph (though this means you couldn't easily apply them to other graphs) and whether we want other stuff from MONDO.
1) Improve the MONDO parser.
2) Talk to Jim.
3) The "rare" disease designation is in a custom format and is not a Biolink term.
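For reference while reviewing the property syntax, here is a minimal sketch of how the current custom format could be produced. This is not the actual MONDOProps parser code: `designation_to_property` and `annotate_nodes` are hypothetical names, and the node layout (plain dicts with an "id" key) is an assumption.

```python
# Sketch only: function names and the node layout are assumptions,
# not taken from the MONDOProps parser.

def designation_to_property(designation, prefix="MONDO_SUPERCLASS_"):
    """Turn a MONDO designation such as "rare" into a boolean node property."""
    key = prefix + designation.strip().replace(" ", "_")
    return {key: True}


def annotate_nodes(nodes, designations_by_id):
    """Return copies of the nodes with one boolean property per designation."""
    annotated = []
    for node in nodes:
        props = dict(node)  # copy so the input nodes are left untouched
        for designation in designations_by_id.get(node["id"], []):
            props.update(designation_to_property(designation))
        annotated.append(props)
    return annotated


# MONDO:0021136 is the one "rare" node mentioned in this issue;
# EXAMPLE:1 is a made-up identifier with no designation.
nodes = [{"id": "MONDO:0021136"}, {"id": "EXAMPLE:1"}]
result = annotate_nodes(nodes, {"MONDO:0021136": ["rare"]})
```

If we move to a Biolink term, only `designation_to_property` would need to change in a layout like this.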
More specifically we should:
There was a blog post from MONARCH detailing how to parse the MONDO data to get the "rare" disease classification: https://mondo.monarchinitiative.org/pages/analysis/
Jim Balhoff rewrote this blog post as a series of SPARQL queries for Ubergraph.
We need to figure out a smart way to run the SPARQL queries when running ORION and to integrate this info into the final graph.
There is also the open question of where this should live: as part of MONDO parsing or elsewhere.
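One possible shape for running a SPARQL query against Ubergraph and turning the results into CURIEs is sketched below, using only the Python standard library. Several things here are assumptions: the endpoint URL should be confirmed against the Ubergraph documentation, the helper names (`build_request`, `run_query`, `extract_curies`, `iri_to_curie`) are hypothetical and are not ORION or Ubergraph-tools functions, and the query text itself is left as a parameter because Jim's actual queries are not reproduced in this issue.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Ubergraph SPARQL endpoint; verify before relying on it.
UBERGRAPH_ENDPOINT = "https://ubergraph.apps.renci.org/sparql"
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def build_request(query, endpoint=UBERGRAPH_ENDPOINT):
    """Build a SPARQL-protocol POST request that asks for JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def run_query(query, endpoint=UBERGRAPH_ENDPOINT, timeout=60):
    """Send the query and return the parsed SPARQL JSON results."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return json.load(resp)


def iri_to_curie(iri):
    """Convert an OBO PURL such as .../obo/MONDO_0021136 to MONDO:0021136."""
    if iri.startswith(OBO_PREFIX):
        prefix, sep, local_id = iri[len(OBO_PREFIX):].partition("_")
        if sep:
            return f"{prefix}:{local_id}"
    return iri  # leave non-OBO IRIs unchanged


def extract_curies(results, var):
    """Pull one variable out of SPARQL JSON results as a list of CURIEs."""
    bindings = results["results"]["bindings"]
    return [iri_to_curie(b[var]["value"]) for b in bindings if var in b]
```

With something like this, the set of node IDs returned by each query could be merged into the graph as a property, which keeps the question of where the code lives (MONDO parsing or elsewhere) separate from how the queries are run.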