ROBOKOP needs to store properties alongside identifiers; this might be something that is useful for other users to have as well. Since Babel needs to read through the input files anyway, it might make sense for those properties to be extracted in Babel and then exposed through NodeNorm.
## Plan
- (Babel) Extract properties from resources (e.g. CHEBI) and store them alongside the IDs in the Babel outputs
  - We will need to figure out how to handle provenance and collisions: what should happen if two data sources disagree on the molecular weight of a molecule?
  - Given how Babel is structured, this might be a separate Snakemake file that is run separately from the main Babel run
  - Figure out units, property-range checks, etc.
- We could focus on one dataset, work it through all the way to NodeNorm with an interface that works for everybody, and then add additional datasets over time
- (NodeNorm) Load them into the NodeNorm Redis
  - The ROBOKOP team has a KGX file for this data on CHEBI; we could test the NodeNorm part by loading it into Redis
- (NodeNorm) Allow them to be queried via NodeNorm
  - This might be a separate endpoint (ID -> properties) or a flag on the /get_normalized_nodes endpoint (?include_properties=true)
  - We will need to figure out how this interacts with conflation, especially the chemical-chemical conflation that will eventually be implemented
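One way to handle the provenance and collision question above is to store every (value, source) pair rather than a single value per property. A minimal sketch, assuming a hypothetical JSON-lines record shape (the field names are illustrative, not Babel's current format):

```python
import json

# Hypothetical record shape: each property value carries its source and
# unit, so a disagreement between two sources on, say, molecular weight
# is preserved rather than silently overwritten.
record = {
    "id": "CHEBI:15377",
    "properties": {
        "molecular_weight": [
            {"value": 18.015, "unit": "g/mol", "source": "CHEBI"},
            {"value": 18.02, "unit": "g/mol", "source": "other_source"},
        ]
    },
}

# One JSON record per line, in keeping with Babel's line-oriented outputs.
line = json.dumps(record)
```

Keeping all values pushes the "which value wins?" decision downstream, where a consumer can filter by source or unit instead of Babel having to pick a winner at build time.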
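Testing the NodeNorm side with the ROBOKOP KGX file could look roughly like the following. This is a sketch under assumptions: the Redis key layout (one string value per CURIE, holding the node's non-core properties as JSON) is an invented convention, and the write loop to a real Redis client is shown only in a comment.

```python
import json

def kgx_nodes_to_redis_pairs(lines):
    """Turn KGX node JSON lines into (key, value) pairs for Redis.

    The key layout is an assumption: one Redis string per CURIE,
    holding everything except the core KGX fields as a JSON blob.
    """
    core = {"id", "name", "category"}
    for line in lines:
        node = json.loads(line)
        props = {k: v for k, v in node.items() if k not in core}
        if props:
            yield node["id"], json.dumps(props)

# With a real Redis client, loading would then be roughly:
#   import redis
#   r = redis.Redis()
#   with open("chebi_nodes.jsonl") as f:
#       for key, value in kgx_nodes_to_redis_pairs(f):
#           r.set(key, value)

sample = ['{"id": "CHEBI:15377", "name": "water", '
          '"category": ["biolink:SmallMolecule"], "molecular_weight": 18.015}']
pairs = list(kgx_nodes_to_redis_pairs(sample))
```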
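From a client's perspective, the flag option might look like this. The `include_properties` parameter comes from the proposal above and is not an existing NodeNorm feature; the base URL is the public NodeNorm instance, and the response shape is left open.

```python
from urllib.parse import urlencode

# Public NodeNorm instance; the endpoint accepts repeated `curie` params.
BASE = "https://nodenormalization-sri.renci.org/get_normalized_nodes"

def build_query(curies, include_properties=False):
    """Build a GET URL; `include_properties` is the proposed (hypothetical) flag."""
    params = [("curie", c) for c in curies]
    if include_properties:
        params.append(("include_properties", "true"))
    return f"{BASE}?{urlencode(params)}"

url = build_query(["CHEBI:15377"], include_properties=True)
```

A flag has the advantage that existing clients are unaffected: callers that omit it get today's response unchanged, while a separate ID -> properties endpoint would keep the normalization payload small.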
## Next steps
- [ ] @cbizon and @EvanDietzMorris provide feedback on this proposal.
- [ ] We decide whether to implement it in NodeNorm first (Gaurav's preference: it would be simpler to test, and could be based on a KGX input for now) or in Babel first.
- [ ] We implement it on one dataset, make sure we're happy with it, and then add additional datasets over time.