Open dosumis opened 2 years ago
CC @matentzn - would be useful to get some comment on strategy here.
Having reviewed the status of OAK development, we have decided to avoid using it for now. The alternative is to use UberGraph.
You should be able to get ancestors (via subClassOf) from the UberGraph redundant graph, and direct parent classes from the non-redundant graph. Queries should be batched using VALUES for speed. Logic for setting upper bounds is the same as above.
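A minimal sketch of the batching idea, assuming the two UberGraph named graphs are `http://reasoner.renci.org/redundant` and `http://reasoner.renci.org/nonredundant`, and that all CURIEs expand to OBO PURLs. The helper names (`curie_to_iri`, `batches`, `build_subclass_query`) are hypothetical. It only builds query strings, so sending them to the endpoint is left to whichever SPARQL client CAP uses.

```python
from typing import Iterable, Iterator, List

OBO_BASE = "http://purl.obolibrary.org/obo/"
# Assumed named-graph IRIs; check these against the UberGraph documentation.
REDUNDANT_GRAPH = "http://reasoner.renci.org/redundant"        # all ancestors
NONREDUNDANT_GRAPH = "http://reasoner.renci.org/nonredundant"  # direct parents


def curie_to_iri(curie: str) -> str:
    """Expand an OBO-style CURIE, e.g. CL:0000540 -> .../obo/CL_0000540."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_subclass_query(curies: List[str], graph_iri: str) -> str:
    """Build one SPARQL query covering a whole batch of terms via VALUES."""
    values = " ".join(f"<{curie_to_iri(c)}>" for c in curies)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?term ?parent ?parent_label WHERE {\n"
        f"  VALUES ?term {{ {values} }}\n"
        f"  GRAPH <{graph_iri}> {{ ?term rdfs:subClassOf ?parent . }}\n"
        # Assumption: labels are visible from the default graph.
        "  OPTIONAL { ?parent rdfs:label ?parent_label . }\n"
        "}"
    )


if __name__ == "__main__":
    terms = ["CL:0000540", "CL:0000100", "CL:0000099"]
    # One query per batch of 2 terms, against the redundant graph.
    for batch in batches(terms, 2):
        print(build_subclass_query(batch, REDUNDANT_GRAPH))
        print()
```

Passing `NONREDUNDANT_GRAPH` instead of `REDUNDANT_GRAPH` would give direct parents rather than all ancestors, under the same assumptions.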
However - I don't think this belongs in VFB as it will involve calling an external service. Code should belong to CAP.
STATUS: DRAFT
For the CAP project, we would like to load parents and ancestors to SOLR (storing labels and curies for each node). This needs to be configurable by relation and to allow specification of upper bounds.
obographs-solr.py is the current loader so it would be simplest to just extend this. As this is both a runner script and a collection of functions, args should be shifted to use argparse and new functionality should be driven by optional args. This will ensure that current uses of the script (e.g. in VFB) will remain unaffected.
This script already uses the OBOgraphs JSON format to load labels and synonyms into SOLR. OAK can load these data structures and has an interface that makes it easy to get lists of descendants or ancestors.
Suggested new args:
--add-ancestors {path to file of curies specifying relations to follow - default = subClassOf}
--upper-bounds {path to file of curies specifying upper bounds}
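One possible argparse shape for these two options, as a sketch only. The existing arguments of obographs-solr.py are deliberately left out because they are not listed here, and the helper names (`build_parser`, `read_curie_file`, `relations_to_follow`) are hypothetical. It assumes the relations file holds one CURIE per line and that giving `--add-ancestors` with no path means "follow subClassOf only".

```python
import argparse
from typing import List, Optional

# Default relation to follow when --add-ancestors is given without a file.
DEFAULT_RELATIONS = ["subClassOf"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Load OBOgraphs JSON into SOLR")
    # The script's existing arguments would be declared here (omitted).
    parser.add_argument(
        "--add-ancestors",
        nargs="?",      # the file path is optional
        const="",       # flag given with no path: use DEFAULT_RELATIONS
        default=None,   # flag absent: ancestor loading stays switched off
        metavar="RELATIONS_FILE",
        help="path to file of curies specifying relations to follow",
    )
    parser.add_argument(
        "--upper-bounds",
        default=None,
        metavar="BOUNDS_FILE",
        help="path to file of curies specifying upper bounds",
    )
    return parser


def read_curie_file(path: str) -> List[str]:
    """Read one CURIE per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def relations_to_follow(args: argparse.Namespace) -> Optional[List[str]]:
    """Return None when the feature is off, else the relations to traverse."""
    if args.add_ancestors is None:
        return None
    if args.add_ancestors == "":
        return list(DEFAULT_RELATIONS)
    return read_curie_file(args.add_ancestors)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(relations_to_follow(args))
```

Because both options default to `None`, a call that passes neither behaves as before, which is the property needed to keep current uses such as VFB unaffected.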
For each term in the upper bound list, generate a list of descendants (UBD). For each term loaded, generate a list of ancestors. Load the intersection of this list with UBD.
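The upper-bound logic can be sketched on a toy graph as below. This is an illustration under assumptions, not the loader's code: the graph is a plain child-to-parents dict, the `EX:` identifiers are made up, and each upper bound is treated as a member of UBD itself, which the description above does not specify either way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def closure(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """All nodes reachable from `start` by following `edges` (start excluded)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def invert(parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn a child -> parents map into a parent -> children map."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, ps in parents.items():
        for parent in ps:
            children[parent].add(child)
    return children


def upper_bound_descendants(bounds: Iterable[str],
                            parents: Dict[str, Set[str]]) -> Set[str]:
    """UBD: descendants of every upper bound, computed once up front."""
    children = invert(parents)
    ubd: Set[str] = set()
    for bound in bounds:
        ubd.add(bound)  # assumption: the bound itself counts as loadable
        ubd |= closure(bound, children)
    return ubd


def bounded_ancestors(term: str, parents: Dict[str, Set[str]],
                      ubd: Set[str]) -> Set[str]:
    """Ancestors of `term` that fall inside UBD."""
    return closure(term, parents) & ubd


# Hypothetical toy hierarchy: motor_neuron < neuron < cell < anatomical_entity
PARENTS = {
    "EX:motor_neuron": {"EX:neuron"},
    "EX:neuron": {"EX:cell"},
    "EX:cell": {"EX:anatomical_entity"},
}

if __name__ == "__main__":
    ubd = upper_bound_descendants(["EX:cell"], PARENTS)
    print(sorted(bounded_ancestors("EX:motor_neuron", PARENTS, ubd)))
```

Computing UBD once and reusing it for every loaded term keeps the per-term cost to a single ancestor traversal plus a set intersection.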
Potential concerns: Scaling.
Possible alternative - just use an ubergraph for queries?