Open cbizon opened 4 years ago
For ROBOKOP/omnicorp I've written the logic in Python to do the edge lookups per result (deduplicated) and all that. But depending on how connected things are, apoc.algo.cover() may be faster. If neither is efficient, adapting cover() to our tricky use case might not be hard.
Good to know - I think that adding all the edges to the KG might be more general, allowing us to use overlay in new querying patterns? Not sure if it's really helpful though.
One other question: however we add the edges, should we also add support edge bindings to the results? Or would you anticipate that being a different function, perhaps something the caller would do?
I could see it either way. It just depends on how we define this "overlay" operation.
I just saw the apoc version and I think it is straightforward. I also think we can batch it for a subset of the answers in results.
MATCH (n:named_thing) WHERE n.id IN ["GO:0002412", "GO:0002404", "M"]
WITH collect(n) AS nodes
CALL apoc.algo.cover(nodes) YIELD rel
RETURN rel, startNode(rel), endNode(rel)
Something like this, where we send the list of nodes that exist in our subset of answers and then apply edges for matching pairs in each answer. One thing I am a little confused about: when we return a new reasoner api output, do we want these support edges in the same style as omnicorp, where we don't have actual query graph bindings matching our generated query graph ids for them, but have source_id and target_id in the edges themselves (which are returned as part of knowledge_graph.results)?
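A minimal Python sketch of the batching idea, assuming each answer carries a `node_bindings` list of `{"qg_id": ..., "kg_id": ...}` entries (that shape, the helper names, and the batch size are assumptions for illustration, not PLATER's actual code). It only builds a parameterized version of the query above plus its parameters; running it through a Neo4j driver is left out.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Parameterized form of the cover query; $ids is bound by the driver at run time.
COVER_QUERY = (
    "MATCH (n:named_thing) WHERE n.id IN $ids "
    "WITH collect(n) AS nodes "
    "CALL apoc.algo.cover(nodes) YIELD rel "
    "RETURN rel, startNode(rel), endNode(rel)"
)


def node_ids_for_answers(answers: Iterable[dict]) -> List[str]:
    """Collect the distinct node ids bound in a batch of answers, in first-seen order."""
    seen: Dict[str, None] = {}
    for answer in answers:
        # Assumed answer shape: {"node_bindings": [{"qg_id": "n0", "kg_id": "GO:..."}]}
        for binding in answer.get("node_bindings", []):
            seen.setdefault(binding["kg_id"], None)
    return list(seen)


def cover_requests(answers: List[dict], batch_size: int = 1000) -> Iterator[Tuple[str, dict]]:
    """Yield one (query, parameters) pair per batch of answers."""
    for start in range(0, len(answers), batch_size):
        batch = answers[start:start + batch_size]
        yield COVER_QUERY, {"ids": node_ids_for_answers(batch)}
```

Passing the ids as a parameter rather than formatting them into the query string keeps the query text constant across batches.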
> when we return a new reasoner api output do we want these support edges same style as omnicorp,
I think so. I guess it's possible that we might want to change how that works in the future, but this seems like a good starting point.
Hi @cbizon, @patrickkwang, I think I have a working version of this functionality. The way it works now: it grabs a batch of answers (the first 1000), queries apoc cover for the node ids in those answers, and keeps the result as a local graph. Then, for each answer in that batch, it adds a support edge if (1) a node binding curie is listed as a source in the apoc cover graph, and (2) any other curie in the current node bindings is listed as a target in the cover graph (KITCHEN/PLATER/services/util/overlay.py). For batching, it keeps track of the edges it appends to knowledge_graph.edges to avoid duplicate entries in that attribute. Currently building an image for it and deploying in a few
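The matching rule described above can be sketched roughly as follows. This is not the code in overlay.py; the message shape, the `source_id`/`target_id`/`id` edge fields, and the choice to bind support edges by `kg_id` only (with no `qg_id`, since they match no query graph edge) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def overlay_support_edges(message: dict, cover_edges: List[dict]) -> dict:
    """Attach cover edges to each answer whose node bindings contain both endpoints."""
    # Index the local cover graph by source curie.
    by_source: Dict[str, List[dict]] = {}
    for edge in cover_edges:
        by_source.setdefault(edge["source_id"], []).append(edge)

    kg_edges = message["knowledge_graph"]["edges"]
    # Track edge ids already in the KG so each edge is appended at most once.
    known_edge_ids: Set[str] = {edge["id"] for edge in kg_edges}

    for answer in message["results"]:
        curies = [b["kg_id"] for b in answer["node_bindings"]]
        curie_set = set(curies)
        bindings = answer.setdefault("edge_bindings", [])
        already_bound = {b["kg_id"] for b in bindings}
        for source in curies:
            for edge in by_source.get(source, []):
                target = edge["target_id"]
                # Rule: the target must be some *other* curie in this answer.
                if target == source or target not in curie_set:
                    continue
                if edge["id"] not in already_bound:
                    bindings.append({"kg_id": edge["id"]})
                    already_bound.add(edge["id"])
                if edge["id"] not in known_edge_ids:
                    kg_edges.append(edge)
                    known_edge_ids.add(edge["id"])
    return message
```

An edge between two nodes that never appear in the same answer is left out of the KG entirely under this rule.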
When we stand up a plater instance, several interfaces are created automatically, including a ReasonerAPI. That API is a query api, meaning that it looks at the query graph, and creates a KG and answers.
We also want an /overlay endpoint that implements ReasonerAPI. In this case, though, the question graph and results are ignored. Edges are added to the KG that connect any nodes in the KG. So if the KG has 8 nodes, this interface will look in the neo4j for any edge connecting any 2 of those 8 and add that edge to the KG.
There is also discussion of this distinction in https://github.com/NCATS-Tangerine/icees-api/issues/27
There is one implementation issue to consider: The KG is a bunch of nodes made by the union of results. So there are nodes in the KG that are never part of the same answer. In those cases, we don't especially care about edges between those two nodes. So we could do the edge lookups per result. But we'd need to be careful about not redoing edge queries when we don't need to, and I'm concerned that this will just make things more complicated, and potentially even slower.
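To make the per-result option concrete, here is a hedged Python sketch of looking up edges per result while caching each node pair so it is queried only once. The `lookup` callable is hypothetical (it stands in for whatever edge query is used), and the `node_bindings` shape is an assumption.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def edges_per_result(
    results: List[dict],
    lookup: Callable[[str, str], list],
) -> List[list]:
    """Return the edges found for each result, querying each node pair at most once."""
    cache: Dict[Tuple[str, str], list] = {}
    per_result: List[list] = []
    for result in results:
        # Sorting gives every unordered pair a single canonical cache key.
        curies = sorted({b["kg_id"] for b in result["node_bindings"]})
        found: list = []
        for pair in combinations(curies, 2):
            if pair not in cache:
                cache[pair] = lookup(*pair)  # hypothetical edge query for one pair
            found.extend(cache[pair])
        per_result.append(found)
    return per_result
```

Only pairs that co-occur in at least one result are ever queried, at the cost of one lookup per distinct pair rather than a single cover call over all nodes, which is the trade-off raised above.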
Of note, I think that this will be trivial to implement, since apoc.algo.cover does exactly this query.
@patrickkwang any suggestions? @YaphetKG would you have time to work on this? If we have this, then it will be (relatively?) easy to implement using literature KP results as our new omnicorp.