Closed colleenXu closed 2 years ago
My understanding is that the records provided to QueryResults.update
are not necessarily all part of complete paths. They should all match up at specific nodes, but some of the records can still be dead ends. The QueryResults.update
code stitches together the complete paths, but I don't know what happens to the KG for the dead ends.
It might help to take the queries with this issue and...
Just adding some documentation to use demotests for testing solutions to this issue prior to getting a module-level test written:
https://dev.api.bte.ncats.io/demotests/inspect
response -> resultsSanityCheck -> KGEdgesMissingFromResults and KGNodesMissingFromResults
is the most relevant information for testing this issue.Next step is likely to update "knowledge graph" after results-assembly, since results-assembly does a good job of pruning "incomplete paths" or nodes/edges that aren't bound to qNodes/qEdges.
From Slack (Marco):
I took a quick look at what happens after query execution and looks like query_graph and results components of the TRAPI response object get updated independently after query execution, so perhaps that's why the sanity check on the dev instance sometimes fails when the graph has entities the results don't. I had a feeling it could be something like that. queryResults seems to prune out results with incomplete paths while the query graph code is much simpler and just adds whatever you give it, it doesn't check for incomplete paths. Perhaps there's a way to update that based on the result of queryResults? This makes sense for multi hop queries why this could be happening but there was another simple query that also showed extra stuff in the query_graph not in the results. But the code from the bteGraph.update (which gets you query_graph) seems to be pretty straight forward. Just putting it out there for discussion
@tokebe noticed that the message.knowledge_graph section of a TRAPI response contains nodes and edges that aren't in the results section.
Jackson has shown this issue with the analyses https://github.com/biothings/BioThings_Explorer_TRAPI/issues/401#issuecomment-1029298778 and in the reports for automated runs https://dev.api.bte.ncats.io/demotests/ (an example is by going to summary -> A.2_RHOBTB2_twohop.json -> response -> resultsSanityCheck -> KGEdgesMissingFromResults and KGNodesMissingFromResults).
Based on a quick look, I think these nodes/edges are parts of incomplete paths - so it's correct that they do not appear in the results. However, I think we as a team agreed to remove nodes/edges of incomplete paths (they can't fill out a sub-graph that matches the QGraph)...and I thought this was fixed around the time that the new edge-management code was released.
The next steps are to verify that my hunch above is correct...