Open andrewsu opened 3 years ago
TDLR:
These two scenarios should return the same knowledge_graph nodes / go through the same sub-queries. They would return different edges/results because the edges for the second hop (between B and C) would have different directions/predicates.
Scenario 2 doesn't work as planned. It looks like this is because of the predicate restriction in the second step.
BTE does the querying correctly:
First: it queries from KCNMA1 -> Disease (set of IDs), then does predicate restriction to only keep nodes/edges/results with the predicate "gene_associated_with_condition". The logs for the predicate restriction are below:
{
"timestamp": "2021-06-24T16:12:18.959Z",
"level": "DEBUG",
"message": "query_graph_handler: Total number of results returned for this query is 344.",
"code": null
},
{
"timestamp": "2021-06-24T16:12:18.959Z",
"level": "DEBUG",
"message": "query_graph_handler: Successfully applied post-query predicate restriction with 42 results.",
"code": null
},
Second: it correctly flips the query edge from Gene <- filtered set of Disease IDs to filtered set of Disease IDs -> Gene. It then does that set of queries.
However, it looks like it then tries to restrict the answer edges (Disease -> Gene) with the predicate "gene_associated_with_condition" which none of them have. Instead some have the flipped/inverse predicate condition_associated_with_gene (like Query 1).
See the logs below:
{
"timestamp": "2021-06-24T16:13:41.941Z",
"level": "DEBUG",
"message": "query_graph_handler: Total number of results returned for this query is 73000.",
"code": null
},
{
"timestamp": "2021-06-24T16:13:41.941Z",
"level": "DEBUG",
"message": "query_graph_handler: Successfully applied post-query predicate restriction with 0 results.",
"code": null
}
So when BTE correctly flips an edge for querying, some fix is needed so the predicate restriction is then done correctly.
I could think of two ways to fix this:
Query 2: the "modified" predict that doesn't work
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids": ["HGNC:6284"],
"categories":["biolink:Gene"]
},
"n1": {
"categories": ["biolink:Disease"]
},
"n2": {
"categories": ["biolink:Gene"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": ["biolink:gene_associated_with_condition"]
},
"e1": {
"subject": "n2",
"object": "n1",
"predicates": ["biolink:gene_associated_with_condition"]
}
}
}
}
}
Query 1: "classic" predict that works
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids": ["HGNC:6284"],
"categories":["biolink:Gene"]
},
"n1": {
"categories": ["biolink:Disease"]
},
"n2": {
"categories": ["biolink:Gene"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": ["biolink:gene_associated_with_condition"]
},
"e1": {
"subject": "n1",
"object": "n2",
"predicates": ["biolink:condition_associated_with_gene"]
}
}
}
}
}
For @andrewsu 's note above:
ChemicalSubstance[id:XXXX] - treats - Disease would return the same results as Disease - treated_by -> ChemicalSubstance[id:XXXX].
1 hops seem to work fine right now. My tests were here. I can add another test for ChemicalSubstance treats Disease and vice versa if you would like.
See my review here
@andrewsu I've made a doc here for 1 and 2 hop Predicts and Explain. I've tested all of the queries with the latest query handling code and I think it works as-expected.
https://github.com/biothings/biothings_explorer/blob/query-directions/docs/tests-query-directions.md
I think the next step is deciding how to turn these into tests that aren't brittle (dependent on specific APIs / configurations...)
rebased the branch to match latest main, moved and updated the file: https://github.com/biothings/BioThings_Explorer_TRAPI/commits/query-directions
next time try merging to main rather than rebasing...
FYI to @tokebe. This is the issue that involves testing QEdge directionality.
Although it also involves testing more than that maybe....
To summarize current status here, Colleen created a "query-directions" branch that has a markdown file with various queries that are slightly different in terms of directionality. Behavior was as-expected when this doc was created ~2 years ago -- the different variants produced the same results. We could create integration tests around these queries to test, for example, that the node, edge, and result counts are the same between the variants.
Going to leave this issue open as a nice-to-have...
We should check whether BTE is correctly normalizing alternate representations of one query. For example I would expect that
ChemicalSubstance[id:XXXX]
-treats
-Disease
would return the same results asDisease
-treated_by
->ChemicalSubstance[id:XXXX]
.We should also test longer query paths, ie
A[id:XXX]
->B
->C
should be the same asA[id:XXXX]
->B
<-C
.Assuming these BTE is using the expected logic to equate these various queries, we should create formal tests for these cases.