Closed kevinxin90 closed 3 years ago
Three great examples of "explain" queries from Sui
This is not working as expected.
For example, we would expect KCNMA1 -(e0)-> biolink:NamedThing <-(e1)- TAAR1 to do the following:
Instead, something makes the query (TRAPI, listed in the next comment) take a very long time (more than 30 minutes). Andrew separately ran the two one-hops (1 and 2 above, also listed in the next comment), and both were quick (under 6 seconds each).
I think these logs from my console are relevant. I used a JSON viewer to help me read the path parts. This is my interpretation of the logs:
LOGS:
biothings-explorer-trapi:query_graph ALL PATHS {"0":[{"qEdge":{"id":"e0","subject":{"id":"n0","category":["biolink:Gene"],"curie":["HGNC:6284"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},{"qEdge":{"id":"e1","subject":{"id":"n2","category":["biolink:Gene"],"curie":["HGNC:17734"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}}],"1":[{"qEdge":{"id":"e0","subject":{"id":"n0","category":["biolink:Gene"],"curie":["HGNC:6284"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":true,"prev_edge":{"qEdge":{"id":"e1","subject":{"id":"n2","category":["biolink:Gene"],"curie":["HGNC:17734"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},{"qEdge":{"id":"e1","subject":{"id":"n2","category":["biolink:Gene"],"curie":["HGNC:17734"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":true,"prev_edge":{"qEdge":{"id":"e0","subject":{"id":"n0","category":["biolink:Gene"],"curie":["HGNC:6284"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}}]} +0ms
biothings-explorer-trapi:main query paths constructed: {"0":[{"qEdge":{"id":"e0","subject":{"id":"n0","category":["biolink:Gene"],"curie":["HGNC:6284"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},{"qEdge":{"id":"e1","subject":{"id":"n2","category":["biolink:Gene"],"curie":["HGNC:17734"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}}],"1":[{"qEdge":{"id":"e0","subject":{"id":"n0","category":["biolink:Gene"],"curie":["HGNC:6284"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":true,"prev_edge":{"qEdge":{"id":"e1","subject":{"id":"n2","category":["biolink:Gene"],"curie":["HGNC:17734"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},{"qEdge":{"id":"e1","subject":{"id":"n2","category":["biolink:Gene"],"curie":["HGNC:17734"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":true,"prev_edge":{"qEdge":{"id":"e0","subject":{"id":"n0","category":["biolink:Gene"],"curie":["HGNC:6284"]},"object":{"id":"n1","category":["biolink:NamedThing"]}},"reverse":false,"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}},"input_equivalent_identifiers":{},"output_equivalent_identifiers":{}}]} +1ms
biothings-explorer-trapi:main Query depth is 2 +1ms
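To make the path logs easier to read without a JSON viewer, here is a minimal sketch that turns the "ALL PATHS" payload into one line per planned edge. The payload below is a trimmed copy of the log (only the fields the sketch reads are kept). Treating `"reverse": true` as "traversed from object to subject" is my assumption about what the flag means, not something the logs confirm, and `summarize_paths` is a hypothetical helper, not part of BTE.

```python
import json

# Trimmed stand-in for the "ALL PATHS" payload logged above.
# Assumption: only the fields used below are kept; edge and node ids are unchanged.
ALL_PATHS = json.loads("""
{
  "0": [
    {"qEdge": {"id": "e0", "subject": {"id": "n0"}, "object": {"id": "n1"}}, "reverse": false},
    {"qEdge": {"id": "e1", "subject": {"id": "n2"}, "object": {"id": "n1"}}, "reverse": false}
  ],
  "1": [
    {"qEdge": {"id": "e0", "subject": {"id": "n0"}, "object": {"id": "n1"}}, "reverse": true,
     "prev_edge": {"qEdge": {"id": "e1"}}},
    {"qEdge": {"id": "e1", "subject": {"id": "n2"}, "object": {"id": "n1"}}, "reverse": true,
     "prev_edge": {"qEdge": {"id": "e0"}}}
  ]
}
""")


def summarize_paths(paths):
    """Return one readable line per planned edge, grouped by depth level."""
    lines = []
    for level in sorted(paths, key=int):
        for step in paths[level]:
            q_edge = step["qEdge"]
            subject, obj = q_edge["subject"]["id"], q_edge["object"]["id"]
            # Assumption: a reversed step is walked from object to subject.
            start, end = (obj, subject) if step["reverse"] else (subject, obj)
            prev = step.get("prev_edge", {}).get("qEdge", {}).get("id")
            suffix = f" (after {prev})" if prev else ""
            lines.append(f"level {level}: {q_edge['id']} {start} -> {end}{suffix}")
    return lines


for line in summarize_paths(ALL_PATHS):
    print(line)
```

Read this way, level 0 holds both edges in their query direction, and level 1 holds each edge reversed and chained after the other one, which matches the "Query depth is 2" line.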
TRAPI query for KCNMA1 -(e0)-> biolink:NamedThing <-(e1)- TAAR1
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids": ["HGNC:6284"],
"categories": ["biolink:Gene"]
},
"n1": {
"categories":["biolink:NamedThing"]
},
"n2": {
"ids":["HGNC:17734"],
"categories":["biolink:Gene"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
},
"e1": {
"subject": "n2",
"object": "n1"
}
}
}
}
}
Fast One Hop 1: KCNMA1-(e0)-> biolink:NamedThing
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids": ["HGNC:6284"],
"categories": ["biolink:Gene"]
},
"n1": {
"categories":["biolink:NamedThing"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
Fast One Hop 2: TAAR1-(e1)-> biolink:NamedThing
{
"message": {
"query_graph": {
"nodes": {
"n1": {
"categories":["biolink:NamedThing"]
},
"n2": {
"ids":["HGNC:17734"],
"categories":["biolink:Gene"]
}
},
"edges": {
"e1": {
"subject": "n2",
"object": "n1"
}
}
}
}
}
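Since both one-hops come back quickly, one way to think about the two-hop query is as a join of the two one-hop results on the shared `n1` node. The sketch below only illustrates that idea; it is not how BTE is implemented. The record shape (`subject`/`object` keys), the function name, and the `EXAMPLE:*` identifiers are all invented for illustration.

```python
from collections import defaultdict


def intersect_one_hops(hop1_records, hop2_records):
    """Join two one-hop result lists on the shared intermediate node (n1)."""
    # Index the second hop's records by the n1 identifier they reach.
    by_n1 = defaultdict(list)
    for record in hop2_records:
        by_n1[record["object"]].append(record)

    joined = []
    for first in hop1_records:
        # Keep only n1 identifiers that both hops reached.
        for second in by_n1.get(first["object"], []):
            joined.append(
                {"n0": first["subject"], "n1": first["object"], "n2": second["subject"]}
            )
    return joined


# Placeholder records: the EXAMPLE:* identifiers are made up, not real results.
hop1 = [
    {"subject": "HGNC:6284", "object": "EXAMPLE:A"},
    {"subject": "HGNC:6284", "object": "EXAMPLE:B"},
]
hop2 = [
    {"subject": "HGNC:17734", "object": "EXAMPLE:B"},
    {"subject": "HGNC:17734", "object": "EXAMPLE:C"},
]
print(intersect_one_hops(hop1, hop2))
```

A real join would also have to match `n1` nodes by equivalent identifiers rather than by exact string, which this sketch leaves out.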
This is a special kind of Explain-query we also want to support (see TRAPI query below): ChemicalSubstance celecoxib (PUBCHEM.COMPOUND:2662) -> PTGS1 (HGNC:9604). It's from a Translator standup meeting.
The minimal expected behavior is:
Currently, BTE is only doing 1 and ID-resolving the gene ID in the query.
Expected edges in the answer: For the example query, I would expect only the following edges to exist in the Response:
The TRAPI query:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:2662"],
"categories":["biolink:ChemicalSubstance"]
},
"n1": {
"categories":["biolink:Gene"],
"ids":["HGNC:9604"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
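To illustrate the "only edges between the two pinned nodes" expectation, here is a rough sketch of the filtering step, assuming the one-hop from `n0` has already been run and the IDs for `n1` have already been resolved. The record shape and the function name are hypothetical. `NCBIGene:5742` is assumed to be an equivalent identifier for `HGNC:9604` (it is the ID bound to that node in the result object later in this thread), and `EXAMPLE:OTHER_GENE` is invented.

```python
def edges_between_pinned_nodes(records, subject_ids, object_ids):
    """Keep only records whose subject and object are both pinned in the query."""
    subject_ids, object_ids = set(subject_ids), set(object_ids)
    return [
        record
        for record in records
        if record["subject"] in subject_ids and record["object"] in object_ids
    ]


# Mock one-hop output from n0; not real BTE results.
records = [
    {"subject": "PUBCHEM.COMPOUND:2662", "predicate": "biolink:related_to",
     "object": "NCBIGene:5742"},
    {"subject": "PUBCHEM.COMPOUND:2662", "predicate": "biolink:related_to",
     "object": "EXAMPLE:OTHER_GENE"},
]

# Assumption: ID resolution for n1 yields these equivalent identifiers.
n1_ids = {"HGNC:9604", "NCBIGene:5742"}

kept = edges_between_pinned_nodes(records, {"PUBCHEM.COMPOUND:2662"}, n1_ids)
print(kept)
```

Only the record pointing at one of the resolved `n1` identifiers survives; the edge to the unrelated gene is dropped.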
This functionality is high priority since it's come up in standup queries and the demo (Workflow D, maybe Workflow C).
Note that @ericz1803 found this repo from Kevin, https://github.com/kevinxin90/explain.js, which handles the special case of explain queries with one intermediate node (used at https://biothings.io/explorer/explain). It is based on @biothings-explorer/call-apis and @biothings-explorer/smartapi-kg, so it may be useful to consult when implementing explain queries in the main application. In fact, the short-term solution to this ticket could be to integrate this code into the main app, leaving the longer-term generalized query handler to handle longer paths and more complex query topologies.
one-hop explain query:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:2662"],
"categories":["biolink:ChemicalSubstance"]
},
"n1": {
"categories":["biolink:Gene"],
"ids":["HGNC:9604"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
two-hop explain query (version 1):
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:2662"],
"categories":["biolink:ChemicalSubstance"]
},
"n1": {
"categories":["biolink:Disease"]
},
"n2": {
"categories":["biolink:Gene"],
"ids":["HGNC:9604"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
},
"e1": {
"subject": "n1",
"object": "n2"
}
}
}
}
}
two-hop explain query (version 2):
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:2662"],
"categories":["biolink:ChemicalSubstance"]
},
"n1": {
"categories":["biolink:Disease"]
},
"n2": {
"categories":["biolink:Gene"],
"ids":["HGNC:9604"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
},
"e1": {
"subject": "n2",
"object": "n1"
}
}
}
}
}
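The only difference between version 1 and version 2 is the direction of `e1`. A small sketch that makes this explicit by listing which nodes are pinned (have `ids`) and which way each edge points. Calling a query "explain" when at least two nodes are pinned is my working assumption here, and both helper names are hypothetical.

```python
def two_hop_query_graph(e1_subject, e1_object):
    """Build the two-hop query graph from this thread with a chosen e1 direction."""
    return {
        "nodes": {
            "n0": {"ids": ["PUBCHEM.COMPOUND:2662"],
                   "categories": ["biolink:ChemicalSubstance"]},
            "n1": {"categories": ["biolink:Disease"]},
            "n2": {"ids": ["HGNC:9604"], "categories": ["biolink:Gene"]},
        },
        "edges": {
            "e0": {"subject": "n0", "object": "n1"},
            "e1": {"subject": e1_subject, "object": e1_object},
        },
    }


def describe(query_graph):
    """Summarize pinned nodes and edge directions of a query graph."""
    nodes = query_graph["nodes"]
    pinned = sorted(key for key, node in nodes.items() if node.get("ids"))
    directions = {
        key: f"{edge['subject']} -> {edge['object']}"
        for key, edge in query_graph["edges"].items()
    }
    # Assumption: two or more pinned nodes is what makes this an explain query.
    return {"pinned": pinned, "is_explain": len(pinned) >= 2, "edges": directions}


version_1 = describe(two_hop_query_graph("n1", "n2"))  # e1: n1 -> n2
version_2 = describe(two_hop_query_graph("n2", "n1"))  # e1: n2 -> n1
print(version_1)
print(version_2)
```

In version 2 both edges point into the unpinned `n1`, the same topology as the KCNMA1/TAAR1 query at the top of this issue.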
https://github.com/biothings/BioThings_Explorer_TRAPI/issues/112#issuecomment-865448828 csgene.txt These are the results I'm getting with the new query handler algorithm; I just want to make sure they look OK. I'm going through some of the queries here as I read them.
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:2662"],
"categories":["biolink:ChemicalSubstance"]
},
"n1": {
"categories":["biolink:Disease"]
},
"n2": {
"categories":["biolink:Gene"],
"ids":["HGNC:9604"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
},
"e1": {
"subject": "n2",
"object": "n1"
}
}
}
}
}
This is the new result for the two-hop query above; hopefully the new logs explain the process. twohop.txt
For Workflow D:
Note:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"categories": ["biolink:Disease"],
"ids": ["MESH:D015464"]
},
"n1": {
"categories": ["biolink:Gene"]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
Note: This test query should have this path as a result: ChemicalSubstance PUBCHEM.COMPOUND:2662 <-> Disease MONDO:0002974 <-> Pathway REACT:R-HSA-109704 <-> HGNC:17947.
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:2662"],
"categories":["biolink:SmallMolecule"]
},
"n1": {
"categories":["biolink:Disease"]
},
"n2": {
"categories":["biolink:Pathway"]
},
"n3": {
"categories":["biolink:Gene"],
"ids":["HGNC:17947"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
},
"e1": {
"subject": "n1",
"object": "n2"
},
"e2": {
"subject": "n2",
"object": "n3"
}
}
}
}
}
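A quick way to check a response for an expected path like the one in the note above is to test each consecutive pair of IDs against the knowledge graph edges, ignoring direction (the path is written with `<->`). This is only a sketch: `path_exists` is a hypothetical helper, the edge list is a mock rather than real BTE output, and a real check would need to normalize equivalent identifiers first.

```python
def path_exists(kg_edges, path):
    """True if every consecutive pair of IDs in `path` is joined by some edge,
    in either direction."""
    connected = {frozenset((edge["subject"], edge["object"])) for edge in kg_edges}
    return all(frozenset(pair) in connected for pair in zip(path, path[1:]))


expected_path = [
    "PUBCHEM.COMPOUND:2662",
    "MONDO:0002974",
    "REACT:R-HSA-109704",
    "HGNC:17947",
]

# Mock edges for illustration only; with a TRAPI response you would pass
# response["message"]["knowledge_graph"]["edges"].values() instead.
mock_edges = [
    {"subject": "PUBCHEM.COMPOUND:2662", "object": "MONDO:0002974"},
    {"subject": "REACT:R-HSA-109704", "object": "MONDO:0002974"},  # stored reversed
    {"subject": "REACT:R-HSA-109704", "object": "HGNC:17947"},
]

print(path_exists(mock_edges, expected_path))      # all three hops present
print(path_exists(mock_edges[:2], expected_path))  # last hop missing
```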
This path should exist: Pathway REACT:R-HSA-1368082 <-> Gene NCBIGene:1374 <-> ChemicalSubstance CHEBI:35553 <-> Disease MONDO:0009287
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"categories": ["biolink:Pathway"],
"ids": ["REACT:R-HSA-1368082"]
},
"n1": {
"categories": ["biolink:Gene"]
},
"n2": {
"categories": ["biolink:ChemicalSubstance"]
},
"n3": {
"categories": ["biolink:Disease"],
"ids": ["MONDO:0009287"]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1"
},
"e02": {
"subject": "n1",
"object": "n2"
},
"e03": {
"subject": "n3",
"object": "n2"
}
}
}
}
}
What the results object should look like:
{
"node_bindings": {
"n0": [{"id": "CHEBI:41423"}],
"n1": [{"id": "MONDO:0004247"}],
"n2": [{"id": "NCBIGene:5742"}]
},
"edge_bindings": {
"e0": [{"id": "CHEBI:41423-biolink:related_to-MONDO:0004247"}],
"e1": [{"id": "NCBIGene:5742-biolink:related_to-MONDO:0004247"}]
}
}
For this query:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:2662"],
"categories":["biolink:ChemicalSubstance"]
},
"n1": {
"categories":["biolink:Disease"]
},
"n2": {
"categories":["biolink:Gene"],
"ids":["HGNC:9604"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
},
"e1": {
"subject": "n2",
"object": "n1"
}
}
}
}
}
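One property the result object above should satisfy is that every node and edge key in the query graph has at least one binding. Here is a minimal sketch of that check. `missing_bindings` is a hypothetical helper, and binding values are written as lists of `{"id": ...}` objects, which is how I understand TRAPI bindings to be shaped.

```python
query_graph = {
    "nodes": {
        "n0": {"ids": ["PUBCHEM.COMPOUND:2662"],
               "categories": ["biolink:ChemicalSubstance"]},
        "n1": {"categories": ["biolink:Disease"]},
        "n2": {"categories": ["biolink:Gene"], "ids": ["HGNC:9604"]},
    },
    "edges": {
        "e0": {"subject": "n0", "object": "n1"},
        "e1": {"subject": "n2", "object": "n1"},
    },
}

result = {
    "node_bindings": {
        "n0": [{"id": "CHEBI:41423"}],
        "n1": [{"id": "MONDO:0004247"}],
        "n2": [{"id": "NCBIGene:5742"}],
    },
    "edge_bindings": {
        "e0": [{"id": "CHEBI:41423-biolink:related_to-MONDO:0004247"}],
        "e1": [{"id": "NCBIGene:5742-biolink:related_to-MONDO:0004247"}],
    },
}


def missing_bindings(query_graph, result):
    """Return (node keys, edge keys) from the query graph with no binding."""
    missing_nodes = sorted(
        key for key in query_graph["nodes"] if not result["node_bindings"].get(key)
    )
    missing_edges = sorted(
        key for key in query_graph["edges"] if not result["edge_bindings"].get(key)
    )
    return missing_nodes, missing_edges


print(missing_bindings(query_graph, result))
```

Both lists come back empty for the result shown, so each of `n0`, `n1`, `n2`, `e0` and `e1` is bound. The check does not verify that the bound IDs are equivalent to the query's `ids`.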
Note: Kevin's opening query, reformatted, now has results that look as expected. The query:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids": ["MESH:D015464"],
"categories": ["biolink:Disease"]
},
"n1": {
"categories": ["biolink:Gene"]
},
"n2": {
"ids": ["CHEBI:45783"],
"categories": ["biolink:SmallMolecule"]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1"
},
"e02": {
"subject": "n1",
"object": "n2"
}
}
}
}
}
The new query handler handles these cases; I checked this while testing the code.