biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

phase 1: processing TRAPI-1.4 KP sub-query responses (aux-graph/result.analyses refactor) #614

Closed colleenXu closed 1 year ago

colleenXu commented 1 year ago

Background

Overview

I use this sub-query for my example results below:

```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["MONDO:0005377"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"],
                    "name": "noonan"
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:caused_by"]
                }
            }
        }
    }
}
```

Our sub-queries to TRAPI KPs are 1-hop Predict "style" TRAPI queries with batches of IDs sent in each request.
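To make the shape of these batched sub-queries concrete, here is a minimal Python sketch (not BTE's actual code). The helper names `batched` and `build_subquery`, the batch size, and the placeholder `EXAMPLE:` IDs are all hypothetical; only the query-graph layout follows the example above.

```python
def batched(ids, size):
    """Yield consecutive chunks of at most `size` IDs (hypothetical helper)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_subquery(ids, subject_category, object_category, predicate):
    """Build a 1-hop TRAPI query with a batch of IDs pinned on the subject node.

    Assumption: IDs always go on the subject QNode (n0) and the object
    QNode (n1) is left unpinned, as in the example sub-query.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": list(ids), "categories": [subject_category]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
            }
        }
    }


# Placeholder IDs; the batch size of 2 is arbitrary.
example_ids = ["MONDO:0005377", "EXAMPLE:1", "EXAMPLE:2"]
queries = [
    build_subquery(
        batch,
        "biolink:DiseaseOrPhenotypicFeature",
        "biolink:Gene",
        "biolink:caused_by",
    )
    for batch in batched(example_ids, 2)
]
```

Each element of `queries` would be the body of one POST to a KP's /query endpoint.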

We expect two kinds of results in the TRAPI-1.4 responses (mirrors the two scenarios of #603)...

no ID/node-expansion was involved

example result: This is a "fake" expected result that isn't based on a real response from a TRAPI-1.4 KP...

```
{
    "node_bindings": {
        "n0": [
            { "id": "MONDO:0005377" }
        ],
        "n1": [
            { "id": "NCBIGene:3315" }
        ]
    },
    "analyses": [
        {
            "resource_id": "infores:automat-biolink",
            "edge_bindings": {
                "eA": [
                    { "id": "54d9ed32bec4d12369592709e20c997f" }
                ]
            },
            "score": 0.8
        }
    ]
}
```

ID/node-expansion was involved

READ THIS FIRST:

Notes:

Examples:

colleenXu commented 1 year ago

Specifically interested in @tokebe's view of this idea of dropping results when the edges in analyses.edge_bindings reference aux-graphs...

we could decide to drop these results! (if we want data for descendant IDs, we'll include them in the batch of IDs we send and ideally get edges back with no auxiliary graph references)

colleenXu commented 1 year ago

Note that COHD's dev instance seems to be on TRAPI 1.4 (we can access it through the registration we currently use, but they also registered a separate yaml for TRAPI 1.4)

However, I haven't checked their /query responses to see if they are using the aux-graph/result.analyses as we expect, and whether we can use it to develop and test our code for this issue...

From my post here: https://github.com/biothings/biothings_explorer/issues/597#issuecomment-1502686927

colleenXu commented 1 year ago

Deleted the previous comment (oops? should have edited or hidden it instead?).

@tokebe and I agreed to adjust the API_LIST config file rather than use SmartAPI overrides, because the names of the APIs were different between registrations.

The adjustments are in this branch https://github.com/biothings/biothings_explorer/compare/main...trapi1-4-overrides and include...

tokebe commented 1 year ago

Above linked PR currently drops all KP result edges that have support graphs, per 1-on-1 discussion with @colleenXu. This might change if there's a good case in which we'd want to keep support graphs.

Note that support graphs on the result analysis are also not kept, but not used as criteria for dropping a result edge (these support graphs would typically explain result scoring, which we also don't use from TRAPI KPs).
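The behavior described above could look roughly like the following Python sketch. This is an illustration under assumptions, not the code in the linked PR: it assumes a KG edge points at its support graphs through an attribute with `attribute_type_id` of `biolink:support_graphs`, and it assumes an analysis is discarded once any of its query edges has no remaining bindings. The function names are hypothetical.

```python
# Assumption: a knowledge-graph edge references auxiliary graphs through
# an attribute with this type ID.
SUPPORT_GRAPHS_ATTRIBUTE = "biolink:support_graphs"


def edge_has_support_graphs(edge):
    """Return True if a KG edge carries a non-empty support-graphs attribute."""
    for attribute in edge.get("attributes") or []:
        if (
            attribute.get("attribute_type_id") == SUPPORT_GRAPHS_ATTRIBUTE
            and attribute.get("value")
        ):
            return True
    return False


def drop_supported_edges(message):
    """Return results whose bound edges exist in the KG and have no support graphs."""
    kg_edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    kept_results = []
    for result in message.get("results") or []:
        kept_analyses = []
        for analysis in result.get("analyses") or []:
            filtered = {}
            for qedge_id, bindings in (analysis.get("edge_bindings") or {}).items():
                filtered[qedge_id] = [
                    binding
                    for binding in bindings or []
                    if binding.get("id") in kg_edges
                    and not edge_has_support_graphs(kg_edges[binding["id"]])
                ]
            # Assumption: keep the analysis only if every query edge still has a binding.
            if filtered and all(filtered.values()):
                # Analysis-level support_graphs are discarded, but never used
                # as a reason to drop the analysis.
                cleaned = {k: v for k, v in analysis.items() if k != "support_graphs"}
                cleaned["edge_bindings"] = filtered
                kept_analyses.append(cleaned)
        if kept_analyses:
            kept_results.append({**result, "analyses": kept_analyses})
    return kept_results
```

The input `message` is left unmodified; callers get a new list of results and can decide separately how to prune the knowledge graph.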

colleenXu commented 1 year ago

Note that I'm not sure that CHP's TRAPI 1.4 instance is working (dev only; http://chp.thayer.dartmouth.edu/query). When querying it directly, I'm getting either an empty response or a malformed error response. Looks like BTE is handling this somewhat...but I dunno if it could handle it more gracefully / intelligently?

BTE log example:

        {
            "timestamp": "2023-05-30T20:10:35.710Z",
            "level": "ERROR",
            "message": "call-apis: Failed POST http://chp.thayer.dartmouth.edu (1 ID): Gene > expressed_in > GrossAnatomicalStructure: (TypeError: Cannot read properties of undefined (reading 'id'))",
            "code": null
        },

query 1

```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:672"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:GrossAnatomicalStructure"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expressed_in"]
                }
            }
        }
    }
}
```

response to query 1: empty KG/results

```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:672"],
                    "categories": ["biolink:Gene"],
                    "is_set": false,
                    "constraints": []
                },
                "n1": {
                    "ids": null,
                    "categories": ["biolink:GrossAnatomicalStructure"],
                    "is_set": false,
                    "constraints": []
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "knowledge_type": null,
                    "predicates": ["biolink:expressed_in"],
                    "attribute_constraints": [],
                    "qualifier_constraints": []
                }
            }
        },
        "knowledge_graph": null,
        "results": [
            {
                "node_bindings": { "n0": [], "n1": [] },
                "analyses": [
                    {
                        "resource_id": "infores:connections-hypothesis",
                        "edge_bindings": { "e0": [] },
                        "score": null,
                        "support_graphs": null,
                        "scoring_method": null,
                        "attributes": null
                    }
                ]
            }
        ],
        "auxiliary_graphs": null
    },
    "logs": [
        { "timestamp": "2023-05-30T20:14:12.898845", "level": "INFO", "message": "Running message.", "code": null },
        { "timestamp": "2023-05-30T20:14:12.898853", "level": "INFO", "message": "Getting message templates.", "code": null },
        { "timestamp": "2023-05-30T20:14:12.898917", "level": "INFO", "message": "Checking template matches for gene_specificity", "code": null },
        { "timestamp": "2023-05-30T20:14:12.900685", "level": "INFO", "message": "Detected 1 matches for gene_specificity", "code": null },
        { "timestamp": "2023-05-30T20:14:12.900690", "level": "INFO", "message": "Constructing queries on matching templates", "code": null },
        { "timestamp": "2023-05-30T20:14:12.900972", "level": "INFO", "message": "Sending 1 consistent queries", "code": null },
        { "timestamp": "2023-05-30T20:14:12.905156", "level": "INFO", "message": "Wildcard detected", "code": null },
        { "timestamp": "2023-05-30T20:14:12.905312", "level": "INFO", "message": "Received responses from gene_specificity", "code": null }
    ],
    "trapi_version": "1.4",
    "biolink_version": "3.1.2",
    "status": "Success",
    "id": "2adeb7ba-6b70-429b-97b1-384f8e9c80f1",
    "workflow": [ { "id": "lookup" } ]
}
```

query 2 uses an ID they list in the example response of the /curies endpoint

```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["ENSEMBL:ENSG00000106665"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:GrossAnatomicalStructure"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expressed_in"]
                }
            }
        }
    }
}
```
query 2 response: malformed error (very long, only including snippets that seem useful)

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/home/chp_api/web/dispatcher/views.py", line 45, in post
    return dispatcher.get_response(message)
  File "/home/chp_api/web/dispatcher/base.py", line 167, in get_response
    responses = get_app_response_fn(consistent_app_queries, self.logger)
  File "/usr/local/lib/python3.8/site-packages/gene_specificity/app_interface.py", line 25, in get_response
    response = interface.get_response(consistent_query, logger)
  File "/usr/local/lib/python3.8/site-packages/gene_specificity/trapi_interface.py", line 142, in get_response
    self._add_results(message, subject_mapping, qg_subject_id, [curie], subject_category, predicate, qg_edge_id, object_mapping, qg_object_id, object_curies, object_category, vals)

Exception Type: TypeError at /query
Exception Value: _add_results() missing 2 required positional arguments: 'object_category' and 'vals'
```

```
You’re seeing this error because you have DEBUG = True in your Django settings file. Change that to False, and Django will display a standard page generated by the handler for this status code.
```
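One way a client could handle both failure modes above (a "Success" response with a null knowledge graph and empty bindings, and a non-JSON error page) is sketched below in Python. This is only an illustration, not how BTE's call-apis module works; `parse_kp_response` and `usable_results` are hypothetical names, and the rule that a result needs non-empty node and edge bindings is an assumption.

```python
import json


def parse_kp_response(body):
    """Parse a KP response body, returning None when it is not a JSON object."""
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        # Covers non-JSON bodies such as a Django debug page.
        return None
    return parsed if isinstance(parsed, dict) else None


def usable_results(response):
    """Return only results whose node and edge bindings are all non-empty."""
    if not response:
        return []
    message = response.get("message") or {}
    # A null or empty knowledge graph means there is nothing to bind to.
    if not message.get("knowledge_graph"):
        return []
    usable = []
    for result in message.get("results") or []:
        node_bindings = result.get("node_bindings") or {}
        if not node_bindings or not all(node_bindings.values()):
            continue
        analyses = [
            analysis
            for analysis in result.get("analyses") or []
            if analysis.get("edge_bindings")
            and all(analysis["edge_bindings"].values())
        ]
        if analyses:
            usable.append({**result, "analyses": analyses})
    return usable
```

With guards like these, the empty response to query 1 would yield no results instead of raising when a binding's `id` is read, and the query 2 error page would be reported as an unparseable response.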

EDIT: I found a query that works. However, (a) BTE wouldn't send a sub-query like this (where the ID is the object) and (b) BTE may not be able to process the response (only 1 result that contains all 30 "answers", as if is_set: true was on the Gene QNode...)

query that works: This is the example given for their /query endpoint

```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "ids": ["UBERON:0009835"],
                    "categories": ["biolink:GrossAnatomicalStructure"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expressed_in"]
                }
            }
        }
    }
}
```

query response: response2.txt

tokebe commented 1 year ago

Marking this one as done -- we'll treat the above as a new issue (tracked in #685)

colleenXu commented 1 year ago

Note that other tools in Translator aren't doing subclassing w/ aux-graphs right now (it's an after-Sept goal). So...we'll open a new issue if we notice any issues processing their KP responses or we want to change our behavior...