phase 1: processing TRAPI-1.4 KP sub-query responses (aux-graph/result.analyses refactor)

colleenXu commented 1 year ago

Background

info on TRAPI 1.4 KP aux-graph / result.analyses expectations in #603
provenance refactoring when ingesting TRAPI KP edges is covered in #617
For now, we can continue our practice of ignoring the KP's scoring. Because a KP result's analyses.support_graphs should be related to its scoring, we can ignore those too
we can ignore result.analyses.resource_id

Overview

I use this sub-query for my example results below

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["MONDO:0005377"],
          "categories": ["biolink:DiseaseOrPhenotypicFeature"],
          "name": "noonan"
        },
        "n1": {
          "categories": ["biolink:Gene"]
        }
      },
      "edges": {
        "eA": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:caused_by"]
        }
      }
    }
  }
}
```

Our sub-queries to TRAPI KPs are 1-hop Predict "style" TRAPI queries with batches of IDs sent in each request.

We expect two kinds of results in the TRAPI-1.4 responses (mirrors the two scenarios of #603)...

no ID/node-expansion was involved

we expect 1 analysis object in result.analyses, so BTE can take that object's edge_bindings and they should be just like the result.edge_bindings in TRAPI 1.3
the edge(s) there should be "flat", meaning they won't reference an auxiliary graph (i.e. they won't have an element in the attributes array where the attribute_type_id is "biolink:support_graphs")

example result

This is a "fake" expected result that isn't based on a real response from a TRAPI-1.4 KP...

```
{
  "node_bindings": {
    "n0": [ { "id": "MONDO:0005377" } ],
    "n1": [ { "id": "NCBIGene:3315" } ]
  },
  "analyses": [
    {
      "resource_id": "infores:automat-biolink",
      "edge_bindings": {
        "eA": [ { "id": "54d9ed32bec4d12369592709e20c997f" } ]
      },
      "score": 0.8
    }
  ]
}
```
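A minimal sketch of how this case could be handled, assuming the result shape shown in the fake example above. The function names (`edge_ids_from_result`, `is_flat`) are hypothetical and are not BTE's actual code.

```python
SUPPORT_GRAPHS = "biolink:support_graphs"


def edge_ids_from_result(result):
    """Return {qedge_id: [kg_edge_id, ...]} from the single expected analysis."""
    analyses = result.get("analyses") or []
    if len(analyses) != 1:
        raise ValueError(f"expected 1 analysis, got {len(analyses)}")
    bindings = analyses[0].get("edge_bindings") or {}
    return {qedge: [b["id"] for b in bound] for qedge, bound in bindings.items()}


def is_flat(kg_edge):
    """True when the edge has no attribute of type biolink:support_graphs."""
    attributes = kg_edge.get("attributes") or []
    return all(a.get("attribute_type_id") != SUPPORT_GRAPHS for a in attributes)


# Same content as the fake example result above.
example_result = {
    "node_bindings": {
        "n0": [{"id": "MONDO:0005377"}],
        "n1": [{"id": "NCBIGene:3315"}],
    },
    "analyses": [
        {
            "resource_id": "infores:automat-biolink",
            "edge_bindings": {"eA": [{"id": "54d9ed32bec4d12369592709e20c997f"}]},
            "score": 0.8,
        }
    ],
}

bound_edges = edge_ids_from_result(example_result)
```

The returned mapping has the same shape as result.edge_bindings in TRAPI 1.3 once the binding objects are reduced to their IDs. resource_id and score are ignored here, as discussed in the Background.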

ID/node-expansion was involved

READ THIS FIRST:

This is still being discussed by the TRAPI team / Translator
the info below is based on TRAPI 1.4.0-beta3 and the discussions Jackson and I had on this topic

Notes:

we still expect 1 analysis object in result.analyses. But the edge(s) in that object's edge_bindings may reference an auxiliary graph. When this happens, we expect 1 element in the attributes array where the attribute_type_id is "biolink:support_graphs". The value of that Attribute object should be 1 or more keys for auxiliary-graphs...
we could decide to drop these results! (if we want data for descendant IDs, we'll include them in the batch of IDs we send and ideally get edges back with no auxiliary graph references)
UPDATED 2023-04-26 discussion: if we want to keep the edges, but ignore/remove the edge-attribute with the aux-graph (since we'll "drop" that), we may want to generate a warning-level log so we know this is happening.
If we want to process these edges with auxiliary graph references, we'll want to:
- get the referenced auxiliary-graph objects. We want those in the auxiliary_graphs section of our TRAPI response
  - implementation musing: may need to rename aux-graph key to keep it unique?
- get the edges listed in those auxiliary-graph objects. We want those in the knowledge_graph.edges section of our TRAPI response
- check if this set of edges reference auxiliary-graphs (will be in their attributes, same as before). If they do...repeat the two steps above (implementation musing: recursive behavior?)
when doing the next parts of query-execution, I imagine we'd use the main edge(s) from the result.analyses.edge_bindings. So we'd basically ignore the nested auxiliary-graphs and their edges...
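If we did process these edges, the "get aux-graphs, get their edges, repeat" steps above could be sketched as below. This is only an illustration under assumptions: it treats knowledge_graph.edges and auxiliary_graphs as dicts keyed by ID, assumes each auxiliary-graph object lists its edge IDs under "edges", and uses a hypothetical function name (`collect_support`). It walks iteratively with "seen" sets instead of recursing, so a cycle of references can't loop forever.

```python
SUPPORT_GRAPHS = "biolink:support_graphs"


def support_graph_ids(kg_edge):
    """Aux-graph keys referenced by an edge's support_graphs attribute(s)."""
    ids = []
    for attr in kg_edge.get("attributes") or []:
        if attr.get("attribute_type_id") == SUPPORT_GRAPHS:
            value = attr.get("value") or []
            ids.extend([value] if isinstance(value, str) else value)
    return ids


def collect_support(edge_ids, kg_edges, aux_graphs):
    """Return (aux_graph_ids, edge_ids) reachable from the starting edges."""
    seen_graphs, seen_edges = set(), set()
    stack = list(edge_ids)
    while stack:
        edge_id = stack.pop()
        if edge_id in seen_edges or edge_id not in kg_edges:
            continue
        seen_edges.add(edge_id)
        for graph_id in support_graph_ids(kg_edges[edge_id]):
            if graph_id in seen_graphs or graph_id not in aux_graphs:
                continue
            seen_graphs.add(graph_id)
            # Edges inside an aux-graph may reference further aux-graphs.
            stack.extend(aux_graphs[graph_id].get("edges") or [])
    return seen_graphs, seen_edges


# Made-up IDs: e1 -> ag1 -> e2 -> ag2 -> (e3, and back to e1).
kg_edges = {
    "e1": {"attributes": [{"attribute_type_id": SUPPORT_GRAPHS, "value": ["ag1"]}]},
    "e2": {"attributes": [{"attribute_type_id": SUPPORT_GRAPHS, "value": ["ag2"]}]},
    "e3": {},
}
aux_graphs = {"ag1": {"edges": ["e2"]}, "ag2": {"edges": ["e3", "e1"]}}

graphs, edges = collect_support(["e1"], kg_edges, aux_graphs)
```

The collected aux-graph IDs would go into the auxiliary_graphs section of our response and the collected edge IDs into knowledge_graph.edges. Renaming aux-graph keys to keep them unique is not handled in this sketch.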

Examples:

slides from #603 (ignore the second-hop aka QEdge eB)
slides from this post

colleenXu commented 1 year ago

Specifically interested in @tokebe's view of this idea of dropping results when the analyses.edge_bindings edges reference aux-graphs...

> we could decide to drop these results! (if we want data for descendant IDs, we'll include them in the batch of IDs we send and ideally get edges back with no auxiliary graph references)

colleenXu commented 1 year ago

Note that COHD's dev instance seems to be on TRAPI 1.4 (we can access it through the registration we currently use, but they also registered a separate yaml for TRAPI 1.4)

However, I haven't checked their /query responses to see if they are using the aux-graph/result.analyses as we expect, and whether we can use it to develop and test our code for this issue...

From my post here: https://github.com/biothings/biothings_explorer/issues/597#issuecomment-1502686927

colleenXu commented 1 year ago

Deleted the previous comment (oops? should have edited or hidden it instead?).

@tokebe and I agreed to adjust the API_LIST config file rather than use SmartAPI overrides, because the names of the APIs were different between registrations.

The adjustments are in this branch https://github.com/biothings/biothings_explorer/compare/main...trapi1-4-overrides and include...

COHD has a second registration for TRAPI 1.4 instances (2023-05-19: created a new one with dev + CI)
Automat KPs have second registrations for TRAPI 1.4 instances (only dev right now). However, some tools that we previously used are missing or don't have TRAPI 1.4 instances. Those have been removed in the API_LIST config file for the main branch (TRAPI 1.3 instances) and this branch
2023-05-19: Connections Hypothesis Provider has a second registration for TRAPI 1.4 instances (only dev right now)

tokebe commented 1 year ago

Above linked PR currently drops all KP result edges that have support graphs, per 1-on-1 discussion with @colleenXu. This might change if there's a good case in which we'd want to keep support graphs.

Note that support graphs on the result analysis are also not kept, but not used as criteria for dropping a result edge (these support graphs would typically explain result scoring, which we also don't use from TRAPI KPs).
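A rough illustration of the behavior described above (not the code from the linked PR): results whose bound edges carry a support_graphs attribute are dropped with a warning-level log, while analysis-level support_graphs are stripped without affecting whether the result is kept. The logger name and the function names are hypothetical.

```python
import logging

SUPPORT_GRAPHS = "biolink:support_graphs"
logger = logging.getLogger("kp_response_sketch")  # hypothetical logger name


def has_support_graphs(kg_edge):
    """True when any edge attribute has type biolink:support_graphs."""
    return any(
        attr.get("attribute_type_id") == SUPPORT_GRAPHS
        for attr in kg_edge.get("attributes") or []
    )


def filter_results(message):
    """Drop results bound to support-graph edges; strip analysis support_graphs."""
    kg_edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    kept = []
    for result in message.get("results") or []:
        analyses = result.get("analyses") or []
        bound_ids = [
            binding["id"]
            for analysis in analyses
            for bound in (analysis.get("edge_bindings") or {}).values()
            for binding in bound
        ]
        if any(has_support_graphs(kg_edges.get(edge_id, {})) for edge_id in bound_ids):
            logger.warning("dropping result: bound edge references support graphs")
            continue
        # Analysis-level support graphs are removed but never cause a drop.
        cleaned = [
            {key: val for key, val in analysis.items() if key != "support_graphs"}
            for analysis in analyses
        ]
        kept.append({**result, "analyses": cleaned})
    return kept


# Made-up message: e1 references an aux-graph, e2 is "flat".
message = {
    "knowledge_graph": {"edges": {
        "e1": {"attributes": [{"attribute_type_id": SUPPORT_GRAPHS, "value": ["ag1"]}]},
        "e2": {"attributes": []},
    }},
    "results": [
        {"analyses": [{"edge_bindings": {"eA": [{"id": "e1"}]}}]},
        {"analyses": [{"edge_bindings": {"eA": [{"id": "e2"}]}, "support_graphs": ["ag9"]}]},
    ],
}
kept = filter_results(message)
```

Here only the second result survives, and its analysis no longer has a support_graphs key.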

colleenXu commented 1 year ago

Note that I'm not sure that CHP's TRAPI 1.4 instance is working (dev only; http://chp.thayer.dartmouth.edu/query). When querying it directly, I'm getting either an empty response or a malformed error response. Looks like BTE is handling this somewhat...but I dunno if it could handle it more gracefully / intelligently?

BTE log example:

        {
            "timestamp": "2023-05-30T20:10:35.710Z",
            "level": "ERROR",
            "message": "call-apis: Failed POST http://chp.thayer.dartmouth.edu (1 ID): Gene > expressed_in > GrossAnatomicalStructure: (TypeError: Cannot read properties of undefined (reading 'id'))",
            "code": null
        },

query 1

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["NCBIGene:672"],
          "categories": ["biolink:Gene"]
        },
        "n1": {
          "categories": ["biolink:GrossAnatomicalStructure"]
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:expressed_in"]
        }
      }
    }
  }
}
```

response to query 1: empty KG/results

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["NCBIGene:672"],
          "categories": ["biolink:Gene"],
          "is_set": false,
          "constraints": []
        },
        "n1": {
          "ids": null,
          "categories": ["biolink:GrossAnatomicalStructure"],
          "is_set": false,
          "constraints": []
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "knowledge_type": null,
          "predicates": ["biolink:expressed_in"],
          "attribute_constraints": [],
          "qualifier_constraints": []
        }
      }
    },
    "knowledge_graph": null,
    "results": [
      {
        "node_bindings": { "n0": [], "n1": [] },
        "analyses": [
          {
            "resource_id": "infores:connections-hypothesis",
            "edge_bindings": { "e0": [] },
            "score": null,
            "support_graphs": null,
            "scoring_method": null,
            "attributes": null
          }
        ]
      }
    ],
    "auxiliary_graphs": null
  },
  "logs": [
    { "timestamp": "2023-05-30T20:14:12.898845", "level": "INFO", "message": "Running message.", "code": null },
    { "timestamp": "2023-05-30T20:14:12.898853", "level": "INFO", "message": "Getting message templates.", "code": null },
    { "timestamp": "2023-05-30T20:14:12.898917", "level": "INFO", "message": "Checking template matches for gene_specificity", "code": null },
    { "timestamp": "2023-05-30T20:14:12.900685", "level": "INFO", "message": "Detected 1 matches for gene_specificity", "code": null },
    { "timestamp": "2023-05-30T20:14:12.900690", "level": "INFO", "message": "Constructing queries on matching templates", "code": null },
    { "timestamp": "2023-05-30T20:14:12.900972", "level": "INFO", "message": "Sending 1 consistent queries", "code": null },
    { "timestamp": "2023-05-30T20:14:12.905156", "level": "INFO", "message": "Wildcard detected", "code": null },
    { "timestamp": "2023-05-30T20:14:12.905312", "level": "INFO", "message": "Received responses from gene_specificity", "code": null }
  ],
  "trapi_version": "1.4",
  "biolink_version": "3.1.2",
  "status": "Success",
  "id": "2adeb7ba-6b70-429b-97b1-384f8e9c80f1",
  "workflow": [ { "id": "lookup" } ]
}
```
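The response above has a null knowledge_graph and one result whose node and edge bindings are all empty arrays, which is plausibly what leads to the "reading 'id'" TypeError in the BTE log. One way to handle it more gracefully would be to filter out unusable results before processing. This is only a sketch with a hypothetical function name (`usable_results`), not BTE's code.

```python
def usable_results(response):
    """Keep only results whose bindings are non-empty and resolve in the KG."""
    message = response.get("message") or {}
    kg = message.get("knowledge_graph") or {}
    kg_nodes = kg.get("nodes") or {}
    kg_edges = kg.get("edges") or {}
    usable = []
    for result in message.get("results") or []:
        node_bindings = result.get("node_bindings") or {}
        # Every QNode needs at least one bound node that exists in the KG.
        if not node_bindings or any(not bound for bound in node_bindings.values()):
            continue
        if any(b.get("id") not in kg_nodes for bound in node_bindings.values() for b in bound):
            continue
        # Every analysis needs non-empty edge bindings that exist in the KG.
        analyses = result.get("analyses") or []
        edge_lists = [
            bound
            for analysis in analyses
            for bound in (analysis.get("edge_bindings") or {}).values()
        ]
        if not edge_lists or any(not bound for bound in edge_lists):
            continue
        if any(b.get("id") not in kg_edges for bound in edge_lists for b in bound):
            continue
        usable.append(result)
    return usable


# Trimmed version of the response to query 1 above.
chp_like_response = {
    "message": {
        "knowledge_graph": None,
        "results": [
            {
                "node_bindings": {"n0": [], "n1": []},
                "analyses": [{"edge_bindings": {"e0": []}, "score": None}],
            }
        ],
        "auxiliary_graphs": None,
    }
}
remaining = usable_results(chp_like_response)
```

With a filter like this, the response would be treated as "no results" rather than raising while reading a missing binding.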

query 2

uses an ID they list in the example response of the /curies endpoint

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["ENSEMBL:ENSG00000106665"],
          "categories": ["biolink:Gene"]
        },
        "n1": {
          "categories": ["biolink:GrossAnatomicalStructure"]
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:expressed_in"]
        }
      }
    }
  }
}
```

query 2 response: malformed error

very long, only including snippets that seem useful

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/home/chp_api/web/dispatcher/views.py", line 45, in post
    return dispatcher.get_response(message)
  File "/home/chp_api/web/dispatcher/base.py", line 167, in get_response
    responses = get_app_response_fn(consistent_app_queries, self.logger)
  File "/usr/local/lib/python3.8/site-packages/gene_specificity/app_interface.py", line 25, in get_response
    response = interface.get_response(consistent_query, logger)
  File "/usr/local/lib/python3.8/site-packages/gene_specificity/trapi_interface.py", line 142, in get_response
    self._add_results(message, subject_mapping, qg_subject_id, [curie], subject_category, predicate, qg_edge_id, object_mapping, qg_object_id, object_curies, object_category, vals)

Exception Type: TypeError at /query
Exception Value: _add_results() missing 2 required positional arguments: 'object_category' and 'vals'
```

```
You’re seeing this error because you have DEBUG = True in your Django settings file. Change that to False, and Django will display a standard page generated by the handler for this status code.
```

EDIT: I found a query that works. However, (a) BTE wouldn't send a sub-query like this (where the ID is the object) and (b) BTE may not be able to process the response (only 1 result that contains all 30 "answers", as if is_set: true was on the Gene QNode...)

query that works

This is the example given for their /query endpoint

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:Gene"]
        },
        "n1": {
          "ids": ["UBERON:0009835"],
          "categories": ["biolink:GrossAnatomicalStructure"]
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:expressed_in"]
        }
      }
    }
  }
}
```

query response: response2.txt
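Concern (b) in the EDIT above, a single result that packs many bindings into a QNode that isn't marked is_set, could be detected before processing. A small sketch with a hypothetical function name (`overloaded_qnodes`); the gene IDs in the sample result are made up and are not taken from response2.txt.

```python
def overloaded_qnodes(result, query_graph):
    """Return QNode ids with more than one binding although is_set is not true."""
    qnodes = query_graph.get("nodes") or {}
    return sorted(
        qnode_id
        for qnode_id, bound in (result.get("node_bindings") or {}).items()
        if len(bound or []) > 1 and not (qnodes.get(qnode_id) or {}).get("is_set")
    )


query_graph = {
    "nodes": {
        "n0": {"categories": ["biolink:Gene"]},
        "n1": {"ids": ["UBERON:0009835"], "categories": ["biolink:GrossAnatomicalStructure"]},
    }
}
# Made-up gene IDs standing in for the many "answers" in one result.
packed_result = {
    "node_bindings": {
        "n0": [{"id": "NCBIGene:1"}, {"id": "NCBIGene:2"}],
        "n1": [{"id": "UBERON:0009835"}],
    }
}
flagged = overloaded_qnodes(packed_result, query_graph)
```

A non-empty list could trigger a warning-level log, or the result could be split into one result per bound node. Which of those is appropriate is an open choice, not something decided in this thread.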

tokebe commented 1 year ago

Marking this one as done -- we'll treat the above as a new issue (tracked in #685)

colleenXu commented 1 year ago

Note that other tools in Translator aren't doing subclassing w/ aux-graphs right now (it's an after-Sept goal). So...we'll open a new issue if we notice any issues processing their KP responses or we want to change our behavior...
