biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

Service Provider/TMKP - queries for SPO that should return zero results appear to return all SP* results #840

Closed bill-baumgartner closed 5 days ago

bill-baumgartner commented 1 month ago

Problem

The TMKP/ServiceProvider TRAPI endpoint returns results when none are expected.

Specifically, when submitting a fully specified query, i.e., a query where the subject, predicate, and object are all specified (SPO), for which zero results are expected, the service does not return an empty result set. Instead, it appears to return all subject-predicate-wildcard (SP*) triples, i.e., all triples in the KG that have the specified subject and predicate.

Environment(s)

Currently, this issue is present in DEV, CI, & TEST. The PROD environment successfully returns zero results when expected.

How to reproduce

Below is a query against TEST for Ribavirin - biolink:treats_or_applied_or_studied_to_treat - alternating hemiplegia of childhood. It should return zero results from the TMKP endpoint. Instead, it returns many edges linking Ribavirin to other diseases via biolink:treats_or_applied_or_studied_to_treat, and none linking it to alternating hemiplegia of childhood.

Here is the output for the query below in case it is helpful: ribavirin-example.test.json.gz

curl -X 'POST' \
  'https://bte.test.transltr.io/v1/smartapi/978fe380a147a8641caf72320862697b/query/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "predicates": [ "biolink:treats_or_applied_or_studied_to_treat" ],
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "ids": [
            "DRUGBANK:DB00811"
          ]
        },
        "n01": {
          "ids": [
            "MONDO:0016241"
          ]
        }
      }
    }
  }
}'
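
To check a saved response for this symptom, something like the following Python sketch may help. It is not part of BTE; the function name `edges_missing_object` and the sample data are hypothetical. It assumes the response follows the usual TRAPI layout (`message.knowledge_graph.edges` keyed by edge ID, each edge carrying `subject` and `object`) and that returned edges keep the query direction. Note that BTE may normalize IDs or expand the object to subclasses, so the allowed set may need to include those equivalents.

```python
def edges_missing_object(response, allowed_object_ids):
    """Return sorted IDs of knowledge-graph edges whose object is not allowed.

    For a fully specified (SPO) query, every returned edge is expected to
    point at one of the pinned object IDs (or an accepted equivalent).
    """
    message = response.get("message") or {}
    kg = message.get("knowledge_graph") or {}
    edges = kg.get("edges") or {}
    allowed = set(allowed_object_ids)
    return sorted(
        edge_id
        for edge_id, edge in edges.items()
        if edge.get("object") not in allowed
    )


# Hypothetical miniature response; "EXAMPLE:other_disease" is a placeholder ID.
sample = {
    "message": {
        "knowledge_graph": {
            "edges": {
                "e0": {"subject": "CHEBI:63580", "object": "MONDO:0016241"},
                "e1": {"subject": "CHEBI:63580", "object": "EXAMPLE:other_disease"},
            }
        }
    }
}

print(edges_missing_object(sample, ["MONDO:0016241"]))  # prints ['e1']
```

A non-empty list for an SPO query would indicate SP*-style edges leaking into the response.
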
colleenXu commented 1 month ago

I can recreate this locally (main branch code).

I didn't test further (i.e., other direct-explain queries, or KPs other than Text-Mining Targeted), but I suspect this problem is present in all instances, including Prod. BTE Prod uses Pending-Prod, which has an outdated version of Text-Mining-Targeted that doesn't match the current SmartAPI yaml x-bte annotation. So currently, none of the Prod Text-Mining Targeted queries return edges. (Noted in https://github.com/biothings/biothings_explorer/issues/788#issuecomment-2167093599)

Note that the last issue I remember with Explain queries was https://github.com/biothings/biothings_explorer/issues/796

Based on the console logs, BTE seems to recognize that the disease IDs for alternating hemiplegia of childhood aren't in the records/edges retrieved: Node "n01" kept (0) curies. But it doesn't seem to drop those records correctly.

Console logs:

```
bte:call-apis:query Total number of records returned for this qEdge is 324 +0ms
bte:call-apis:query Start to use id resolver module to annotate output ids. +0ms
bte:call-apis:query id annotation completes +1s
bte:call-apis:query qEdge queries complete in 2s +0ms
bte:biothings-explorer-trapi:batch_edge_query APIEdges are successfully queried.... +3s
bte:biothings-explorer-trapi:batch_edge_query Filtering out "undefined" items (324) records +0ms
bte:biothings-explorer-trapi:batch_edge_query Total number of records is (324) +0ms
bte:biothings-explorer-trapi:batch_edge_query Start to update nodes... +0ms
bte:biothings-explorer-trapi:batch_edge_query Update nodes completed! +12ms
bte:biothings-explorer-trapi:QEdge (6) Storing records... +3s
bte:biothings-explorer-trapi:QEdge (6) Applying Node Constraints to 324 records. +1ms
bte:biothings-explorer-trapi:QEdge (6) No constraints. Skipping... +0ms
bte:biothings-explorer-trapi:QEdge (7) Updating nodes based on edge records... +0ms
bte:biothings-explorer-trapi:QEdge (7) Updating Entities in "e00" +0ms
bte:biothings-explorer-trapi:QEdge (7) Collecting Types: "["Disease","PhenotypicFeature","BehavioralFeature","ClinicalFinding","DiseaseOrPhenotypicFeature"]" +0ms
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["Disease"] +7ms
bte:biothings-explorer-trapi:QNode (8) Node "n01" restored curie. +3s
bte:biothings-explorer-trapi:QNode Node "n01" intersecting (5)/(324) curies... +0ms
bte:biothings-explorer-trapi:QNode Node "n01" kept (0) curies... +1ms
bte:biothings-explorer-trapi:QEdge (7) Updating Entities in "e00" +2ms
bte:biothings-explorer-trapi:QEdge (7) Collecting Types: "["SmallMolecule"]" +0ms
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["SmallMolecule"] +5ms
bte:biothings-explorer-trapi:QNode Node "n00" intersecting (1)/(1) curies... +5ms
bte:biothings-explorer-trapi:QNode Node "n00" kept (1) curies... +0ms
bte:biothings-explorer-trapi:edge-manager 'e00' Reversed[false] (1)--(0) entities / (324) records. +3s
bte:biothings-explorer-trapi:edge-manager 'e00' dropped (0) records. +7ms
bte:biothings-explorer-trapi:QEdge (6) Storing records... +7ms
bte:biothings-explorer-trapi:QEdge (6) Applying Node Constraints to 324 records. +0ms
bte:biothings-explorer-trapi:QEdge (6) No constraints. Skipping... +0ms
bte:biothings-explorer-trapi:QEdge (7) Updating nodes based on edge records... +0ms
bte:biothings-explorer-trapi:QEdge (7) Updating Entities in "e00" +0ms
bte:biothings-explorer-trapi:QEdge (7) Collecting Types: "["Disease","PhenotypicFeature","BehavioralFeature","ClinicalFinding","DiseaseOrPhenotypicFeature"]" +0ms
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["Disease"] +5ms
bte:biothings-explorer-trapi:QNode Node "n01" saving (324) curies... +12ms
bte:biothings-explorer-trapi:QEdge (7) Updating Entities in "e00" +0ms
bte:biothings-explorer-trapi:QEdge (7) Collecting Types: "["SmallMolecule"]" +0ms
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["SmallMolecule"] +5ms
bte:biothings-explorer-trapi:QNode Node "n00" intersecting (1)/(1) curies... +5ms
bte:biothings-explorer-trapi:QNode Node "n00" kept (1) curies... +0ms
bte:biothings-explorer-trapi:edge-manager Updating all other edges... +10ms
bte:biothings-explorer-trapi:edge-manager (10) Edge successfully queried. +0ms
bte:biothings-explorer-trapi:edge-manager (11) Collecting records... +0ms
bte:biothings-explorer-trapi:edge-manager (11) 'e00' keeps (324) records! +1ms
bte:biothings-explorer-trapi:edge-manager ---------- +0ms
bte:biothings-explorer-trapi:edge-manager (12) Collected records for: ["e00"]! +0ms
bte:biothings-explorer-trapi:edge-manager (12) Collected (324) records! +0ms
bte:biothings-explorer-trapi:edge-manager (13) Edge Manager reporting combined records... +0ms
bte:biothings-explorer-trapi:Graph Updating BTE Graph now. +0ms
bte:biothings-explorer-trapi:edge-manager (13) Edge Manager reporting organized records... +31ms
bte:biothings-explorer-trapi:QueryResult Updating query results now! +0ms
bte:biothings-explorer-trapi:QueryResult Nodes with "is_set": ["n01"] +0ms
bte:biothings-explorer-trapi:QueryResult initialQEdgeID: e00, initialQNodeIDToMatch: n00 +1ms
bte:biothings-explorer-trapi:QueryResult result ID: n00-CHEBI:63580_&_n01 has 324 +19ms
bte:biothings-explorer-trapi:QueryResult Did not score results for this endpoint. +3ms
bte:biothings-explorer-trapi:Graph pruning BTEGraph nodes/edges... +58ms
bte:biothings-explorer-trapi:Graph pruned 5 nodes and 4 edges from BTEGraph. +1ms
bte:biothings-explorer-trapi:main (14) TRAPI query finished. +3s
```

The output probably looks odd (1 result with all the nodes/edges) because of subclassing/is_set behavior for the alternating hemiplegia of childhood QNode.
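
For illustration, here is a minimal Python sketch of the behavior the logs suggest is expected: when the pinned IDs for a node intersect to zero with the IDs found in the records, every record on that edge should be dropped. This is not BTE's code (BTE is not written in Python), and `intersect_and_filter`, the record shape, and the `EXAMPLE:` IDs are all hypothetical.

```python
def intersect_and_filter(records, pinned_ids):
    """Intersect pinned node IDs with record objects, then filter the records.

    Returns (kept_ids, kept_records). If the intersection is empty, no record
    can satisfy the pinned node, so kept_records is empty as well.
    """
    pinned = set(pinned_ids)
    found = {rec["object"] for rec in records}
    kept_ids = pinned & found
    kept_records = [rec for rec in records if rec["object"] in kept_ids]
    return kept_ids, kept_records


# Hypothetical records: none of them point at the pinned disease.
records = [
    {"subject": "CHEBI:63580", "object": "EXAMPLE:disease_a"},
    {"subject": "CHEBI:63580", "object": "EXAMPLE:disease_b"},
]

kept_ids, kept_records = intersect_and_filter(records, ["MONDO:0016241"])
print(len(kept_ids), len(kept_records))  # prints "0 0"
```

In the logs above, the intersection step reports zero kept curies, but the edge manager then reports `dropped (0) records`, which is where the observed behavior diverges from this sketch.
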

tokebe commented 1 month ago

Reopening until fix deployed to Prod.

colleenXu commented 1 month ago

It looks good, based on two quick tests I did: the one above, which now returns 0 results, and the one below, which returns 1 result/1 edge as expected.

Other example query I did

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "creativeQuerySubject": {
          "ids": ["UNII:69G8BD63PP"],
          "categories": ["biolink:ChemicalEntity"]
        },
        "creativeQueryObject": {
          "ids": ["UniProtKB:P68871"],
          "categories": ["biolink:Gene", "biolink:Protein"]
        }
      },
      "edges": {
        "eA": {
          "subject": "creativeQuerySubject",
          "object": "creativeQueryObject",
          "predicates": ["biolink:affects"],
          "qualifier_constraints": [
            {
              "qualifier_set": [
                {
                  "qualifier_type_id": "biolink:object_direction_qualifier",
                  "qualifier_value": "decreased"
                },
                {
                  "qualifier_type_id": "biolink:object_aspect_qualifier",
                  "qualifier_value": "activity_or_abundance"
                }
              ]
            }
          ]
        }
      }
    }
  }
}
```

tokebe commented 5 days ago

Relevant changes deployed to Prod.