biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

[fix needed] Procedure semantic type handling #206

Closed colleenXu closed 3 years ago

colleenXu commented 3 years ago

[EDIT: this ticket is outdated because the API's operations have since been updated. See below for updated queries.]

BTE queries Clinical Risk KP API, which has the Biolink Procedure entities (NCIT ID) as possible inputs and outputs. Note that Biolink doesn't say much about the Procedure entity type; it doesn't even have preferred ID prefixes...

In order to handle the issues below, perhaps we need:


Main Problem: BTE is not returning answers when Procedure is the output node type. A query like doxycycline (ChemicalSubstance) -> Procedure returns no results (one can send this to BTE's Clinical Risk KP API endpoint by POSTing to here locally or here for the prod public instance). However, the sub-queries should return results (1 example subquery here).

I think this is happening because something goes wrong after the sub-query response is received.

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      },
      "nodes": {
        "n00": {
          "categories": "biolink:ChemicalSubstance",
          "ids": "CHEBI:50845"
        },
        "n01": {
          "categories": "biolink:Procedure"
        }
      }
    }
  }
}

Minor problem / Behavior with Procedure NCIT ID as input

BTE is returning answers when one queries a one-hop from a Procedure like “NCIT:C15271” (liver transplantation) -> ChemicalSubstance. However, because there isn't ID resolution, the Procedure node doesn't include the human-readable label in its equivalent identifiers attribute / name attribute.

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      },
      "nodes": {
        "n00": {
          "categories": "biolink:Procedure",
          "ids": "NCIT:C15271"
        },
        "n01": {
          "categories": "biolink:ChemicalSubstance"
        }
      }
    }
  }
}

Another note:

andrewsu commented 3 years ago

@colleenXu for prioritization purposes, can you post (or link to) a query for which Procedures are important?

colleenXu commented 3 years ago

Procedures currently aren't used in queries. However, they could be valid "treatments" besides ChemicalSubstances.

After looking into this more, I think we would still need an API endpoint that can handle "Procedure" ID resolution, rather than using the Clinical Risk KP API as it currently is. This is because there isn't an easy way to batch-query it and get the IDs and their labels.

colleenXu commented 3 years ago

This would move to the SRI node normalization team's plate after we switch to the SRI-based resolver, since their resolver doesn't support Procedure (it's missing from this list).
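
To illustrate what "doesn't support Procedure" could mean for a caller, here is a minimal sketch that splits a batch of CURIEs into resolved and unresolved ones. It assumes the resolver returns a map from each CURIE to either an entry with `id.identifier` and `id.label`, or `null` when the CURIE is not recognized. That response shape, the function name `partitionResolved`, and the sample data are all assumptions made for illustration, not output from the real service.

```javascript
// Assumed response shape (not confirmed by this thread):
//   { "<curie>": { id: { identifier, label } } | null }
// `partitionResolved` is a hypothetical helper name.
function partitionResolved(curies, normalizerResponse) {
  const resolved = {};
  const unresolved = [];
  for (const curie of curies) {
    const entry = normalizerResponse[curie];
    if (entry && entry.id && entry.id.identifier) {
      // Fall back to the CURIE itself when no label is provided.
      resolved[curie] = {
        primaryID: entry.id.identifier,
        label: entry.id.label ?? curie,
      };
    } else {
      // Unsupported semantic types (such as Procedure here) end up in this list.
      unresolved.push(curie);
    }
  }
  return { resolved, unresolved };
}

// Hand-written placeholder response, for illustration only.
const sampleResponse = {
  "MONDO:0021100": { id: { identifier: "MONDO:0021100", label: "placeholder label" } },
  "NCIT:C15271": null,
};
const outcome = partitionResolved(["MONDO:0021100", "NCIT:C15271"], sampleResponse);
```

With the placeholder data, `outcome.unresolved` contains only the Procedure CURIE, which is the case the caller would need to handle without a label.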

colleenXu commented 3 years ago

Using the SRI-based ID resolution and latest code for query handling:

Procedure as the output: A query like this "runs" but the results assembly doesn't work...

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      },
      "nodes": {
        "n00": {
          "categories": ["biolink:Disease"],
          "ids": ["MONDO:0021100"]
        },
        "n01": {
          "categories": ["biolink:Procedure"]
        }
      }
    }
  }
}

Procedure as input: I get an error when running the query. It may be related to how ID resolution and query handling interact.

Query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      },
      "nodes": {
        "n00": {
          "categories": ["biolink:Procedure"],
          "ids": ["NCIT:C15271"]
        },
        "n01": {
          "categories": ["biolink:Disease"]
        }
      }
    }
  }
}

Error logging:

  bte:call-apis:query After transformation, BTE is able to retrieve 100 hits! +126ms
  bte:call-apis:query Succesfully made the following query: {"url":"https://biothings.ncats.io/clinical_risk_kp/query","params":{"fields":"subject,predicate,object","q":"object.id:\"NCIT:C15271\" AND subject.type:Disease AND predicate.type:related_to AND predicate.feature_coefficient:>0.4 AND _exists_:subject.NCIT","size":1000},"method":"get","timeout":50000} +61ms
  bte:api-response-transform:index api name Clinical Risk KP API +187ms
  bte:api-response-transform:index api tags: disease,drug,chemical_substance,phenotypic_feature,procedure,association,annotation,query,translator,biothings,biothings_graph +0ms
  bte:call-apis:query After transformation, BTE is able to retrieve 0 hits! +0ms
  bte:call-apis:query Succesfully made the following query: {"url":"https://biothings.ncats.io/clinical_risk_kp/query","params":{"fields":"subject,predicate,object","q":"object.id:\"NCIT:C15271\" AND subject.type:Disease AND predicate.type:related_to AND predicate.feature_coefficient:>0.4 AND _exists_:subject.SNOMEDCT","size":1000},"method":"get","timeout":50000} +16ms
  bte:api-response-transform:index api name Clinical Risk KP API +16ms
  bte:api-response-transform:index api tags: disease,drug,chemical_substance,phenotypic_feature,procedure,association,annotation,query,translator,biothings,biothings_graph +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +79ms
  bte:call-apis:query After transformation, BTE is able to retrieve 1 hits! +2ms
  bte:call-apis:query Succesfully made the following query: {"url":"https://biothings.ncats.io/clinical_risk_kp/query","params":{"fields":"subject,predicate,object","q":"object.id:\"NCIT:C15271\" AND subject.type:Disease AND predicate.type:negatively_correlated_with AND predicate.feature_coefficient:<-0.4 AND _exists_:subject.SNOMEDCT","size":1000},"method":"get","timeout":50000} +2ms
  bte:api-response-transform:index api name Clinical Risk KP API +4ms
  bte:api-response-transform:index api tags: disease,drug,chemical_substance,phenotypic_feature,procedure,association,annotation,query,translator,biothings,biothings_graph +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +10ms
  bte:api-response-transform:transformer input: NCIT:C15271 +1ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:api-response-transform:transformer input: NCIT:C15271 +0ms
  bte:call-apis:query After transformation, BTE is able to retrieve 12 hits! +10ms
  bte:call-apis:query query completes. +0ms
  bte:call-apis:query Total number of results returned for this query is 168 +0ms
  bte:call-apis:query Start to use id resolver module to annotate output ids. +0ms
  bte:call-apis:query id annotation completes +697ms
  bte:call-apis:query Query completes +0ms
  bte:biothings-explorer-trapi:batch_edge_query BTEEdges are successfully queried.... +1s
  bte:biothings-explorer-trapi:batch_edge_query Filtering out undefined items (168) results +0ms
  bte:biothings-explorer-trapi:batch_edge_query After...(168) results +0ms
  bte:biothings-explorer-trapi:batch_edge_query Filters applied to search: [] +1ms
  bte:biothings-explorer-trapi:batch_edge_query Filtered results from 168 down to 168 results +0ms
  bte:biothings-explorer-trapi:batch_edge_query Total number of response is 168 +0ms
  bte:biothings-explorer-trapi:batch_edge_query Start to update nodes,hi. +0ms
  bte:biothings-explorer-trapi:batch_edge_query update nodes completed +0ms
  bte:biothings-explorer-trapi:main Query for depth 1 completes. +1s
  bte:biothings-explorer-trapi:main Start to notify subscribers now. +0ms
  bte:biothings-explorer-trapi:QueryResult Updating query results now! +15s
  bte:biothings-explorer-trapi:QueryResult query edge 0 has no records +0ms
  bte:biothings-explorer-trapi:Graph Updating BTE Graph now. +15s
  bte:biothings-explorer-trapi:error_handler TypeError: Cannot read property '0' of undefined
  bte:biothings-explorer-trapi:error_handler     at QueryGraphHelper._getInputID (/Users/jay/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/helper.js:30:32)
  bte:biothings-explorer-trapi:error_handler     at /Users/jay/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/graph.js:17:48
  bte:biothings-explorer-trapi:error_handler     at Array.map (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at Graph.update (/Users/jay/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/graph/graph.js:16:43)
  bte:biothings-explorer-trapi:error_handler     at /Users/jay/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/batch_edge_query.js:189:24
  bte:biothings-explorer-trapi:error_handler     at Array.map (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at BatchEdgeQueryHandler.notify (/Users/jay/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/batch_edge_query.js:188:26)
  bte:biothings-explorer-trapi:error_handler     at TRAPIQueryHandler.query (/Users/jay/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/index.js:121:25)
  bte:biothings-explorer-trapi:error_handler     at processTicksAndRejections (node:internal/process/task_queues:96:5)
  bte:biothings-explorer-trapi:error_handler     at async /Users/jay/Desktop/bte-trapi-workspace/packages/@biothings-explorer/bte-trapi/src/routes/v1/query_v1_by_api.js:26:17 +0ms
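
The `TypeError` in the trace above is raised when `_getInputID` indexes into a value that is undefined. Below is a minimal sketch of a defensive lookup, not the actual BTE code. It assumes, hypothetically, that each record keeps the user-supplied CURIE under `$input.original` and any resolver output as an array under `$input.obj`; neither field name is confirmed by this thread.

```javascript
// Hypothetical record shape, assumed for illustration:
//   { $input: { original: "NCIT:C15271", obj: [{ primaryID: "..." }] } }
// `obj` may be missing when the ID resolver does not support the semantic type.
function getInputId(record) {
  const resolved = record?.$input?.obj;
  // Use the resolver's primary ID only when a resolved entry actually exists.
  if (Array.isArray(resolved) && resolved.length > 0 && resolved[0]?.primaryID) {
    return resolved[0].primaryID;
  }
  // Otherwise fall back to the CURIE the user supplied (undefined if absent).
  return record?.$input?.original;
}
```

Under these assumptions, an unresolved Procedure input such as `NCIT:C15271` would yield its original CURIE instead of throwing.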

colleenXu commented 3 years ago

With the refactored SRI-service-based ID resolver now merged, this issue appears to be resolved, even though that resolver doesn't actually resolve the Procedure semantic type.

The following queries now work as expected.

Query with Procedure as input: POST to https://api.bte.ncats.io/v1/smartapi/d86a24f6027ffe778f84ba10a7a1861a/query

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      },
      "nodes": {
        "n00": {
          "categories": ["biolink:Procedure"],
          "ids": ["NCIT:C15271"]
        },
        "n01": {
          "categories": ["biolink:Disease"]
        }
      }
    }
  }
}

Query with Procedure as output: POST to https://api.bte.ncats.io/v1/smartapi/d86a24f6027ffe778f84ba10a7a1861a/query

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      },
      "nodes": {
        "n00": {
          "categories": ["biolink:Disease"],
          "ids": ["MONDO:0021100"]
        },
        "n01": {
          "categories": ["biolink:Procedure"]
        }
      }
    }
  }
}
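
The two request bodies above differ only in the node categories and in which IDs are pinned on `n00`. As a minimal sketch, they could be generated and sent as follows. The helper names `buildOneHopQuery` and `postQuery` are hypothetical, and `postQuery` assumes a runtime with a global `fetch` (for example Node 18 or later).

```javascript
// Build a one-hop TRAPI query graph: subject node n00 -> object node n01.
// `buildOneHopQuery` is a hypothetical helper, not part of BTE.
function buildOneHopQuery(subjectCategory, objectCategory, subjectIds = []) {
  const n00 = { categories: [subjectCategory] };
  // Only pin the subject node to specific entities when IDs are supplied.
  if (subjectIds.length > 0) {
    n00.ids = [...subjectIds];
  }
  return {
    message: {
      query_graph: {
        edges: { e00: { subject: "n00", object: "n01" } },
        nodes: { n00, n01: { categories: [objectCategory] } },
      },
    },
  };
}

// POST a query as JSON; assumes a global fetch is available.
async function postQuery(url, query) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(query),
  });
  if (!response.ok) {
    throw new Error(`Query failed with HTTP ${response.status}`);
  }
  return response.json();
}

// The two queries from this comment.
const procedureAsInput = buildOneHopQuery("biolink:Procedure", "biolink:Disease", ["NCIT:C15271"]);
const procedureAsOutput = buildOneHopQuery("biolink:Disease", "biolink:Procedure", ["MONDO:0021100"]);
```

Either object can then be passed to `postQuery` together with the endpoint URL given above. Whether that endpoint is still reachable is not something this sketch checks.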