biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

implement edge attribute constraints #795

Open andrewsu opened 6 months ago

andrewsu commented 6 months ago

We originally proposed edge attribute constraints in the context of TRAPI 1.3 and https://github.com/biothings/biothings_explorer/issues/482, but breaking this out to its own ticket.

We have a solid use case for edge constraints proposed in this query template for the CQS: https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json

The key bit is here, attempting to apply a minimum threshold on the biolink:evidence_count from AEOLUS.

    "message": {
        "query_graph": {
            "edges": {
                "e0": {
                    "predicates": [
                        "biolink:applied_to_treat"
                    ],
                    "subject": "n0",
                    "object": "n1",
                    "attribute_constraints": [
                        {
                         "id": "biolink:evidence_count",   
                         "operator": ">",
                         "value": 20            
                        }
                    ]
                }
            },
...

There are (at least) two issues that need to be done/checked:

{
  "edges": {
    "1feea171db6394cfd9bcb20deae0ad9a": {
      "predicate": "biolink:applied_to_treat",
      "subject": "PUBCHEM.COMPOUND:3386",
      "object": "MONDO:0002050",
      "attributes": [
        {
          "attribute_type_id": "biolink:evidence_count",
          "value": [
            733,
            42
          ]
        }
      ],
      "sources": [
        {
          "resource_id": "infores:aeolus",
          "resource_role": "primary_knowledge_source"
        },
        {
          "resource_id": "infores:mychem-info",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:aeolus"
          ]
        },
        {
          "resource_id": "infores:service-provider-trapi",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:mychem-info"
          ]
        }
      ]
    }
  }
}
colleenXu commented 6 months ago

From my perspective, there's 3 issues at play here:

1: edge-attribute value type

BTE is currently returning this edge-attribute in Dev/CI instances (ref: commit).

However, the value type is currently an array of ints (click for examples).

These are from this example BTE response [show-edge-attribute-issue.json](https://github.com/biothings/biothings_explorer/files/14644742/show-edge-attribute-issue.json), which runs the POST version of [this query](https://mychem.info/v1/query?q=aeolus.unii:01K63SUP8D&jmespath=aeolus.indications%7C[?count%3E`20`]&fields=aeolus.indications,aeolus.unii) to MyChem Example of a 1-element array: ``` "dd9daae5b03bcad0698ff6669090f36b": { "predicate": "biolink:applied_to_treat", "subject": "PUBCHEM.COMPOUND:3386", "object": "MEDDRA:10070592", "attributes": [ { "attribute_type_id": "biolink:evidence_count", "value": [ 875 ] } ], ``` Example of a multi-element array: the [733 count from Depression and 42 count from "Depressed mood"](https://mychem.info/v1/query?q=aeolus.unii:01K63SUP8D&jmespath=aeolus.indications%7C[?count%3E`20`]&fields=aeolus.indications,aeolus.unii) were put in the same edge/edge-attribute since both meddra IDs mapped to the "MONDO:0002050 (depressive disorder)" entity. ``` "1feea171db6394cfd9bcb20deae0ad9a": { "predicate": "biolink:applied_to_treat", "subject": "PUBCHEM.COMPOUND:3386", "object": "MONDO:0002050", "attributes": [ { "attribute_type_id": "biolink:evidence_count", "value": [ 733, 42 ] } ], ```

I suggest flattening these into ints, because the array will probably cause validation issues (biolink-model says attribute values should be int) and it'll make the edge-attribute constraint easier to implement.

But we'll need to decide what to do with the multi-element arrays. These are happening because MyChem has separate meddra indication IDs, but BTE/NodeNorm maps them to the same entity. BTE then merges those records into the same edge, and concatenates the counts in the edge-attribute value. I think we could either:

(Note: I'm not sure about flattening all 1-element arrays in edge-attributes. biolink:publications may be one example where we always want it to be an array, but we'd need to check with TRAPI folks first...)

2: what to do with the previous effort - a default, hard-coded count limit

EDIT: SEE UPDATE BELOW - we've implemented this.

We've been trying to add a hard-coded count limit of 20 to our MyChem queries https://github.com/biothings/biothings_explorer/issues/727#issuecomment-1782434986, similar to what we do with SEMMEDDB.

I was able to add it to the aeolusTreats operation (chem -> disease, commit), which all instances are using.

Old notes on reverse operation

But this hasn't been done for the reverse operation `aeolusTreats-rev` (disease -> chem), which is what creative-mode uses. In discussions last week ([three](https://suwulab.slack.com/archives/CC218TEKC/p1710450435216889) [Slack](https://suwulab.slack.com/archives/CC218TEKC/p1710451190238089) [links](https://suwulab.slack.com/archives/CC218TEKC/p1710451427180089)), we finally reached consensus on next steps: * by adjusting the x-bte annotation, I can get partway there. See this [commit](https://github.com/NCATS-Tangerine/translator-api-registry/commit/c09eeebf9c2f044e25ea16b2e228dffbb742e880) (special-reverses branch) * next is writing/implementing the BTE JQ-post-processing to remove the hits when the `aeolus.indications` field is empty. While this should be quick, I'm unsure of the logic to use and would need to discuss with Jackson... * something super-specific, that only works on responses from this operation? * generic-ish: "remove hits if this is a BioThings API AND supportBatch is false AND the scopes field specified in the request body (`aeolus.indications.meddra` in this case) isn't in the hit". * the "BioThings API only" and "supportBatch is false" should match "special reverses" - which are the only cases where we'd need this logic * I don't have other current x-bte annotation examples where this would be useful. I suspect that it may be useful for writing reverse x-bte operations for [MyChem chembl drug-mechanism](https://github.com/biothings/biothings_explorer/issues/779) and [drugcentral bioactivity](https://github.com/biothings/biothings_explorer/issues/780)

But if we want to implement TRAPI-query edge-attribute constraints, it's not clear if we want to go forward with this. An edge-attribute constraint < 20 would conflict with this hard-coded limit.

3: how to implement this issue's ask: TRAPI-query edge-attribute constraints

This is still up for discussion:

Idea: if an edge-attribute constraint is specified...

I had another idea of transforming the constraint into part of the BioThings API query using the x-bte annotation templating and info in the response-mapping, but this would be complex and more effort to think through and implement.

colleenXu commented 6 months ago

During today's group meeting, we made decisions on issues (1) and (3) above:

3: how to implement TRAPI-query edge-attribute constraints

1: how to flatten multi-element arrays

Jackson @tokebe estimated that this would take ~2 days of work. But as part of this effort, they'll review the TRAPI yaml/spec docs for QEdge.attribute_constraints expectations and requirements - and they'll decide how full/robust of an implementation to do for this issue.

colleenXu commented 6 months ago

Note that I've made a PR to ask about this template https://github.com/TranslatorSRI/CQS/pull/9:

colleenXu commented 6 months ago

Update! The template PR https://github.com/TranslatorSRI/CQS/pull/9 has been merged. So the current template is https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json

Changes:

colleenXu commented 5 months ago

Update on issue 2 above

The hard-coded/default/MyChem-query-level limit is now live in Dev/CI! See the details in https://github.com/biothings/biothings_explorer/issues/727#issuecomment-2046058611

colleenXu commented 3 months ago

And noting that Issue 1 was also addressed last month as part of https://github.com/biothings/biothings_explorer/issues/727#issuecomment-2091478525

Leaving just Issue 3 - the edge attribute constraint implementation itself (first post, later decision)

andrewsu commented 1 day ago

Going to add this to our agenda for next week to discuss. We tabled this earlier this year, and I'd like to revisit where we stand w.r.t. current priorities.

For updated info/context, BioLink Model undertook the "treats refactor", which separates out different types of evidence into different predicates (e.g. "in_clinical_trials_for", "beneficial_in_models_for"). CQS then created one hop templates that capture how to query translator with each of these predicates, and these queries often include edge constraints. (Though we could review those templates in more detail to look at what those edge constraints actually do and how common they are.)

Right now, our first MVP1 template queries the root of the treats predicate hierarchy (biolink:treats_or_applied_or_studied_to_treat). We should break each treats predicate out to different templates (because they have very different levels of confidence; see new issue #881), and then add in the CQS edge constraints once that functionality is available.