Open andrewsu opened 6 months ago
From my perspective, there's 3 issues at play here:
BTE is currently returning this edge-attribute in Dev/CI instances (ref: commit).
These are from this example BTE response [show-edge-attribute-issue.json](https://github.com/biothings/biothings_explorer/files/14644742/show-edge-attribute-issue.json), which runs the POST version of [this query](https://mychem.info/v1/query?q=aeolus.unii:01K63SUP8D&jmespath=aeolus.indications%7C[?count%3E`20`]&fields=aeolus.indications,aeolus.unii) to MyChem Example of a 1-element array: ``` "dd9daae5b03bcad0698ff6669090f36b": { "predicate": "biolink:applied_to_treat", "subject": "PUBCHEM.COMPOUND:3386", "object": "MEDDRA:10070592", "attributes": [ { "attribute_type_id": "biolink:evidence_count", "value": [ 875 ] } ], ``` Example of a multi-element array: the [733 count from Depression and 42 count from "Depressed mood"](https://mychem.info/v1/query?q=aeolus.unii:01K63SUP8D&jmespath=aeolus.indications%7C[?count%3E`20`]&fields=aeolus.indications,aeolus.unii) were put in the same edge/edge-attribute since both meddra IDs mapped to the "MONDO:0002050 (depressive disorder)" entity. ``` "1feea171db6394cfd9bcb20deae0ad9a": { "predicate": "biolink:applied_to_treat", "subject": "PUBCHEM.COMPOUND:3386", "object": "MONDO:0002050", "attributes": [ { "attribute_type_id": "biolink:evidence_count", "value": [ 733, 42 ] } ], ```
I suggest flattening these into ints, because the array will probably cause validation issues (biolink-model says attribute values should be int) and it'll make the edge-attribute constraint easier to implement.
But we'll need to decide what to do with the multi-element arrays. These are happening because MyChem has separate meddra indication IDs, but BTE/NodeNorm maps them to the same entity. BTE then merges those records into the same edge, and concatenates the counts in the edge-attribute value. I think we could either:
(Note: I'm not sure about flattening all 1-element arrays in edge-attributes. biolink:publications
may be one example where we always want it to be an array, but we'd need to check with TRAPI folks first...)
EDIT: SEE UPDATE BELOW - we've implemented this.
We've been trying to add a hard-coded count limit of 20 to our MyChem queries https://github.com/biothings/biothings_explorer/issues/727#issuecomment-1782434986, similar to what we do with SEMMEDDB.
I was able to add it to the aeolusTreats
operation (chem -> disease, commit), which all instances are using.
But this hasn't been done for the reverse operation `aeolusTreats-rev` (disease -> chem), which is what creative-mode uses. In discussions last week ([three](https://suwulab.slack.com/archives/CC218TEKC/p1710450435216889) [Slack](https://suwulab.slack.com/archives/CC218TEKC/p1710451190238089) [links](https://suwulab.slack.com/archives/CC218TEKC/p1710451427180089)), we finally reached consensus on next steps: * by adjusting the x-bte annotation, I can get partway there. See this [commit](https://github.com/NCATS-Tangerine/translator-api-registry/commit/c09eeebf9c2f044e25ea16b2e228dffbb742e880) (special-reverses branch) * next is writing/implementing the BTE JQ-post-processing to remove the hits when the `aeolus.indications` field is empty. While this should be quick, I'm unsure of the logic to use and would need to discuss with Jackson... * something super-specific, that only works on responses from this operation? * generic-ish: "remove hits if this is a BioThings API AND supportBatch is false AND the scopes field specified in the request body (`aeolus.indications.meddra` in this case) isn't in the hit". * the "BioThings API only" and "supportBatch is false" should match "special reverses" - which are the only cases where we'd need this logic * I don't have other current x-bte annotation examples where this would be useful. I suspect that it may be useful for writing reverse x-bte operations for [MyChem chembl drug-mechanism](https://github.com/biothings/biothings_explorer/issues/779) and [drugcentral bioactivity](https://github.com/biothings/biothings_explorer/issues/780)
But if we want to implement TRAPI-query edge-attribute constraints, it's not clear if we want to go forward with this. An edge-attribute constraint < 20 would conflict with this hard-coded limit.
This is still up for discussion:
Idea: if an edge-attribute constraint is specified...
I had another idea of transforming the constraint into part of the BioThings API query using the x-bte annotation templating and info in the response-mapping, but this would be complex and more effort to think through and implement.
During today's group meeting, we made decisions on issues (1) and (3) above:
3: how to implement TRAPI-query edge-attribute constraints
1: how to flatten multi-element arrays
Jackson @tokebe estimated that this would take ~2 days of work. But as part of this effort, they'll review the TRAPI yaml/spec docs for QEdge.attribute_constraints expectations and requirements - and they'll decide how full/robust of an implementation to do for this issue.
Note that I've made a PR to ask about this template https://github.com/TranslatorSRI/CQS/pull/9:
Update! The template PR https://github.com/TranslatorSRI/CQS/pull/9 has been merged. So the current template is https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json
Changes:
The hard-coded/default/MyChem-query-level limit is now live in Dev/CI! See the details in https://github.com/biothings/biothings_explorer/issues/727#issuecomment-2046058611
And noting that Issue 1 was also addressed last month as part of https://github.com/biothings/biothings_explorer/issues/727#issuecomment-2091478525
Leaving just Issue 3 - the edge attribute constraint implementation itself (first post, later decision)
Going to add this to our agenda for next week to discuss. We tabled this earlier this year, and I'd like to revisit where we stand w.r.t. current priorities.
For updated info/context, BioLink Model undertook the "treats refactor", which separates out different types of evidence into different predicates (e.g. "in_clinical_trials_for", "beneficial_in_models_for"). CQS then created one hop templates that capture how to query translator with each of these predicates, and these queries often include edge constraints. (Though we could review those templates in more detail to look at what those edge constraints actually do and how common they are.)
Right now, our first MVP1 template queries the root of the treats predicate hierarchy (biolink:treats_or_applied_or_studied_to_treat
). We should break each treats predicate out to different templates (because they have very different levels of confidence; see new issue #881), and then add in the CQS edge constraints once that functionality is available.
We originally proposed edge attribute constraints in the context of TRAPI 1.3 and https://github.com/biothings/biothings_explorer/issues/482, but breaking this out to its own ticket.
We have a solid use case for edge constraints proposed in this query template for the CQS: https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json
The key bit is here, attempting to apply a minimum threshold on the
biolink:evidence_count
from AEOLUS.There are (at least) two issues that need to be done/checked:
evidence_count
is provided as a multi-element array (example below). In this case, I think it is reasonable to apply the constraint to the sum of the evidence_counts.