biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

update modeling to reflect qualifier changes for chemical -> gene associations #512

Closed andrewsu closed 1 year ago

andrewsu commented 2 years ago

This is a new issue to track our team's implementation of this issue on qualifiers for chemical-gene associations: https://github.com/NCATSTranslator/ReasonerAPI/issues/348

From that issue:

The schema changes are here: https://github.com/NCATSTranslator/ReasonerAPI/pull/330/files and there are some examples here: https://github.com/NCATSTranslator/ReasonerAPI/tree/master/examples/Message

According to the progress tracker, the target completion date for this issue is 2022-10-25

related/broader issue on constraints: https://github.com/biothings/BioThings_Explorer_TRAPI/issues/482

andrewsu commented 2 years ago

Steps 1) update SmartAPI annotations (Target completion as of 10/11 is 11/1)

andrewsu commented 2 years ago

Additional notes from my conversation with Sierra and Matt B.:

subject: Bisphenol A
predicate: affects 
qualified_predicate: causes
object: ESR1
object_aspect_qualifier: degradation
object_direction_qualifier: decreased

One interpretation of this record is the simple SPO triple -- Bisphenol A affects ESR1 -- and one is the fully qualified triple -- Bisphenol A causes decreased degradation of ESR1

colleenXu commented 1 year ago

Helpful links:

Other links:

A related issue on "context" edge-attributes (where some work from this may be refactored to fit the qualifier model? https://github.com/biothings/BioThings_Explorer_TRAPI/issues/434

colleenXu commented 1 year ago

Biolink model lays out "chemical-gene associations" starting line 10301-10399, 11467-11485 (physically_interacts_with, affects, regulates)

For the possible values, it's not clear to me what the case is (snake_case?) UPDATE: they are supposed to be snake_case.

Subject (and object qualifiers):

qualified_predicate

statement qualifier?

Possible aspect values:


Migration guide should have full list of deprecated predicates

My thoughts:

colleenXu commented 1 year ago

Pasting from Slack:

Also I realized that what we still have a "reversing" issue for Part B: we query in whatever direction we need to retrieve records using IDs. And then we "invert" the predicate so the record/Edge matches the QEdge direction...

But now we would also need to "invert" some qualifiers (change all the "subject" to "object" and vice versa) o_0.

tokebe commented 1 year ago

I've pushed the branch add-qualifiers with experimental support:

Todo list:

andrewsu commented 1 year ago

Also to clarify, the that the predicate hierarchy (with deprecated predicates viewable at http://tree-viz-biolink.herokuapp.com/predicates/deprecated-predicates) will define the allowed values for both the predicate and qualified_predicate slots.

colleenXu commented 1 year ago

Will likely want to update BTE to biolink-model 3.0.3 for Part 3 of qualifiers, so will want to be aware of this issue https://github.com/biothings/BioThings_Explorer_TRAPI/issues/514

colleenXu commented 1 year ago

Noting: qualifiers currently aren't included in the TRAPI /meta_knowledge_graph (which could affect Parts 2 and 3) and discussion is still active on the format for querying with qualifiers (Part 3).

Related discussions (TRAPI 1.4 upcoming work and release):

tokebe commented 1 year ago

Support for TRAPI KP responses complete, tested via debug value injection.

On to part 3...

tokebe commented 1 year ago

With regard to qualifier constraints from queries, I'm a little concerned that current implementation will be both complicated and somewhat fuzzy:

Tagging @andrewsu @colleenXu for input/opinions.

colleenXu commented 1 year ago

For your first point, yeah that's....not ideal. I'm okay with you going forward with a mix, since the other two types of constraints (at least edge-attribute source stuff) may have a similar mix of pre- and post-query filters.

I imagine that they will want qualifiers added to the /meta_knowledge_graph in future TRAPI releases...so then we'd have to change that a bit in the future.


For the second point, thanks for the heads-up. I'll keep in mind that there's the rest of biolink-model 3 stuff to do after/with the qualifiers for my x-bte annotation work.

colleenXu commented 1 year ago

@tokebe @andrewsu

Step 1 (update x-bte annotations for qualifier-refactor) is done as of this commit. I did modify a few of Text-Mining's operations.

I ended up doing find-replace for semmeddb operations, and I haven't refactored the semmeddb-operation-generating code yet.

Next I'll work on updating x-bte annotations for the biolink 3.0 migration (not-qualifier-related). https://github.com/biothings/BioThings_Explorer_TRAPI/issues/514

EDIT: from this commit to this commit, the APIs with adjustments for the qualifier-refactor are:

colleenXu commented 1 year ago

Update on Step 1

Note that the format (aka case) of qualifier values is still unclear. Internal Slack link.


Noting that qualifier-refactor discussions haven't happened with Multiomics Provider for their APIs:

Their APIs didn't come up as I was doing my find-replace for all the deprecated predicates.


And there's discussion with Text-Mining Provider + Translator Data Modeling group on what predicates/qualifiers to use for "chemical-regulates-gene" relationships (internal Slack link). Depending on how the conversation goes, we may want to revise those operations for Text-Mining's API and other APIs that have "regulates" relationships

colleenXu commented 1 year ago

News:


X-bte annotation updates:

Update 2022-11-10: modified the code to generate semmeddb operations with the qualifier/biolink3 mappings we're currently doing.

colleenXu commented 1 year ago

@tokebe Added a commit to query-handler (add-qualifiers branch) to adjust templates for the predicate changes https://github.com/biothings/bte_trapi_query_graph_handler/commit/3a0ca1aa3543f99077579994d328939437318f8a. I haven't tested this change yet.

I'm also not sure about adding qualifier-constraints

tokebe commented 1 year ago

Once I have the implementation, I see no reason that you couldn't use qualifier-constraints. You can add them now if you want, since they won't break anything.

tokebe commented 1 year ago

@colleenXu I've pushed an experimental implementation of qualifier constrain pre-filtering. Below is the rundown of the behavior:

If this behavior looks good, could you run a few tests to make sure the implementation is behaving as expected (I've run some preliminary tests and it looks fine)? I'll be working on ensuring everything's set for adding in biolink 3.

colleenXu commented 1 year ago

@tokebe I found this set of implementation rules on qualifier_constraints https://github.com/NCATSTranslator/ReasonerAPI/blob/v1.3.0/ImplementationRules.md#qedgequalifier_constraints

tokebe commented 1 year ago

I'm aware of this set of rules. My implementation follows these; however these rules are very broad and do not cover the specifics of qualifier-constraint matching for a KP, which we must follow for our non-TRAPI KPs.

These implementation rules only specify "is compatible with", which I've interpreted both to the best of my ability and with my understanding from previous conversations. I'm asking if you could look over the rundown I wrote and confirm that that specific set of cases looks fine.

Biolink-3 implementation seems relatively straightforward with only some test fixes. @andrewsu does it make sense to leave hierarchy traversal for qualifiers (and qualifier-constraint pre-filtering) for another issue?

andrewsu commented 1 year ago

@andrewsu does it make sense to leave hierarchy traversal for qualifiers (and qualifier-constraint pre-filtering) for another issue?

Yes, I think that makes sense. This issue is already getting a little bit unwieldy, so breaking things down to a series of more focused issues seems like a good plan...

tokebe commented 1 year ago

@colleenXu I'm rolling the biolink 3.0.3 rollout into the qualifiers issue given that they're both very related and the biolink 3 rollout won't require a lot of work.

What's left (some may be put in separate issues):

Do we want to somehow add qualifiers to our /meta_knowledge_graph endpoint? There hasn't been any guidance on this from Translator that I'm aware of but I imagine it's something that will be expected eventually...

colleenXu commented 1 year ago

@tokebe I think it makes sense to roll the https://github.com/biothings/BioThings_Explorer_TRAPI/issues/514 and this qualifier issue into the same set of branches/PRs.

I plan to write up how I manually test qualifier-constraints - which may help for writing tests

I think the qualifier-value hierarchy traversal would definitely be a separate issue.

I don't think we need to add qualifiers to our /meta_knowledge_graph endpoint for this issue because we don't know the format yet (from TRAPI) and we're not leading the effort there...

colleenXu commented 1 year ago

Feedback Part 1

Update 2022-11-15: with this code fix, the behavior with 1 set of 1 constraint looks mostly fixed. There is still an issue with qualified_predicate (described in the later comment).

Qualifier constraints doesn't seem to work: compare the first and second queries below

Testing setup

  • add-qualifier branches for
    • biolink-model
    • api-response-transform
    • bte-trapi (probably optional - I remember making this branch)
    • call-apis
    • query_graph_handler
    • smartapi-kg
  • checkout bte-trapi's dev branch's src/config/smartapi_overrides.json file. It has all the API overrides to use biolink3/qualifier-updated SmartAPI annotations
  • run API_OVERRIDE=true npm run smartapi_sync to get all the API overrides

First set of testing

I used DGIdb's x-bte annotation that has qualifiers as a source. And I ran my queries restricted to using DGIdb only (link has v1/smartapi/e3edd325c76f2992a111b43a907a4870/query).

First: as a baseline - no qualifier_constraints
query ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["NCBIGene:9132"], "categories":["biolink:Gene"] }, "n1": { "categories":["biolink:SmallMolecule"] } }, "edges": { "e1": { "subject": "n0", "object": "n1" } } } } } ```
[KCNQ4.txt](https://github.com/biothings/BioThings_Explorer_TRAPI/files/10145644/KCNQ4.txt) Currently, I get 9 results with 11 Edges: * 1 result Celecoxib has only 1 Edge that doesn't have qualifiers (predicate is interacts_with) * 5 results have Edges with qualifiers `decreased activity molecular_channel_blockage`: Guanidine hydrochloride, Nerispirdine, 4-Aminopyridine, TEDISAMIL, Bepridil * 1 result Linopirdine has Edge with qualifiers `decreased activity_or_abundance inhibition` * 1 result Flindokalner has Edge with qualifiers `increased activity activation` * 1 result Retigabine has 3 Edges (one with interacts_with, one with the `decreased activity molecular_channel_blockage` qualifiers and one with the `increased activity activation`).
Second: 1 set of 1 constraint
query ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["NCBIGene:9132"], "categories":["biolink:Gene"] }, "n1": { "categories":["biolink:SmallMolecule"] } }, "edges": { "e1": { "subject": "n0", "object": "n1", "qualifier_constraints": [ { "qualifier_set": [ { "qualifier_type_id": "biolink:subject_direction_qualifier", "qualifier_value": "increased" } ] } ] } } } } } ```
Compared to the first query, I'd expect to get only 2 of the results, Flindokalner and Retigabine, since both had edges with the qualifiers `increased activity activation`. Instead, I'm getting 8 results (all the results with "affected_by" as the predicate).
colleenXu commented 1 year ago

Feedback Part 2

Update 2022-11-15: with this code fix, the behavior looks fixed. Yay!

The "reversing" of qualifiers to match the QEdge direction isn't working as-expected. I'm running my queries restricted to using DGIdb only (link has v1/smartapi/e3edd325c76f2992a111b43a907a4870/query).

If I run a query like Gene -(affected_by)-> SmallMolecule X, I expect BTE to "reverse" the edge for execution (so it can start with the SmallMolecule X ID).

BTE will then run the sub-query SmallMolecule X -(affects)-> Gene. I know DGIdb's x-bte annotation will have the "causes" qualifiedpredicate and "object*" qualifiers in the sub-query responses.

query to test correct reversal ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["CHEMBL.COMPOUND:CHEMBL730"], "categories":["biolink:SmallMolecule"], "name": "NITROGLYCERIN" }, "n1": { "categories":["biolink:Gene"] } }, "edges": { "e1": { "subject": "n1", "object": "n0", "predicates": ["biolink:affected_by"] } } } } } ```

Afterwards, BTE should reverse the predicate (affects) and the qualifiers for the Edges, so the Edges in the response have the predicate "affected_by" (same direction as QEdge), "caused_by" as the qualifiedpredicate, and "subject*" qualifiers.

However, the current code doesn't correctly reverse the qualifiers. It does correctly reverse the predicate.

Here's the response I get to the query: example_reverse.txt

colleenXu commented 1 year ago

Feedback Part 3

Update 2022-11-15: with this code fix, (2) seems to be working. (1) isn't fully working - there are still issues with qualified_predicate (listed in a later comment)

I deleted a post and edited headings for the two posts above, to make these main concerns clearer. I'm also organizing the various things that have come up, in this post.

Here are some smaller features we want after recent discussions:

  1. when qualifier_type_id is qualified_predicate, the qualifier_value should have a biolink-prefix ("biolink:causes" or "biolink:caused_by"). This is for qualifiers in Edges and qualifier_constraints in QEdges.
    • @tokebe 2022-11-11 said we can do this in code, so we don't have to add the biolink-prefix to x-bte annotation (similar to how we treat predicates + node categories)
  2. query-validation:
    • UPDATE 2022-11-14 6pm PT, after discussion with Andrew: this is optional. We don't have to worry about query-validation because we should expect queries to the guidelines set by Data-Modeling/TRAPI groups.
    • "within 1 qualifier_set, each object must have a unique value for the qualifier_type_id field". This is what I think point 1i of this doc means.
      • If there are objects in a set with the same qualifier_type_id, I guess we'd want BTE to not execute further and return an informative error (there are two qualifier_constraints in the same set that have the same qualifier_type_id; please modify the query so each qualifier_type_id in a set is unique)

UPDATE 2022-11-14 6pm PT, after discussion with Andrew:

Don't worry about TRAPI KPs and how they'd handle "multiple curies" + "qualifier constraints":

We're also asking for clarification... According to the TRAPI 1.3.0 spec, qualifier_constraints are not supposed to be on a QEdge when the QNodes of that specific QEdge have multiple curies. BTE's implementation will likely handle multiple curies + qualifier_constraints without issues...but this may have consequences for how we do TRAPI-API sub-querying with qualifier_constraints:

  • Do we have to send IDs one-at-a-time when that hop has qualifier_constraints?
  • Do we have to send the query without qualifier_constraints, and then do post-filtering with the qualifier_constraints?

Some notes:


Future issues from this "requirements" doc:

tokebe commented 1 year ago

@colleenXu I've pushed fixes to each respective package.

colleenXu commented 1 year ago

Feedback round 2

EDIT: done retesting. From my testing POV, there were only 2 small things with qualified_predicate, which are probably related and easily fixed. Once these are addressed, this issue (Steps 1-3) may be complete...

Qualified predicates

Note: Testing setup is the same as above (aka using only DGIdb at the moment).)

  1. Qualified_predicate in a qualifier_constraint is not working when the value (predicate) includes the biolink-prefix (see example query below), which is the format we're supposed to follow. Currently, BTE finds no matching metaKG edges. I notice that if I provide the qualified predicate without the biolink-prefix, it works. ATP_1.txt
correct qualified_predicate-format example ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["CHEMBL.COMPOUND:CHEMBL14249"], "categories":["biolink:SmallMolecule"], "name": "ATP" }, "n1": { "categories":["biolink:Gene"] } }, "edges": { "e1": { "subject": "n0", "object": "n1", "qualifier_constraints": [ { "qualifier_set": [ { "qualifier_type_id": "biolink:qualified_predicate", "qualifier_value": "biolink:causes" } ] } ] } } } } } ```
  1. BTE isn't finding metaKG edges when I test the qualified_predicate constraint with a QEdge that'll need reversal, even when the predicate biolink-prefix is removed:
"reversed QEdge" version of the query above Look carefully at what the QEdge subject and object are (Gene -> ATP) and the qualified predicate. ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["CHEMBL.COMPOUND:CHEMBL14249"], "categories":["biolink:SmallMolecule"], "name": "ATP" }, "n1": { "categories":["biolink:Gene"] } }, "edges": { "e1": { "subject": "n1", "object": "n0", "qualifier_constraints": [ { "qualifier_set": [ { "qualifier_type_id": "biolink:qualified_predicate", "qualifier_value": "biolink:caused_by" } ] } ] } } } } } ``` Like the query above, there should be 15 results when the metaKG edges are found.

Correction #1: Having two constraints in the same set is working.

It's expected to work as "AND" (both conditions must be met on one metaKG edge).

compare query with 1 constraint vs 2 constraints First, the query w/ 1 constraint has 3 results. One of the results, TRPM4, has *two edges* - one for "activity" and one for "activity_or_abundance". [ATP_decreased.txt](https://github.com/biothings/BioThings_Explorer_TRAPI/files/10145670/ATP_decreased.txt)
1 constraint ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["CHEMBL.COMPOUND:CHEMBL14249"], "categories":["biolink:SmallMolecule"], "name": "ATP" }, "n1": { "categories":["biolink:Gene"] } }, "edges": { "e1": { "subject": "n0", "object": "n1", "qualifier_constraints": [ { "qualifier_set": [ { "qualifier_type_id": "biolink:object_direction_qualifier", "qualifier_value": "decreased" } ] } ] } } } } } ```
Compare it to the response with 2 constraints (AND). While it also has 3 results, TRPM4, only has 1 edge - the one for "activity" to match the 2nd constraint. [ATP_decreased_activity.txt](https://github.com/biothings/BioThings_Explorer_TRAPI/files/10145677/ATP_decreased_activity.txt)
1 constraint ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["CHEMBL.COMPOUND:CHEMBL14249"], "categories":["biolink:SmallMolecule"], "name": "ATP" }, "n1": { "categories":["biolink:Gene"] } }, "edges": { "e1": { "subject": "n0", "object": "n1", "qualifier_constraints": [ { "qualifier_set": [ { "qualifier_type_id": "biolink:object_direction_qualifier", "qualifier_value": "decreased" }, { "qualifier_type_id": "biolink:object_aspect_qualifier", "qualifier_value": "activity" } ] } ] } } } } } ```

Correction #2: Having 2 sets of constraints (1 in each) is working.

It's expected to work as "OR" (at least one condition must be met on one metaKG edge).

example with 2 sets of constraints ``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["CHEMBL.COMPOUND:CHEMBL14249"], "categories":["biolink:SmallMolecule"], "name": "ATP" }, "n1": { "categories":["biolink:Gene"] } }, "edges": { "e1": { "subject": "n0", "object": "n1", "qualifier_constraints": [ { "qualifier_set": [ { "qualifier_type_id": "biolink:object_direction_qualifier", "qualifier_value": "increased" } ] }, { "qualifier_set": [ { "qualifier_type_id": "biolink:object_direction_qualifier", "qualifier_value": "decreased" } ] } ] } } } } } ``` This will return 15 results - some with `increased` and some with `decreased` qualifiers.

Useful basic queries for qualifier-constraints:

Each of these has at least 3 kinds of qualifier-combos (increased activity, decreased activity, decreased activity_or_abundance)

tokebe commented 1 year ago

@colleenXu I pushed fixes for these, currently working on fixing any broken tests (most failing tests on the query_graph_handler are due to changes causing qEdge hashes to change).

Edit: there are still a few more minor fixes to be made, will update when everything's ready for another round of testing.

colleenXu commented 1 year ago

Tested the latest code for qualified_predicate handling. Now it works, whether the value has the biolink prefix or not.

Did some quick testing of other queries with qualified predicates -> these still work as well.

tokebe commented 1 year ago

Alright, I think I can leave it to a future validation issue for not supporting non-prefixed values, if we care about that.

colleenXu commented 1 year ago

Don't think we care about it :P

tokebe commented 1 year ago

Deployed to prod 🚀