Closed andrewsu closed 1 year ago
Steps 1) update SmartAPI annotations (Target completion as of 10/11 is 11/1)
qualifier_value
s (e.g., localization -> [tissue_localization, subcellular_localization])
gene_or_gene_product_or_chemical_entity_aspect_enum
s that show, for example, abundance
has subclasses degradation
, synthesis
, expression
, etc.Additional notes from my conversation with Sierra and Matt B.:
chemical gene interaction association
and chemical affects gene association
) to assess technical feasibilitysubject: Bisphenol A
predicate: affects
qualified_predicate: causes
object: ESR1
object_aspect_qualifier: degradation
object_direction_qualifier: decreased
One interpretation of this record is the simple SPO triple -- Bisphenol A
affects
ESR1
-- and one is the fully qualified triple -- Bisphenol A
causes
decreased degradation of ESR1
Helpful links:
Other links:
A related issue on "context" edge-attributes (where some work from this may be refactored to fit the qualifier model? https://github.com/biothings/BioThings_Explorer_TRAPI/issues/434
Biolink model lays out "chemical-gene associations" starting line 10301-10399, 11467-11485 (physically_interacts_with, affects, regulates)
For the possible values, it's not clear to me what the case is (snake_case?) UPDATE: they are supposed to be snake_case.
Subject (and object qualifiers):
qualified_predicate
statement qualifier?
Possible aspect values:
Migration guide should have full list of deprecated predicates
My thoughts:
Pasting from Slack:
Also I realized that what we still have a "reversing" issue for Part B: we query in whatever direction we need to retrieve records using IDs. And then we "invert" the predicate so the record/Edge matches the QEdge direction...
But now we would also need to "invert" some qualifiers (change all the "subject" to "object" and vice versa) o_0.
I've pushed the branch add-qualifiers
with experimental support:
Todo list:
Also to clarify, the that the predicate hierarchy (with deprecated predicates viewable at http://tree-viz-biolink.herokuapp.com/predicates/deprecated-predicates) will define the allowed values for both the predicate
and qualified_predicate
slots.
Will likely want to update BTE to biolink-model 3.0.3 for Part 3 of qualifiers, so will want to be aware of this issue https://github.com/biothings/BioThings_Explorer_TRAPI/issues/514
Noting: qualifiers currently aren't included in the TRAPI /meta_knowledge_graph (which could affect Parts 2 and 3) and discussion is still active on the format for querying with qualifiers (Part 3).
Related discussions (TRAPI 1.4 upcoming work and release):
Support for TRAPI KP responses complete, tested via debug value injection.
On to part 3...
With regard to qualifier constraints from queries, I'm a little concerned that current implementation will be both complicated and somewhat fuzzy:
Tagging @andrewsu @colleenXu for input/opinions.
For your first point, yeah that's....not ideal. I'm okay with you going forward with a mix, since the other two types of constraints (at least edge-attribute source stuff) may have a similar mix of pre- and post-query filters.
I imagine that they will want qualifiers added to the /meta_knowledge_graph in future TRAPI releases...so then we'd have to change that a bit in the future.
For the second point, thanks for the heads-up. I'll keep in mind that there's the rest of biolink-model 3 stuff to do after/with the qualifiers for my x-bte annotation work.
@tokebe @andrewsu
Step 1 (update x-bte annotations for qualifier-refactor) is done as of this commit. I did modify a few of Text-Mining's operations.
I ended up doing find-replace for semmeddb operations, and I haven't refactored the semmeddb-operation-generating code yet.
Next I'll work on updating x-bte annotations for the biolink 3.0 migration (not-qualifier-related). https://github.com/biothings/BioThings_Explorer_TRAPI/issues/514
EDIT: from this commit to this commit, the APIs with adjustments for the qualifier-refactor are:
Note that the format (aka case) of qualifier values is still unclear. Internal Slack link.
Noting that qualifier-refactor discussions haven't happened with Multiomics Provider for their APIs:
Their APIs didn't come up as I was doing my find-replace for all the deprecated predicates.
And there's discussion with Text-Mining Provider + Translator Data Modeling group on what predicates/qualifiers to use for "chemical-regulates-gene" relationships (internal Slack link). Depending on how the conversation goes, we may want to revise those operations for Text-Mining's API and other APIs that have "regulates" relationships
News:
X-bte annotation updates:
Update 2022-11-10: modified the code to generate semmeddb operations with the qualifier/biolink3 mappings we're currently doing.
@tokebe Added a commit to query-handler (add-qualifiers branch) to adjust templates for the predicate changes https://github.com/biothings/bte_trapi_query_graph_handler/commit/3a0ca1aa3543f99077579994d328939437318f8a. I haven't tested this change yet.
I'm also not sure about adding qualifier-constraints
Once I have the implementation, I see no reason that you couldn't use qualifier-constraints. You can add them now if you want, since they won't break anything.
@colleenXu I've pushed an experimental implementation of qualifier constrain pre-filtering. Below is the rundown of the behavior:
bte-trapi
tag, which is added by BTE when TRAPI ingesting operations. If the operation has the bte-trapi
tag, it is kept (because the qualifier constraints will be passed in as part of the TRAPI subquery)If this behavior looks good, could you run a few tests to make sure the implementation is behaving as expected (I've run some preliminary tests and it looks fine)? I'll be working on ensuring everything's set for adding in biolink 3.
@tokebe I found this set of implementation rules on qualifier_constraints https://github.com/NCATSTranslator/ReasonerAPI/blob/v1.3.0/ImplementationRules.md#qedgequalifier_constraints
I'm aware of this set of rules. My implementation follows these; however these rules are very broad and do not cover the specifics of qualifier-constraint matching for a KP, which we must follow for our non-TRAPI KPs.
These implementation rules only specify "is compatible with", which I've interpreted both to the best of my ability and with my understanding from previous conversations. I'm asking if you could look over the rundown I wrote and confirm that that specific set of cases looks fine.
Biolink-3 implementation seems relatively straightforward with only some test fixes. @andrewsu does it make sense to leave hierarchy traversal for qualifiers (and qualifier-constraint pre-filtering) for another issue?
@andrewsu does it make sense to leave hierarchy traversal for qualifiers (and qualifier-constraint pre-filtering) for another issue?
Yes, I think that makes sense. This issue is already getting a little bit unwieldy, so breaking things down to a series of more focused issues seems like a good plan...
@colleenXu I'm rolling the biolink 3.0.3 rollout into the qualifiers issue given that they're both very related and the biolink 3 rollout won't require a lot of work.
What's left (some may be put in separate issues):
Do we want to somehow add qualifiers to our /meta_knowledge_graph endpoint? There hasn't been any guidance on this from Translator that I'm aware of but I imagine it's something that will be expected eventually...
@tokebe I think it makes sense to roll the https://github.com/biothings/BioThings_Explorer_TRAPI/issues/514 and this qualifier issue into the same set of branches/PRs.
I plan to write up how I manually test qualifier-constraints - which may help for writing tests
I think the qualifier-value hierarchy traversal would definitely be a separate issue.
I don't think we need to add qualifiers to our /meta_knowledge_graph endpoint for this issue because we don't know the format yet (from TRAPI) and we're not leading the effort there...
Update 2022-11-15: with this code fix, the behavior with 1 set of 1 constraint looks mostly fixed. There is still an issue with qualified_predicate (described in the later comment).
Qualifier constraints doesn't seem to work: compare the first and second queries below
Testing setup
add-qualifier
branches for
- biolink-model
- api-response-transform
- bte-trapi (probably optional - I remember making this branch)
- call-apis
- query_graph_handler
- smartapi-kg
- checkout bte-trapi's dev branch's src/config/smartapi_overrides.json file. It has all the API overrides to use biolink3/qualifier-updated SmartAPI annotations
- run
API_OVERRIDE=true npm run smartapi_sync
to get all the API overridesFirst set of testing
I used DGIdb's x-bte annotation that has qualifiers as a source. And I ran my queries restricted to using DGIdb only (link has
v1/smartapi/e3edd325c76f2992a111b43a907a4870/query
).First: as a baseline - no qualifier_constraints
[KCNQ4.txt](https://github.com/biothings/BioThings_Explorer_TRAPI/files/10145644/KCNQ4.txt) Currently, I get 9 results with 11 Edges: * 1 result Celecoxib has only 1 Edge that doesn't have qualifiers (predicate is interacts_with) * 5 results have Edges with qualifiers `decreased activity molecular_channel_blockage`: Guanidine hydrochloride, Nerispirdine, 4-Aminopyridine, TEDISAMIL, Bepridil * 1 result Linopirdine has Edge with qualifiers `decreased activity_or_abundance inhibition` * 1 result Flindokalner has Edge with qualifiers `increased activity activation` * 1 result Retigabine has 3 Edges (one with interacts_with, one with the `decreased activity molecular_channel_blockage` qualifiers and one with the `increased activity activation`).query
``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["NCBIGene:9132"], "categories":["biolink:Gene"] }, "n1": { "categories":["biolink:SmallMolecule"] } }, "edges": { "e1": { "subject": "n0", "object": "n1" } } } } } ```Second: 1 set of 1 constraint
Compared to the first query, I'd expect to get only 2 of the results, Flindokalner and Retigabine, since both had edges with the qualifiers `increased activity activation`. Instead, I'm getting 8 results (all the results with "affected_by" as the predicate).query
``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["NCBIGene:9132"], "categories":["biolink:Gene"] }, "n1": { "categories":["biolink:SmallMolecule"] } }, "edges": { "e1": { "subject": "n0", "object": "n1", "qualifier_constraints": [ { "qualifier_set": [ { "qualifier_type_id": "biolink:subject_direction_qualifier", "qualifier_value": "increased" } ] } ] } } } } } ```
Update 2022-11-15: with this code fix, the behavior looks fixed. Yay!
The "reversing" of qualifiers to match the QEdge direction isn't working as-expected. I'm running my queries restricted to using DGIdb only (link has v1/smartapi/e3edd325c76f2992a111b43a907a4870/query).
If I run a query like Gene -(affected_by)-> SmallMolecule X, I expect BTE to "reverse" the edge for execution (so it can start with the SmallMolecule X ID).
BTE will then run the sub-query SmallMolecule X -(affects)-> Gene. I know DGIdb's x-bte annotation will have the "causes" qualifiedpredicate and "object*" qualifiers in the sub-query responses.
query to test correct reversal
``` { "message": { "query_graph": { "nodes": { "n0": { "ids":["CHEMBL.COMPOUND:CHEMBL730"], "categories":["biolink:SmallMolecule"], "name": "NITROGLYCERIN" }, "n1": { "categories":["biolink:Gene"] } }, "edges": { "e1": { "subject": "n1", "object": "n0", "predicates": ["biolink:affected_by"] } } } } } ```Afterwards, BTE should reverse the predicate (affects) and the qualifiers for the Edges, so the Edges in the response have the predicate "affected_by" (same direction as QEdge), "caused_by" as the qualifiedpredicate, and "subject*" qualifiers.
However, the current code doesn't correctly reverse the qualifiers. It does correctly reverse the predicate.
Here's the response I get to the query: example_reverse.txt
Update 2022-11-15: with this code fix, (2) seems to be working. (1) isn't fully working - there are still issues with qualified_predicate (listed in a later comment)
I deleted a post and edited headings for the two posts above, to make these main concerns clearer. I'm also organizing the various things that have come up, in this post.
Here are some smaller features we want after recent discussions:
- when
qualifier_type_id
isqualified_predicate
, thequalifier_value
should have a biolink-prefix ("biolink:causes" or "biolink:caused_by"). This is for qualifiers in Edges and qualifier_constraints in QEdges.
- @tokebe 2022-11-11 said we can do this in code, so we don't have to add the biolink-prefix to x-bte annotation (similar to how we treat predicates + node categories)
- query-validation:
- UPDATE 2022-11-14 6pm PT, after discussion with Andrew: this is optional. We don't have to worry about query-validation because we should expect queries to the guidelines set by Data-Modeling/TRAPI groups.
- "within 1 qualifier_set, each object must have a unique value for the
qualifier_type_id
field". This is what I think point 1i of this doc means.
- If there are objects in a set with the same
qualifier_type_id
, I guess we'd want BTE to not execute further and return an informative error (there are two qualifier_constraints in the same set that have the same qualifier_type_id; please modify the query so each qualifier_type_id in a set is unique
)
UPDATE 2022-11-14 6pm PT, after discussion with Andrew:
Don't worry about TRAPI KPs and how they'd handle "multiple curies" + "qualifier constraints":
We're also asking for clarification... According to the TRAPI 1.3.0 spec, qualifier_constraints are not supposed to be on a QEdge when the QNodes of that specific QEdge have multiple curies. BTE's implementation will likely handle multiple curies + qualifier_constraints without issues...but this may have consequences for how we do TRAPI-API sub-querying with qualifier_constraints:
- Do we have to send IDs one-at-a-time when that hop has qualifier_constraints?
- Do we have to send the query without qualifier_constraints, and then do post-filtering with the qualifier_constraints?
Some notes:
qualifier_value
values that are enums from biolink-model DO NOT need the biolink-prefixFuture issues from this "requirements" doc:
qualifier_value
's value can be an enum from the biolink-model, and we're expected to "expand" qualifier_constraint terms using the enum hierarchy (for example, querying for activity_or_abundance
should bring back Edges that have expression
)qualifier_value
's value can also be an ontology-term (curie) and we're expected to "expand" qualifier_constraint terms using the ontology...
qualifier_type_id
s...@colleenXu I've pushed fixes to each respective package.
biolink:
prefix missingEDIT: done retesting. From my testing POV, there were only 2 small things with qualified_predicate, which are probably related and easily fixed. Once these are addressed, this issue (Steps 1-3) may be complete...
Note: Testing setup is the same as above (aka using only DGIdb at the moment).)
It's expected to work as "AND" (both conditions must be met on one metaKG edge).
It's expected to work as "OR" (at least one condition must be met on one metaKG edge).
Useful basic queries for qualifier-constraints:
Each of these has at least 3 kinds of qualifier-combos (increased activity, decreased activity, decreased activity_or_abundance)
@colleenXu I pushed fixes for these, currently working on fixing any broken tests (most failing tests on the query_graph_handler are due to changes causing qEdge hashes to change).
Edit: there are still a few more minor fixes to be made, will update when everything's ready for another round of testing.
Tested the latest code for qualified_predicate handling. Now it works, whether the value has the biolink prefix or not.
Did some quick testing of other queries with qualified predicates -> these still work as well.
Alright, I think I can leave it to a future validation issue for not supporting non-prefixed values, if we care about that.
Don't think we care about it :P
Deployed to prod 🚀
This is a new issue to track our team's implementation of this issue on qualifiers for chemical-gene associations: https://github.com/NCATSTranslator/ReasonerAPI/issues/348
From that issue:
According to the progress tracker, the target completion date for this issue is 2022-10-25
related/broader issue on constraints: https://github.com/biothings/BioThings_Explorer_TRAPI/issues/482