RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

Issue with canonical, non-canonical, and asymmetric predicates #1689

Closed dkoslicki closed 4 months ago

dkoslicki commented 3 years ago

From Chris Bizon:

Hi all - I am trying to merge a couple of queries for workflow A. (this and this ). The problem is that the canonical versions of the predicates point in different directions. So to get them to point the same way (for combining into a single query) I have to de-canonicalize one predicate, causing ARAX to throw a 400. Any chance of ARAX loosening that requirement?

Here's the query:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "object": "n0",
                    "subject": "n1",
                    "predicates": [
                        "biolink:entity_negatively_regulates_entity",
                        "biolink:decreases_abundance_of",
                        "biolink:decreases_expression_of",
                        "biolink:decreases_stability_of",
                        "biolink:decreases_uptake_of",
                        "biolink:increases_degradation_of",
                        "biolink:decreases_synthesis_of",
                        "biolink:decreases_activity_of",
                        "biolink:response_affected_by"
                    ]
                }
            },
            "nodes": {
                "n0": {
                    "ids": [
                        "NCBIGene:23221"
                    ],
                    "categories": [
                        "biolink:Gene"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:SmallMolecule"
                    ]
                }
            }
        }
    }
}

Which returns the error message: Qedge e01 has asymmetric predicates in both canonical and non-canonical forms, which isn't allowed. Non-canonical asymmetric predicates are: {'biolink:response_affected_by'}

I can't find this error message in the codebase, so don't know if it's an ARAX, RTX-KG2, or plover thing.

Is this desired behavior, or is there anything we want to address this? The issue appears to be predicates on the edge which are descendants of affects as well as affected_by.

amykglen commented 3 years ago

this might be a good issue for discussion - it's surprisingly tricky to address (I think Expand would have to split the qedge into two qedges behind the scenes - one using all but biolink:response_affected_by and the other using only biolink:affects_response_to, but with flipped subject/object - run queries for those two qedges separately, and then merge their answers once done)

the error is coming from ARAX_expander.py, by the way: https://github.com/RTXteam/RTX/blob/c3414e766c04669025da55170c241c732bf52f16/code/ARAX/ARAXQuery/ARAX_expander.py#L197-L199

and for those who aren't aware: technically we're not required to support usage of non-canonical predicates in QGs, though it definitely may be in our interest to do so. currently we do support usage of non-canonical predicates, except for in twisted situations such as this one where Expand can't work out how to handle it without splitting the qedge in two.

cbizon commented 3 years ago

Any further thoughts on this? It would be nice to be able to merge two of the workflow A queries.

saramsey commented 3 years ago

Hi @cbizon per my reading of Point 19 of the Translator Architecture Principles,

When querying and returning results with predicates, KPs and ARAs must be queried using the 'canonical' predicate (as opposed to its inverse), and must return the 'canonical' predicate. There will be two ways to identify the 'canonical' predicate in the biolink-model: canonical translator predicates will not be tagged with the 'inverse:' attribute, and canonical predicates will be tagged with an "annotations" flag with the tag: "biolink:canonical_predicate" and value: "True". This principle also applies to KGX files and TRAPI messages.

the above query is not valid because it contains predicate biolink:response_affected_by, which is not a canonical-orientation predicate as shown in the Biolink metamodel where we see inverse on line 2901: https://github.com/biolink/biolink-model/blob/ba308e15af161560e75cac8c2962eae37d507a8e/biolink-model.yaml#L2895-L2901

Forgive me for asking, but is it not possible to modify the workflow to post the query using one (or more) valid TRAPI queries in which only canonical-orientation predicates are specified?

[h/t @edeutsch for pointing me to the correct Architecture document]

saramsey commented 3 years ago

If we want for the above query (as written) to be required to be accepted by an ARA, shouldn't we amend the Translator Architecture Principles so that the query is no longer invalid per the spec?

Now, as to the question of how Expand could in theory accommodate such a query, Amy's suggestion sounds reasonable to me.

cbizon commented 3 years ago

Yes, you are absolutely correct. In discussions on the architecture call, there seems to have been a general agreement that the canonical predicates have made this situation worse rather than better, because it ends up requiring multiple trapi queries somewhat pointlessly. However, we deferred actually modifying the specs until a bit later, when things were more settled.

We do actually have this as 2 separate queries now in the workflow, and that will be fine, and I can understand wanting to stick with that for the time being.

That said... I think a number of other ARAs have decided to allow this invalid query to run in anticipation of the expected (but not guaranteed) change, and my question is whether ARAX is interested in going down that road. "no" is an entirely reasonable answer.

saramsey commented 3 years ago

Hi @cbizon, thanks for your reply. In the meantime (before the spec is changed), I wonder if it is possible for the Architecture Committee to consider another mitigation approach: If we are finding that one of the canonical-orientation Biolink predicates (e.g., biolink:affects_response_to, has an orientation that is "out of step" with the other predicates that are commonly used to subject/object connect nodes of similar categories, why not ask the Biolink folks to re-define the canonical orientation for that predicate? In this case, I checked CTD, and it has 7,441 "affects response to" associations between human genes and chemicals, as shown here:

Screen Shot 2021-10-14 at 3 43 06 PM

But most of the time, when we are expressing query edges between a drug and a gene it is with the drug as the subject and gene as the object (right?), and thus, couldn't one argue that it is logical to have the canonical orientation of the predicate be the orientation in which the subject is most likely to be a chemical substance and the object is most likely to be a gene, i.e., biolink:response_affected_by.

Tagging @edeutsch to get his take.

cbizon commented 3 years ago

Interesting point, what do you think about this @sierra-moxon ?

edeutsch commented 3 years ago

I agree that if we're going to enforce canonical directions, it would be very useful to harmonize them so that related predicates all go in the same direction. It might not be possible to make this be 100% harmonized, but maybe solve this problem and 95% of others like it?

sierra-moxon commented 3 years ago

After a short off-line conversation with @cbizon, I think the fix for this use case would be to change the canonical tag of this predicate pair, to the 'response affected by' predicate.

@edeutsch - do you have other use cases where the canonical predicate seems incorrect according to use cases?

edeutsch commented 3 years ago

I am not aware of any others, but I am not very deep in this. I'm thinking that @saramsey , @amykglen , or Patrick would be more likely to know of other issues than I would.

sierra-moxon commented 3 years ago

@saramsey @amykglen - what about the similar predicates: increases response to, decreases response to ? It does feel a little weird that the parent predicate 'affects' is canonical while its inverse 'affected by' is not canonical, but for this small family of predicates that are children of 'affects response to' and grandchildren of 'affects' we swap the canonicity?

I think another (perhaps tangential modeling issue, not specifically related to this issue), is that the model currently describes the domain/range of 'response affected by' as 'chemical entity.' As we've moved gene out of the 'chemical entity' hierarchy, this predicate should technically not bring back any additional results since we specify 'gene' as one side of this triple.

sierra-moxon commented 3 years ago

release 2.2.6 of the model has the canonical tag swapped on this predicate pair.

edeutsch commented 4 months ago

I think this was addressed years ago. closing.