Incomplete population of qualifier patterns in qualified edges leads to incorrect composed edge labels

mbrush commented 7 months ago

I am noting several cases where KPs/ARAs creating qualified edges seem to not be fully populating all qualifier fields necessary to explicitly capture the full semantics of an edge.

e.g. this qualified edge from BTE (link, see support graph for the Captoril Result) is missing qualified_predicate and object_aspect_qualifier slots:

As a result, note that the composed predicate in green in the ARAX UI here is non-sensical ("AVP -regulates-downregulated of-> ACE") - due to the absence of the qualifiers needed to build an accurate composed predicate (which should read "AVP -_causes downregulated_activityof-> ACE)

To enable this, the fully qualified edge here should be:

subject: Gene551
predicate: biolink:regulates
object: Gene1636
qualified_predicate: biolink:causes
object_direction: downregulated
object_aspect: activity

Thoughts @andrewsu @colleenXu?

Another example is this predicted edge created by ARAGORN (@cbizon @uhbrar):

. . . which is missing a causes qualified predicate - leading to the incorrect composed predicate 'affects decreased activity or abundance of' (which should be 'causes decreased activity or abundance of').

Note that this is an issue for display of composed edge labels not just in the ARAX UI as in the examples above, but also in the final Translator UI - which similarly compose these display labels by stitching together predicates and qualifiers in a templated way. e.g.

I suspect that maybe some KG creators are only capturing the qualifiers needed to executed their desired queries. But it is important to capture the full semantics explicitly in all qualified edges, because downstream tooling uses these qualifier values to compose full edge labels in user interfaces.

colleenXu commented 7 months ago

@mbrush (CC @sierra-moxon)

Your proposed adjustment to "regulates" edge-qualifiers makes sense.

However, the current assignments (no qualified-predicate or aspect) were based on the biolink-3 predicate-mapping and migration guide.

So are the recommendations for "regulates" modeling now different? And your adjustment should apply to everyone using "regulates"?

Also, in both examples you show, you say that the qualified predicate is missing. I'd like to note that some of the predicate-mappings don't include a qualified predicate (ex: anywhere where the mapped predicate starts with "affects..."). So are there some cases where it's fine not to have a qualified predicate?

cbizon commented 7 months ago

We can add those qualified predicates to the various automat graphs. We mostly haven't worried about it because they seem largely implied. There doesn't seem to me to be any real difference between the edge above with or without the qualified predicate. That said, the UI is looking for it, and we can add it, so we will do so. My guess is in Eel.

cbizon commented 7 months ago

So it turns out that the automats should all have qualified_predicate where appropriate. I should have looked more carefully, I see that Matt is saying that this is the inferred edge e.g. the primary_source is aragorn. That's a lot easier to fix; we should be able to handle it in Octopus.

mbrush commented 7 months ago

@colleenxu re: the migration guide for mapping - thanks for pointing out these inconsistencies. I think there are some updates that should be made to this mapping file, given how ARAX and Translator UI's ended up using qualifier values to compose edge labels in their Path displays.

For example, in the mapping file you link to, there is no object_aspect_qualifier or qualified_predicate recommendation in the mapping for deprecated predicates like entity negatively regulates entity:

  - mapped predicate: entity negatively regulates entity
    predicate: regulates    
    object direction qualifier: downregulated

I think we need to specify an explicit object aspect and a qualified predicate here, such that this mapping becomes:

  - mapped predicate: entity negatively regulates entity
    predicate: regulates
    qualified_predicate: causes                                  # add this line to the mapping
    object_aspect_qualifier: activity or abundance               # add this line to the mapping
    object direction qualifier: downregulated

In being explicit about the affected aspect here, the semantics are clear for all users, and we support the UI in composing sensible labels (e.g. X causes_downregulated_activity_or_abundance_of Y rather than X regulates_downregulated_of Y) -

I will be sure to address this with the modeling / predicates team - we will have to go through the mapping doc and update those that do not support the UI's 'edge label composition' use case. And we will advertise the need for KPs/ARAs to ensure they are compliant with these updated mapping recommendations. Tagging @sierra-moxon to help me get this on the agenda for these calls as appropriate.

@colleenxu re: mappings without a qualified predicate - there are indeed some cases where it's fine not to have a qualified predicate. This includes times where there is no directionality to a relationship between a chemical and a gene (e.g. affects abundance of) is an example - because a reading of the fully qualified statement is clear/correct using the primary predicate. It is only when there is direction to the affect that we need to add 'causes' as a qualified predicate - because the edge is stating that 'X causes decreased abundance of Y', not 'X affects decreased abundance of Y'

That is the real test of whether a qualified predicate is needed - if the fully qualified statement is clear and accurate using the primary predicate, no qualified predicate is needed.

@cbizon - Thanks for planning to update ARAGORN's qualified edges as you describe above. As you note, it is important to be explicit here in particular because our user interfaces use these qualifiers to compose qualified edge labels in their displays. I suspect that Automat as a KP will also have some edges to update to ensure explicit reporting of aspect and qualified predicate information, per the updated mappings described above.

colleenXu commented 7 months ago

@mbrush

All of the above sounds good.

Would you like our team to wait until the mapping recommendations are updated (VS making changes ASAP)?

mbrush commented 7 months ago

@colleenXu let me run this by predicates folks first - shouldn't take long to confirm this is the right path forward, and update the mappings. Thanks!

cbizon commented 7 months ago

2 questions @mbrush:

In the TRAPI does the qualified_predicate go in the "qualifiers" block of the edge? I suspect so but want to double check.
Should the TRAPI query include the qualified_predicate? ARAGORN is basically creating the inferred edge by copying the qgraph inferred edge...

mbrush commented 6 months ago

@cbizon

In the TRAPI does the qualified_predicate go in the "qualifiers" block of the edge? I suspect so but want to double check.

Yes

Should the TRAPI query include the qualified_predicate? ARAGORN is basically creating the inferred edge by copying the qgraph inferred edge...

I think that in practice many q_graphs take shortcuts by omitting the qualified predicate when it is not strictly necessary to include in order get get desired results. e.g. the 'causes' qualified predicate is usually left out as it is always the qualified predicate on chemical - affects - gene edges - so including it in the query offers no discriminatory power.

We should decide as a consortium if we want to allow for this practice, If we do, then ARAGORN would need another want to ensure the appropriate qualified predicate is included in inferred edges.

Also, re:

There doesn't seem to me to be any real difference between the edge above with or without the qualified predicate.

The difference between Gene X AFFECTS upregulation of Gene Y and Gene X CAUSES upregulation of Gene Y is subtle but important IMO. The former implies that upregulation of Gene Y may be happening already, and Gene X is affecting this process in some way. The latter is saying that Gene X is causing this upregulation to happen in the first place, which is consistent with what we want to report.

cbizon commented 6 months ago

The difference between Gene X AFFECTS upregulation of Gene Y and Gene X CAUSES upregulation of Gene Y is subtle but important IMO. The former implies that upregulation of Gene Y may be happening already, and Gene X is affecting this process in some way. The latter is saying that Gene X is causing this upregulation to happen in the first place, which is consistent with what we want to report.

FWIW, I think that we can almost never distinguish between these cases from the data that we ingest. I don't mind adding the causes at all, but if having it vs not having it is implying this difference, I think that's an overinterpretation. But this is a minor quibble.

The real question is whether the qualified predicate is part of the qedge or not. If it goes in, it will come back out (of aragorn). What's the right place for this conversation? Architecture? TRAPI?

sierra-moxon commented 3 months ago

@edeutsch @cbizon - I have a vague memory of talking about whether biolink:qualified_predicate should be sent via the QEdge in a TRAPI call. Did we ever decide on this convention?

edeutsch commented 3 months ago

Hi everyone, I do not recall if there was a discussion or decision. I think this is largely a data modeling decision. I'm not really certain I fully understand the whole issue, but I'm thinking that some things to not need to be specified in the query and if something is left out of the query, then it amounts to an "any", i.e. no constraint. But the returned KG results DO need to return the information properly. So if no qualified predicate is specified in the query, that's fine, and any edge that matches the rest of the provided constraints must be returned and qualified predicate can be anything in the KG. But the response does need to be specific. Does that answer the question?

sierra-moxon commented 3 months ago

from TAQA:

this is still an issue; Aragorn will not return a qualified predicate if it is not asked for via the query edge, but BTE, when asked for edges with the qualified predicate specified, fails to return answers (maybe some KPs are not assigning the qualified predicate when needed?).
we need to address this with priority for the next release - assigning to Guppy.

colleenXu commented 3 months ago

(CC @andrewsu @tokebe)

If we want BTE to respond to MVP2 queries that set the qualified-predicate, that's an easy edit/fast-deploy for us. I think it could be added to Fugu (add to CI this week?).

But I wonder if there's a second layer here? Are there discussions wanting BTE to adjust its direct/inferred edges in responses? What I recall / can think of...

for the "inferred" edges, add the qualifier-set from the QEdge qualifier-constraints. Right now, it just has the "affects" predicate ➡️ see below, made an issue to do this
Maybe some results have direct edges that don't fully match the qualifier-constraint set asked for, which is confusing?

colleenXu commented 3 months ago

Update! BTE CI will now respond to MVP2 queries that set qualified-predicate (biolink:causes) in the qualifier-constraints (and it'll also respond when the qualified-predicate isn't set). This change should be live in CI as of 11am Pacific today (major commit, refreshing CI instance). Jackson @tokebe plans to add this to BTE's Fugu release.

Jackson and I would still like clarification on the "second layer" points above. Have these been identified in BTE responses as problems, maybe in other issues? @sierra-moxon @sstemann

(CC others who have discussed this issue @gprice1129 @andrewsu)

colleenXu commented 3 months ago

@mbrush

We're waiting for your go-ahead before changing our "regulates" edges, which was the initial ask for BTE in this issue.

Any updates?

sierra-moxon commented 2 months ago

the predicate mapping file has not been updated to reflect the necessary changes for the regulates predicates indicated in Matt's comment. Moving to hammerhead.

tokebe commented 2 months ago

Update: BTE's fix for including qualified-predicate in the response inferred edges has been deployed to Prod

mbrush commented 2 months ago

@sierra-moxon if you agree with the updates to the predicate-mapping file I proposed in the comment here - can we go ahead and make these so that it is complete? This was to be a topic for a predicates call, but we haven't been holding those, and I think if you agree with my proposal, we can go ahead and make the changes in a PR, have folks review, and resolve this issue.

colleenXu commented 1 week ago

@mbrush @sierra-moxon

BTE/Service Provider's qualifiers for "regulates" edges are updated (will go live tomorrow).

I think this wraps up one of the final loose ends from this issue.

NCATSTranslator / Feedback

Incomplete population of qualifier patterns in qualified edges leads to incorrect composed edge labels #744