Open sstemann opened 2 years ago
@MarkDWilliams I made a workflowitized version of this query which overlays ngd values and fisher exact test scores. It then scores the results based on those values and ranks them.
Workflow Query: subclassWorkflow.json ARAX Results
Expected result
Subject | Predicate | Object |
---|---|---|
MONDO:0005737 | biolink:subclass_of | MONDO:0018087 |
MONDO:0005737 | biolink:subclass_of | MONDO:0005108 |
MONDO:0005737 | biolink:subclass_of | MONDO:0005762 |
@cbizon we are doing a little testing to reranking messages from other ARAs and are hitting some snags. We tried reranking the Aragorn results above, but ARAX had trouble with it. The main reason seems be bindings that don't go anywhere. In the Aragorn message: https://arax.ncats.io/api/arax/v1.2/response/99adc6b5-0803-4c52-972e-fe587744a7aa
we see:
"query_graph": {
"edges": {
"e00": {
"constraints": [],
"object": "n01",
"predicates": [
"biolink:subclass_of"
],
"subject": "n00"
}
},
"nodes": {
"n00": {
"categories": [
"biolink:Disease"
],
"constraints": [],
"ids": [
"MONDO:0005737"
],
"is_set": false
},
"n01": {
"categories": [
"biolink:Disease"
],
"constraints": [],
"ids": null,
"is_set": false
}
}
},
"results": [
...
{
"edge_bindings": {
"e00": [
{
"attributes": [
{
"attribute_source": null,
"attribute_type_id": "biolink:has_numeric_value",
"attributes": null,
"description": null,
"original_attribute_name": "weight",
"value": 0.2679491924311228,
"value_type_id": "EDAM:data_1669",
"value_url": null
}
],
"id": "16888608"
}
],
"s14": [
{
"attributes": [
{
"attribute_source": null,
"attribute_type_id": "biolink:has_numeric_value",
"attributes": null,
"description": null,
"original_attribute_name": "weight",
"value": 0.06359757610223138,
"value_type_id": "EDAM:data_1669",
"value_url": null
}
],
"id": "1dfbc4c2-3884-4455-9b55-6e7433e7059b"
}
]
},
So it seems that a result has an edge binding from a KG edge to s14, which does not exist in the QG? So binds to nothing. Did we ever decide whether that is legal TRAPI? ARAX doesn't do that and got very unhappy when it saw it here. We're working to make ARAX more robust in the face of such things. This will be a wider issue where we try to operate on each others' TRAPI and have different assumptions.
We're fixing ARAX to be tolerant of this, but this might cause problems for others as well.
Perhaps this is an issue that we should face head on: is the above semantically valid TRAPI? or should a semantic validator of TRAPI (which we don't have today but should have) flag that as an error?
Current TRAPI documentation would seem to imply that all edge_binding keys must resolve in the QueryGraph:
edge_bindings:
type: object
description: >-
The dictionary of Input Query Graph to Result Knowledge Graph edge
bindings where the dictionary keys are the key identifiers of the
Query Graph edges and the associated values of those keys are
instances of EdgeBinding schema type (see below). This value is an
array of EdgeBindings since a given query edge may resolve to
multiple knowledge graph edges in the result.
Here's a different question related to this test. Maybe we should all be on the same page on this.
Are all entities always a subclass of themselves?
I thought I heard it stated once that this was on a stone tablet? And Aragorn's results state it. But it is not in Mark's list.
Yeah, pretty sure that according to the data modeling group, A is a subclass of A.
In terms of Mark's list, I don't think it's meant to be exhaustive, but a list of what a particular educated user might expect to be highly ranked answers.
But isn't the list what we intend to test against with your testing framework? If so, the scoring on the test should be quite different on whether the self subclass is there. With the current list, Aragorn should be penalized substantially because its number one answer (Ebola itself) is not in the list. If we were to add Ebola itself to the list, then ARAX would be penalized because the number one answer (Ebola itself) is nowhere in our results.
Yes the goal is to capture some good answers to build benchmarks. But (IMO) each benchmark represents one view of what makes a good answer, and a different benchmark might emphasize different qualities.
This set of benchmarks is driven by what a user might expect for simple one hops. I would argue (based on this particular result and the discussion on the call) that no user particularly wants to see the self-subclass result, even if it's useful at the modeling level in maintaining logical consistency.
In other words, I think it's a useful datapoint for ARAGORN to know that it's returning an uninteresting answer (defined in one way) to a user.
Sounds good to me. I would agree that is the best way forward.
So it seems like the proposed formal policy is: From a modeling perspective every concept is also a subclass of itself, but when users of Translator request items with a subclass relationship, they should not be shown the self class as a result.
Should we document that?
Query: subclass.json PK: 24506c4-b3e4-464e-89e9-eb4e456b9ef9 Control: Query looking for what diseases Ebola is a subclass of. Controls should be things like viral infection, filoviridae, or viral hemorrhagic fever