Closed tokebe closed 2 months ago
Looks like this is being caused by the following process:
originalID
-> descendantID
descendantID -> has_phenotype -> originalID
originalID != descendantID
descendantID -> has_phenotype -> originalID
to originalID -> has_phenotype -> originalID
with underlying support graph(s) originalID -> has_subclass -> descendantID -> has_phenotype -> originalID
This should be possible to catch at support graph creation time -- I'm double-checking for potential knock on effects of catching it there.
Note: there also needs to be tracking of which QNodes subclassing should be applied to, otherwise subclassing created for one QNode (ie. n0
) could be applied to a more specific node bindings for a more specific QNode (ie. n1
)
Example Query
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["MONDO:0006497"]
},
"n1": {
"categories": ["biolink:DiseaseOrPhenotypicFeature"],
"ids": ["MONDO:0016215"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": ["biolink:has_phenotype"]
}
}
}
}
}
Result (subclassing changes n1
from MONDO:0016215
to MONDO:0006497
even though n1
is explicitly set to MONDO:0016215
)
"results": [
{
"node_bindings": {
"n0": [
{
"id": "MONDO:0016215",
"attributes": []
}
],
"n1": [
{
"id": "MONDO:0006497",
"attributes": []
}
]
},
"analyses": [
{
"resource_id": "infores:biothings-explorer",
"edge_bindings": {
"e0": [
{
"id": "MONDO:0016215-has_phenotype-MONDO:0006497-via_subclass",
"attributes": []
}
]
},
"score": 0.7308028298005093
}
]
}
]
},
@rjawesome @tokebe: See my recent post where I wonder if the "duplicate second path" in the Feedback issue is actually coming from Aragorn https://github.com/NCATSTranslator/Feedback/issues/907#issuecomment-2292892862
If that is the case, then the self-edge doesn't seem to be showing in the UI. Which would mean this issue of self-edges is less urgent/breaking.
There is a "true duplicate paths in the UI" situation going on, but that's happening because of how the ARS/UI handles our edges with subclassing support-graphs vs ones without https://github.com/NCATSTranslator/Feedback/issues/776#issuecomment-2292842664. This does make me wonder if Translator will ask us to undo our subclassing support-graph wrapping...
FYI Another example with a self-edge (CI, ran earlier today): https://arax.ci.transltr.io/?r=3c00776f-a6aa-473e-907f-3dec1367a566
![Screen Shot 2024-08-15 at 10 46 29 PM](https://github.com/user-attachments/assets/7e14f71d-7925-4989-aaf5-320c49238d95)
But I didn't notice any duplicate paths issues in the UI view. There were similar paths (highlighted in the screenshot below), but I think they were from Aragorn, based on my analysis of the Test-instance version here.
This is why I started to suspect that the self-edge wasn't the source of the duplicate paths in 907 either...
![Screenshot 2024-08-15 at 23-39-25 Results - NCATS Biomedical Data Translator](https://github.com/user-attachments/assets/ac5b8454-6e5b-464c-a14f-afeb038400f7)
I'm pretty sure regardless of ARS/UI-level merging errors, BTE is still creating self-edges via subclassing that we don't want? @colleenXu unless there's a reason to keep these self-edges, @rjawesome don't pause any active work on this issue.
It's fine if there's simultaneous effort outside of this issue toward problems that are related to https://github.com/NCATSTranslator/Feedback/issues/907 but aren't related to this issue -- we can separately track a BTE-level problem here and assignees to the Feedback issue can decide whether our efforts are related/causal to 907.
I tested the two linked queries and things looked good: the paths containing self-edges have been removed. I did see a different in scores/rank, often a slight decrease for the results that previously had self-edges.
But in the maturity-onset diabetes case below, the rank drastically dropped, but the score only decreased a little.
Before ([ARAX-CI viewing](https://arax.ci.transltr.io/?r=1d5ed830-1455-40d8-b0e1-31e7851ef547), used BTE-test) ![Screen Shot 2024-08-20 at 8 51 55 PM](https://github.com/user-attachments/assets/852b4f1b-e95d-41a8-a252-1af760b89049) After: [cerebral-palsy-issue-850-testing.json.zip](https://github.com/user-attachments/files/16685769/cerebral-palsy-issue-850-testing.json.zip) ![Screen Shot 2024-08-20 at 8 52 49 PM](https://github.com/user-attachments/assets/c2afbc5d-4745-4904-afff-ad788a4994e6)
Before ([ARAX-CI viewing](https://arax.ci.transltr.io/?r=3c00776f-a6aa-473e-907f-3dec1367a566)) ![Screen Shot 2024-08-20 at 8 57 11 PM](https://github.com/user-attachments/assets/1206503a-235e-405b-9f31-82e22faa5490) After: [MoDY-issue-850.json.zip](https://github.com/user-attachments/files/16685787/MoDY-issue-850.json.zip) ![Screen Shot 2024-08-20 at 8 57 48 PM](https://github.com/user-attachments/assets/d4dd96ae-21bf-4b2f-818c-12c2d4a2d9d4)
I also did a quick test of a MVP2 query and didn't notice anything off about its response (increases ADRB2 - NCBIGene:154).
This issue is also tracking a fix to "edge-case query failures due to an error in subclassing".
Info from https://github.com/biothings/biothings_explorer/issues/789#issuecomment-2302855308
Context: the error revealed a bug in node-expansion, coming from MONDO ontology having non-MONDO ID info - including mismatched ID namespaces for parent-child pairs. HP has this issue too.
The first test query used an UBERON ID, which had some child info in those files with a different ID namespace.
The solution is to remove those mismatched pairs.
Jackson has put a fix into the
subclassing-fix
branches (deployment tracked in https://github.com/biothings/biothings_explorer/issues/850)
Relevant changes deployed to Prod.
Example: https://github.com/NCATSTranslator/Feedback/issues/907
Something is wrong in node-expansion and subclass handling, likely a missing check somewhere that should be causing edges to be dropped.