biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

BTE creating self-edges via subclassing #850

Closed tokebe closed 2 months ago

tokebe commented 2 months ago

Example: https://github.com/NCATSTranslator/Feedback/issues/907

Something is wrong in node-expansion and subclass handling, likely a missing check somewhere that should be causing edges to be dropped.

tokebe commented 2 months ago

Looks like this is being caused by the following process:

This should be possible to catch at support graph creation time -- I'm double-checking for potential knock on effects of catching it there.

rjawesome commented 2 months ago

Note: there also needs to be tracking of which QNodes subclassing should be applied to, otherwise subclassing created for one QNode (ie. n0) could be applied to a more specific node bindings for a more specific QNode (ie. n1) Example Query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MONDO:0006497"]
                },
                "n1": {
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"],
                    "ids": ["MONDO:0016215"]
               }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:has_phenotype"]
                }
            }
        }
    }
}

Result (subclassing changes n1 from MONDO:0016215 to MONDO:0006497 even though n1 is explicitly set to MONDO:0016215)

"results": [
      {
        "node_bindings": {
          "n0": [
            {
              "id": "MONDO:0016215",
              "attributes": []
            }
          ],
          "n1": [
            {
              "id": "MONDO:0006497",
              "attributes": []
            }
          ]
        },
        "analyses": [
          {
            "resource_id": "infores:biothings-explorer",
            "edge_bindings": {
              "e0": [
                {
                  "id": "MONDO:0016215-has_phenotype-MONDO:0006497-via_subclass",
                  "attributes": []
                }
              ]
            },
            "score": 0.7308028298005093
          }
        ]
      }
    ]
  },
colleenXu commented 2 months ago

HOLD ON

@rjawesome @tokebe: See my recent post where I wonder if the "duplicate second path" in the Feedback issue is actually coming from Aragorn https://github.com/NCATSTranslator/Feedback/issues/907#issuecomment-2292892862

If that is the case, then the self-edge doesn't seem to be showing in the UI. Which would mean this issue of self-edges is less urgent/breaking.


There is a "true duplicate paths in the UI" situation going on, but that's happening because of how the ARS/UI handles our edges with subclassing support-graphs vs ones without https://github.com/NCATSTranslator/Feedback/issues/776#issuecomment-2292842664. This does make me wonder if Translator will ask us to undo our subclassing support-graph wrapping...


FYI Another example with a self-edge (CI, ran earlier today): https://arax.ci.transltr.io/?r=3c00776f-a6aa-473e-907f-3dec1367a566

Screenshot from ARAX-CI view

![Screen Shot 2024-08-15 at 10 46 29 PM](https://github.com/user-attachments/assets/7e14f71d-7925-4989-aaf5-320c49238d95)

But I didn't notice any duplicate paths issues in the UI view. There were similar paths (highlighted in the screenshot below), but I think they were from Aragorn, based on my analysis of the Test-instance version here.

This is why I started to suspect that the self-edge wasn't the source of the duplicate paths in 907 either...

Screenshot of UI-CI view

![Screenshot 2024-08-15 at 23-39-25 Results - NCATS Biomedical Data Translator](https://github.com/user-attachments/assets/ac5b8454-6e5b-464c-a14f-afeb038400f7)

tokebe commented 2 months ago

I'm pretty sure regardless of ARS/UI-level merging errors, BTE is still creating self-edges via subclassing that we don't want? @colleenXu unless there's a reason to keep these self-edges, @rjawesome don't pause any active work on this issue.

It's fine if there's simultaneous effort outside of this issue toward problems that are related to https://github.com/NCATSTranslator/Feedback/issues/907 but aren't related to this issue -- we can separately track a BTE-level problem here and assignees to the Feedback issue can decide whether our efforts are related/causal to 907.

colleenXu commented 2 months ago

I tested the two linked queries and things looked good: the paths containing self-edges have been removed. I did see a different in scores/rank, often a slight decrease for the results that previously had self-edges.

But in the maturity-onset diabetes case below, the rank drastically dropped, but the score only decreased a little.

cerebral palsy (MONDO:0006497)

Before ([ARAX-CI viewing](https://arax.ci.transltr.io/?r=1d5ed830-1455-40d8-b0e1-31e7851ef547), used BTE-test) ![Screen Shot 2024-08-20 at 8 51 55 PM](https://github.com/user-attachments/assets/852b4f1b-e95d-41a8-a252-1af760b89049) After: [cerebral-palsy-issue-850-testing.json.zip](https://github.com/user-attachments/files/16685769/cerebral-palsy-issue-850-testing.json.zip) ![Screen Shot 2024-08-20 at 8 52 49 PM](https://github.com/user-attachments/assets/c2afbc5d-4745-4904-afff-ad788a4994e6)

maturity-onset diabetes of the young (MONDO:0018911)

Before ([ARAX-CI viewing](https://arax.ci.transltr.io/?r=3c00776f-a6aa-473e-907f-3dec1367a566)) ![Screen Shot 2024-08-20 at 8 57 11 PM](https://github.com/user-attachments/assets/1206503a-235e-405b-9f31-82e22faa5490) After: [MoDY-issue-850.json.zip](https://github.com/user-attachments/files/16685787/MoDY-issue-850.json.zip) ![Screen Shot 2024-08-20 at 8 57 48 PM](https://github.com/user-attachments/assets/d4dd96ae-21bf-4b2f-818c-12c2d4a2d9d4)


I also did a quick test of a MVP2 query and didn't notice anything off about its response (increases ADRB2 - NCBIGene:154).

colleenXu commented 2 months ago

This issue is also tracking a fix to "edge-case query failures due to an error in subclassing".

Info from https://github.com/biothings/biothings_explorer/issues/789#issuecomment-2302855308

Context: the error revealed a bug in node-expansion, coming from MONDO ontology having non-MONDO ID info - including mismatched ID namespaces for parent-child pairs. HP has this issue too.

The first test query used an UBERON ID, which had some child info in those files with a different ID namespace.

The solution is to remove those mismatched pairs.

Jackson has put a fix into the subclassing-fix branches (deployment tracked in https://github.com/biothings/biothings_explorer/issues/850)

tokebe commented 2 months ago

Relevant changes deployed to Prod.