NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

One Hop Inferred with the same predicate as the lookup path #775

Closed sstemann closed 3 months ago

sstemann commented 4 months ago

I can't remember, is this allowed? It's tagged as Curated, so I have trouble understanding how a one-hop path with curated publications is inferred.

https://ui.test.transltr.io/main/results?l=Maturity-onset%20Diabetes%20Of%20The%20Young&i=MONDO:0018911&t=0&r=0&q=45260776-7bff-41c0-a8cb-132ba7ba5974

sstemann commented 4 months ago

feels wrapped up with #751 and #706 and #742

sierra-moxon commented 3 months ago

we definitely have one-hop path inference with different predicates by design (e.g. an "in clinical trials for" lookup edge leading to a "treats" inference edge).

we also have this kind of one-hop inference. the evidence here is just "SPOKE" (@suihuang-ISB).

Screen Shot 2024-05-31 at 5 52 54 PM

so I changed the title to reflect that the inference in this ticket uses the same predicate as the lookup support graph

sierra-moxon commented 3 months ago

from relay:

Genomewide commented 3 months ago

The CURIEs for Glimepiride:

PUBCHEM.COMPOUND:3476 CHEMBL.COMPOUND:CHEMBL1481 CHEBI:5383

sierra-moxon commented 3 months ago

from relay: @colleenXu thinks we might not be rendering the subclass edge from BTE correctly for the MODY1 -> MODY Glimepiride result

sierra-moxon commented 3 months ago

from relay: some concern about two inferred paths for treats not being merged (because one has KL predicted and the other has KL unknown in the infores catalog). Now, the UI should use edge metadata (with the infores catalog as the fallback).

The infores catalog should get an update to not-unknown for the predicted treats edge from BTE. The UI should use the latest infores catalog. The UI should use edge properties as a first step (if not provided, sorry :D), then use the infores catalog.

gaurav commented 3 months ago

It looks like the three Glimepiride CURIEs are currently in two cliques (even with drug_chemical_conflate turned on), and both NodeNorm Prod and NodeNorm Test agree that PUBCHEM.COMPOUND:3476/CHEBI:92609 is a different clique from CHEMBL.COMPOUND:CHEMBL1481/CHEBI:5383. I think the only thing that's changed is the preferred ID of those two cliques.

On NodeNorm Prod: https://nodenorm.transltr.io/1.4/get_normalized_nodes?curie=PUBCHEM.COMPOUND%3A3476&curie=CHEMBL.COMPOUND%3ACHEMBL1481&curie=CHEBI%3A5383&conflate=true&drug_chemical_conflate=true&description=false

On NodeNorm Test: https://nodenorm.test.transltr.io/1.4/get_normalized_nodes?curie=PUBCHEM.COMPOUND%3A3476&curie=CHEMBL.COMPOUND%3ACHEMBL1481&curie=CHEBI%3A5383&conflate=true&drug_chemical_conflate=true&description=false
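
A minimal sketch of how the clique comparison above could be scripted. It rebuilds the same `get_normalized_nodes` query string used in the two links and groups the input CURIEs by the preferred identifier that comes back. The helper names `build_url` and `group_by_clique` are made up for illustration, and the response shape (each input CURIE mapped to an object with an `id.identifier` field, or to null when it cannot be normalized) is an assumption about the NodeNorm JSON rather than something shown in this thread.

```python
from collections import defaultdict
from urllib.parse import urlencode

NODENORM_PROD = "https://nodenorm.transltr.io/1.4/get_normalized_nodes"
NODENORM_TEST = "https://nodenorm.test.transltr.io/1.4/get_normalized_nodes"

GLIMEPIRIDE_CURIES = [
    "PUBCHEM.COMPOUND:3476",
    "CHEMBL.COMPOUND:CHEMBL1481",
    "CHEBI:5383",
]


def build_url(base, curies):
    """Build the query URL with both conflation flags on (hypothetical helper)."""
    params = [("curie", curie) for curie in curies]
    params += [
        ("conflate", "true"),
        ("drug_chemical_conflate", "true"),
        ("description", "false"),
    ]
    return base + "?" + urlencode(params)


def group_by_clique(response):
    """Map each preferred identifier to the input CURIEs that normalize to it.

    Assumes `response` is the parsed JSON body: input CURIE -> entry, where an
    entry is None if the CURIE could not be normalized.
    """
    cliques = defaultdict(list)
    for curie, entry in response.items():
        preferred = entry["id"]["identifier"] if entry else None
        cliques[preferred].append(curie)
    return dict(cliques)
```

Fetching `build_url(NODENORM_PROD, GLIMEPIRIDE_CURIES)` and `build_url(NODENORM_TEST, GLIMEPIRIDE_CURIES)` with any HTTP client and passing each parsed body to `group_by_clique` would show whether the two environments split the CURIEs the same way and differ only in the preferred ID.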

colleenXu commented 3 months ago

I think this is the source of the "lookup" edge from BTE - and it's not actually a lookup. This is why I think something happened upstream to cut off the full path.

ARAX UI link to BTE's response, this is in a support-graph for result 365: https://arax.ncats.io/?r=3bbc8cab-7da5-450b-851c-5032b7420a54

Screen Shot 2024-06-04 at 4 47 37 PM

gprice1129 commented 3 months ago

Locally, we implemented picking up knowledge levels directly from edges, with an infores-catalog fallback, and it fixes the issue of the BTE inferred edge and support paths not being combined with the other inferred edges.
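
A rough sketch of the lookup order described here, not the UI's actual code. It assumes a TRAPI-style edge dict whose knowledge level is carried in an attribute with `attribute_type_id` equal to `biolink:knowledge_level`, and whose `sources` list marks one entry as `primary_knowledge_source`. The `infores_catalog` argument is a hypothetical plain dict from infores ID to knowledge level, standing in for however the catalog is really loaded, and the `"unknown"` default is likewise an assumption.

```python
KL_ATTRIBUTE = "biolink:knowledge_level"


def knowledge_level(edge, infores_catalog, default="unknown"):
    """Return the edge's knowledge level, preferring edge metadata.

    1. Use the knowledge-level attribute on the edge if it is present.
    2. Otherwise fall back to the catalog entry for the primary source.
    3. Otherwise return `default`.
    """
    for attribute in edge.get("attributes") or []:
        if attribute.get("attribute_type_id") == KL_ATTRIBUTE and attribute.get("value"):
            return attribute["value"]

    for source in edge.get("sources") or []:
        if source.get("resource_role") == "primary_knowledge_source":
            return infores_catalog.get(source.get("resource_id"), default)

    return default
```

With this ordering, two edges that both carry the same knowledge level in their own attributes resolve to the same value regardless of what the catalog says about their sources, which is the property needed for merging them.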

However, even after doing that, the one-hop curated edge still shows up in the results. After some investigation, we think we've found the cause.

As @colleenXu correctly pointed out, the subgraph that includes the curated edge should end at the superclass of maturity-onset diabetes of the young 1. The issue arises from the specific details of how the UI determines what qualifies as an endpoint in a graph.

In the case of ARAGORN, the node bindings of the result that includes this subgraph in one of its analyses look like this:

          "node_bindings": {
            "sn": [
              {
                "id": "CHEMBL.COMPOUND:CHEMBL1481",
                "query_id": null,
                "attributes": []
              }
            ],
            "on": [
              {
                "id": "MONDO:0018911",
                "query_id": null,
                "attributes": []
              },
              {
                "id": "MONDO:0010894",
                "query_id": "MONDO:0018911",
                "attributes": []
              },
              {
                "id": "MONDO:0007452",
                "query_id": "MONDO:0018911",
                "attributes": []
              }
            ]
          },

The issue occurs because the subclass MONDO:0007452 appears as a node binding under one of the endpoint labels (sn and on). Without going into the details, the UI treats every node listed under either label as a possible endpoint when generating the paths.

It seems like there are two obvious solutions we could implement for this:

  1. The ARAGORN result node bindings do include the query_id attribute, which the UI could use to determine which nodes are actual endpoints.
  2. ARAs could stop including the node bindings from support graphs under these labels.
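
A minimal sketch of how option 1 might look, using the node bindings quoted above. It is one possible reading, not the UI's implementation: a binding is treated as an actual endpoint only when its `query_id` is null or equal to its own `id`, and bindings whose `query_id` points at a different CURIE are treated as stand-ins (for example, subclasses) and skipped. The function name `true_endpoints` is made up for illustration.

```python
# Node bindings copied from the ARAGORN result quoted above.
NODE_BINDINGS = {
    "sn": [
        {"id": "CHEMBL.COMPOUND:CHEMBL1481", "query_id": None, "attributes": []},
    ],
    "on": [
        {"id": "MONDO:0018911", "query_id": None, "attributes": []},
        {"id": "MONDO:0010894", "query_id": "MONDO:0018911", "attributes": []},
        {"id": "MONDO:0007452", "query_id": "MONDO:0018911", "attributes": []},
    ],
}


def true_endpoints(node_bindings, label):
    """Return the IDs under `label` that are bound as the query node itself.

    Bindings whose query_id names a different CURIE are excluded, on the
    assumption that they were bound in place of the queried node.
    """
    endpoints = []
    for binding in node_bindings.get(label, []):
        query_id = binding.get("query_id")
        if query_id is None or query_id == binding["id"]:
            endpoints.append(binding["id"])
    return endpoints


print(true_endpoints(NODE_BINDINGS, "sn"))  # ['CHEMBL.COMPOUND:CHEMBL1481']
print(true_endpoints(NODE_BINDINGS, "on"))  # ['MONDO:0018911']
```

Under this rule, a path that stops at MONDO:0007452 no longer counts as reaching the object endpoint, so the curated one-hop edge would only appear as part of a longer path that continues to MONDO:0018911.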

@cbizon I think we should have a quick discussion about this to figure out what we should do here.

gprice1129 commented 3 months ago

@cbizon and I discussed. For the short term, at least, the UI will implement option 1 above.

sstemann commented 3 months ago

will this be included in Octopus to prod next week?

gprice1129 commented 3 months ago

@sstemann yes. We implemented it and have tested it locally. Going to push to CI today.

gprice1129 commented 3 months ago

Tested and confirmed in TEST. Closing.