NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

subgraphs of subgraphs look like duplicate paths - why do we represent the same paths multiple ways? #907

Open sstemann opened 1 month ago

sstemann commented 1 month ago

You can try to use the screenshots below, but it really is something to experience for yourself:

https://ui.test.transltr.io/main/results?l=Cerebral%20Palsy&i=MONDO:0006497&t=0&r=0&q=1d5ed830-1455-40d8-b0e1-31e7851ef547

Expand Adrenal Cortex Hormones. [screenshot]

Notice the first inferred path: two edges, two different predicates, one ARA. [screenshot] [screenshot]

Notice that the support path of the second inferred path LOOKS like a duplicate of the second-from-the-bottom support path of the first inferred path. [screenshot]

The evidence on the edges looks the same.

Do we need all of these paths?

gprice1129 commented 4 weeks ago

I can confirm that BTE places the inferred edge @sstemann is talking about both at the analysis level and as a supporting edge for the other inferred edge. It definitely seems redundant. @andrewsu @colleenXu
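For anyone who wants to check a response for this pattern themselves, here is a minimal sketch, not BTE's or the UI's actual logic. It assumes a TRAPI-style message where support graphs are listed on an edge under a `biolink:support_graphs` attribute and resolved through `auxiliary_graphs`. The function names are hypothetical, and it only looks one level deep.

```python
from typing import Any

SUPPORT_GRAPHS = "biolink:support_graphs"  # assumed attribute type id


def support_graph_ids(edge: dict[str, Any]) -> list[str]:
    """Return the auxiliary graph IDs listed on an edge's support-graph attribute."""
    ids: list[str] = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == SUPPORT_GRAPHS:
            ids.extend(attr.get("value") or [])
    return ids


def doubly_placed_edges(message: dict[str, Any]) -> dict[str, set[str]]:
    """Map each analysis-bound edge ID to the other bound edges it also supports."""
    kg_edges = message["knowledge_graph"]["edges"]
    aux = message.get("auxiliary_graphs") or {}

    # Collect every edge bound directly at the analysis level.
    bound: set[str] = set()
    for result in message.get("results") or []:
        for analysis in result.get("analyses") or []:
            for bindings in analysis.get("edge_bindings", {}).values():
                bound.update(b["id"] for b in bindings)

    # A bound edge that also sits in another bound edge's support graph is
    # the redundancy described above (first-level support graphs only).
    found: dict[str, set[str]] = {}
    for parent in bound:
        for graph_id in support_graph_ids(kg_edges.get(parent, {})):
            for child in aux.get(graph_id, {}).get("edges", []):
                if child in bound and child != parent:
                    found.setdefault(child, set()).add(parent)
    return found
```

Calling `doubly_placed_edges(response["message"])` on a downloaded response would return an empty dict if no bound edge is reused as support for another bound edge.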

colleenXu commented 4 weeks ago

I think there's an issue with self-edges happening here.

I'm looking at the BTE response in ARAX-CI UI:

At the top level, there are two edges with subgraphs. (FYI: the bottom edge's "treats_or_applied..." predicate is a known problem that is now fixed in BTE CI.)

Screen Shot 2024-08-15 at 2 29 12 AM

The bottom edge's support-graph has spastic diplegia, which matches the top red line in Sarah's screenshot.

Screen Shot 2024-08-15 at 2 31 21 AM

The top edge's support-graph at first glance doesn't have spastic diplegia at all.

Screen Shot 2024-08-15 at 2 33 13 AM

But if I look at the "phenotype_of" self-edge, it has a bunch of support-graphs. I bet the "duplicate/repetitiveness" is coming from that. The ARAX-UI doesn't show these deeper support-graphs so I didn't verify this yet. @tokebe I thought we dealt with self-edges back in https://github.com/biothings/biothings_explorer/issues/734?

Screen Shot 2024-08-15 at 2 37 11 AM
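A quick way to test the self-edge hunch without relying on the UI might look like the sketch below. It assumes the knowledge graph edges are a dict keyed by edge ID and that support graphs hang off a `biolink:support_graphs` attribute. The function name is made up for illustration.

```python
from typing import Any


def self_edges(kg_edges: dict[str, dict[str, Any]]) -> list[tuple[str, str, int]]:
    """List (edge_id, predicate, support_graph_count) for edges whose subject equals object."""
    rows = []
    for edge_id, edge in kg_edges.items():
        subject = edge.get("subject")
        if subject is None or subject != edge.get("object"):
            continue  # not a self-edge
        n_support = sum(
            len(attr.get("value") or [])
            for attr in edge.get("attributes") or []
            if attr.get("attribute_type_id") == "biolink:support_graphs"  # assumed type id
        )
        rows.append((edge_id, edge.get("predicate", ""), n_support))
    return rows
```

A self-edge with a large support-graph count would be a candidate for the repeated paths. An empty list would point away from self-edges as the cause.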

tokebe commented 4 weeks ago

Tracking in https://github.com/biothings/biothings_explorer/issues/850. I thought we had dealt with this already as well. I'll investigate further.

colleenXu commented 3 weeks ago

HOLD ON: I now wonder if the bottom "spastic diplegia" path could actually be coming from Aragorn. In other words, the "duplicate" spastic diplegia paths in the UI might not actually be due to BTE's data having self-edges. @cbizon @sstemann @gprice1129

I can't see Aragorn's response in the ARAX-CI UI, but I can look in the JSON.

Adrenal Cortex Hormones (MESH:D000305) is actually the 5th result in Aragorn's response. The "on" ID is actually for spastic diplegia (MONDO:0001167), with the query_id set to cerebral palsy (MONDO:0006497)

```
{
  "node_bindings": {
    "sn": [
      { "id": "MESH:D000305", "attributes": [] }
    ],
    "on": [
      { "id": "MONDO:0001167", "query_id": "MONDO:0006497", "attributes": [] }
    ]
  },
  "analyses": [
    {
      "resource_id": "infores:aragorn",
      "edge_bindings": {
        "t_edge": [
          { "id": "ef6edd43b0c8", "attributes": [] }
        ]
      },
      "score": 0.99796887135904
    }
  ],
  "normalized_score": 99.2
}
```
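To find results like this one without scrolling through the JSON by hand, something along these lines could work. It is only a sketch: `remapped_bindings` is a hypothetical helper, and it assumes each node binding carries an `id` and an optional `query_id`, as in the result above.

```python
from typing import Any


def remapped_bindings(results: list[dict[str, Any]]) -> list[tuple[int, str, str, str]]:
    """List (result_position, qnode_key, bound_id, query_id) where the bound ID differs from query_id."""
    hits = []
    for position, result in enumerate(results, start=1):  # 1-based, like "the 5th result"
        for qnode_key, bindings in result.get("node_bindings", {}).items():
            for binding in bindings:
                query_id = binding.get("query_id")
                # query_id is only meaningful when it differs from the bound node.
                if query_id and query_id != binding["id"]:
                    hits.append((position, qnode_key, binding["id"], query_id))
    return hits
```

Run against the result shown above, this would report the `on` binding, with `MONDO:0001167` bound in place of the queried `MONDO:0006497`.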

The result's bound edge `ef6edd43b0c8` is adrenal cortex hormones "treats_or_..." spastic diplegia, sourced from semmeddb via rtx-kg2, with 12 publications:

```
{
  "subject": "MESH:D000305",
  "object": "MONDO:0001167",
  "predicate": "biolink:treats",
  "sources": [
    {
      "resource_id": "infores:aragorn",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [ "infores:rtx-kg2" ]
    },
    {
      "resource_id": "infores:rtx-kg2",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [ "infores:semmeddb" ]
    },
    {
      "resource_id": "infores:semmeddb",
      "resource_role": "primary_knowledge_source",
      "upstream_resource_ids": []
    }
  ],
  "attributes": [
    {
      "attribute_type_id": "bts:sentence",
      "value": {
        "PMID:10749462": {
          "object score": "1000",
          "publication date": "2000 Mar",
          "sentence": "BACKGROUND: Although treatment with oral corticosteroids can cause reactivation of latent Mycobacterium tuberculosis (TB) infection in purified protein derivative (PPD)-positive individuals with no evidence of clinical disease, little is known about the effects of inhaled corticosteroids in this respect.",
          "subject score": "888"
        },
        "PMID:11277277": {
          "object score": "1000",
          "publication date": "2001 Mar",
          "sentence": "Treatment with systemic corticosteroids is known to increase the risk of fractures but little is known of the fracture risks associated with inhaled corticosteroids.",
          "subject score": "888"
        },
        "PMID:17518870": {
          "object score": "1000",
          "publication date": "2007 Mar",
          "sentence": "Previous research has been mainly quantitative and analysed variables associated with compliance, doing little to increase professional understanding of the patient's perspective on taking corticosteroid treatment.",
          "subject score": "790"
        },
        "PMID:19153534": {
          "object score": "1000",
          "publication date": "2009 Mar",
          "sentence": "BACKGROUND: The addition of leukotriene modifier (LM) may be a useful approach for uncontrollable asthma despite treatment with inhaled corticosteroid (ICS), especially in asthmatics comorbid with allergic rhinitis (AR), although little is known about its molecular mechanism.",
          "subject score": "888"
        },
        "PMID:19260540": {
          "object score": "694",
          "publication date": "2009 Feb",
          "sentence": "Although the nodular opacities and GGO improved after an administration of corticosteroid (PSL 0.5 mg/kg/day), little improvement in the consolidations and cyst formation was demonstrated.",
          "subject score": "1000"
        },
        "PMID:21340722": {
          "object score": "1000",
          "publication date": "2011 Sep",
          "sentence": "Little is known about SEL in patients with hematologic malignancies who require frequent lumbar punctures and corticosteroid treatment that places them at risk.",
          "subject score": "888"
        },
        "PMID:22441634": {
          "object score": "1000",
          "publication date": "2012 Jun",
          "sentence": "BACKGROUND: Little is known about the safety and effectiveness of early interventional treatment (EIT) with intranasal corticosteroids for seasonal allergic rhinitis.",
          "subject score": "861"
        },
        "PMID:28988846": {
          "object score": "694",
          "publication date": "2018 Jun",
          "sentence": "She was first misdiagnosed as having Bell's palsy and received corticosteroids which resulted in little improvement.",
          "subject score": "888"
        },
        "PMID:29163502": {
          "object score": "1000",
          "publication date": "2017",
          "sentence": "Tolerogenic DCs (tDCs) are commonly generated using corticosteroids including dexamethasone, however, to date, little is known on how corticosteroid treatment alters glycosylation and what functional consequences this may have.",
          "subject score": "790"
        },
        "PMID:29507032": {
          "object score": "694",
          "publication date": "2018 Mar 05",
          "sentence": "High-dose corticosteroids and methotrexate were given with little improvement, maintaining disabling dysphagia leading to a percutaneous endoscopic gastrostomy tube placement.",
          "subject score": "901"
        },
        "PMID:30914016": {
          "object score": "1000",
          "publication date": "2019",
          "sentence": "RESULTS: Little has been done to optimise the dose and formulation of antenatal corticosteroid treatment since the first clinical trial in 1972.",
          "subject score": "851"
        },
        "PMID:8232485": {
          "object score": "1000",
          "publication date": "1993 Dec 09",
          "sentence": "BACKGROUND: Optic neuritis is often the first clinical manifestation of multiple sclerosis, but little is known about the effect of corticosteroid treatment for optic neuritis on the subsequent risk of multiple sclerosis.",
          "subject score": "888"
        }
      },
      "attribute_source": "infores:semmeddb"
    },
    {
      "attribute_type_id": "biolink:original_predicate",
      "value": [
        "UMLS:C0001617---SEMMEDDB:treats---None---None---None---UMLS:C0023882---SEMMEDDB:"
      ],
      "value_type_id": "metatype:String",
      "description": "The IDs of the original RTX-KG2pre edge(s) corresponding to this edge prior to any synonymization or remapping."
    },
    {
      "attribute_type_id": "biolink:publications",
      "value": [
        "PMID:29507032",
        "PMID:19260540",
        "PMID:10749462",
        "PMID:8232485",
        "PMID:30914016",
        "PMID:29163502",
        "PMID:11277277",
        "PMID:21340722",
        "PMID:28988846",
        "PMID:17518870",
        "PMID:19153534",
        "PMID:22441634"
      ],
      "value_type_id": "biolink:Uriorcurie",
      "attribute_source": "infores:semmeddb"
    }
  ]
}
```
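The "semmeddb via rtx-kg2" reading comes from following `upstream_resource_ids` through the `sources` list. A rough sketch of that walk is below. `provenance_chain` is a hypothetical helper, and it assumes a single primary source and a simple linear chain, which is true for this edge but may not be true in general.

```python
from typing import Any


def provenance_chain(sources: list[dict[str, Any]]) -> list[str]:
    """Order resource IDs from the primary knowledge source up to the outermost aggregator."""
    upstream = {s["resource_id"]: s.get("upstream_resource_ids") or [] for s in sources}
    primary = [
        s["resource_id"]
        for s in sources
        if s.get("resource_role") == "primary_knowledge_source"
    ]
    if len(primary) != 1:
        raise ValueError("expected exactly one primary knowledge source")

    chain = [primary[0]]
    while True:
        # The next hop is whichever resource lists the current one as upstream.
        downstream = [
            rid for rid, ups in upstream.items() if chain[-1] in ups and rid not in chain
        ]
        if not downstream:
            return chain
        chain.append(downstream[0])  # assumption: linear chain, so take the first match
```

For the edge above, this walk gives semmeddb, then rtx-kg2, then aragorn. The same check on the inferred edges in the UI would show whether `infores:aragorn` appears anywhere in their provenance.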

It seems odd that the 5th result from Aragorn wouldn't be in the ARS/UI - especially if BTE's corresponding result (# 78) is.

Links: UI-Test ARAX-CI viewing Aragorn JSON response, downloaded from ARS

colleenXu commented 3 weeks ago

As for the first part of @sstemann's opening post: there's both an "inferred" treats and an "inferred" treats_or_applied... because of BTE's behavior.

This is now addressed in BTE-CI: it'll just be "inferred" treats (fix deployed Aug 14th, ~9:20 AM Pacific time).

gprice1129 commented 2 weeks ago

@colleenXu have you verified that the issue is coming from ARAGORN? I don't think that is the case because it is not included as a source on either inferred edge.

colleenXu commented 2 weeks ago

@gprice1129

Well there's been no reply here from Aragorn folks (or ARS folks) on the situation I proposed. In that situation, I think the provenance info was lost somehow.

I'm reminded of https://github.com/NCATSTranslator/Feedback/issues/776#issuecomment-2292795430, although that was something different.

I suspect it isn't BTE self-edges because I saw other cases of BTE self-edges here and didn't see the same "duplicate paths" problem in the UI.


I imagine it'd be worth rerunning this query to see how things look. A "self-edge" fix is live in BTE-CI as of yesterday and in BTE-Test as of today, in case the problem is from BTE.

tokebe commented 1 week ago

Note that the BTE self-edges issue has been resolved and deployed to Prod, tangential as it is to this issue. Additionally, BTE direct edge handling has been fixed on Prod (re: this comment)