no publications, uninformative source in EPC

gglusman commented 6 months ago

What drugs may treat PRP, 2024/1/10 edition.

Result 3/960 is Etanercept (score 4.99). Evidence cites 0 publications, 0 clinical trials, and 2 sources. These are AEOLUS and BTE, each linking to their respective wiki pages. These links don't help get more evidence for the assertion. Result 6/960 is Tretinoin (score 4.96). Evidence: 0 pubs, 0 CTs, 1 source - BTE. Again, an EPC dead end. Same for result 12/960, Fluconazole (score 4.92), etc.

gglusman commented 5 months ago

Likely related to #657.

gglusman commented 5 months ago

https://ui.test.transltr.io/main/results?l=Aggressive+Systemic+Mastocytosis&i=MONDO%3A0020333&t=0&r=f58a3afe&q=246f5c17-fd36-4980-92fe-e8cf105e53fc: top 52 results have 0 publications, 0 clinical trials, and 1 source (Unsecret Agent) supporting.

cbizon commented 5 months ago

BTE is showing up as a primary knowledge source on a treats edge for Etanercept

cbizon commented 5 months ago

It's not totally clear whether these are inferred edges missing their support graphs or lookup edges with a misassigned primary knowledge source, but in either case, the new UI in test will make this more straightforward to diagnose.

colleenXu commented 5 months ago

EDIT: Looks like a UI issue?

Regarding PRP + Etanercept: see BTE's response in ARAX-UI for this PK, result 2

Right now, I'm only seeing 1 source in the UI link provided (aeolus), which corresponds with 1 BTE edge
the "BTE as primary source" is probably coming from BTE's other edge, which is an inferred edge w/ a medium-size support-graph. But the UI isn't showing any of that info now. Is that expected? @Genomewide

PRP + Tretinoin / Fluconazole: see BTE's response in ARAX-UI for this PK: Tretinoin is result 4, Fluconazole is result 10

now I don't see either result in the provided UI link
the "BTE as primary source" is probably coming from BTE's edges: for both results, there's just 1 inferred edge w/ a medium-size support-graph

Note for Aggressive Systemic Mastocytosis and Unsecret Agent: I suspect it's the same problem as BTE's. Now the results are probably missing, and the ARAX-UI shows that Unsecret agent has inferred edges with support-graphs.

andrewsu commented 5 months ago

agree with @colleenXu's analysis -- I think BTE is reporting what it knows appropriately.

cbizon commented 5 months ago

@gprice1129 do you agree that this is a UI issue?

Genomewide commented 5 months ago

@colleenXu @gprice1129 @andrewsu I think I see what the technical problem is. I don't think it is a UI problem.

I also don't think fixing it will really address @gglusman's issue? I think the real issue here is that the AEOLUS website for the primary source is unhelpful in finding out why that edge is there. I think this is related to the infores work that hopefully is getting better. Is that right, Gwênlyn?

I think I found the technical issue though! It looks like BTE may have missing edges. (Edit - I explain more below - it may be in the merge in ARS?)

The edge IDs suggest BTE meant this to be a 1-hop inferred treats? With one edge as the aux_graph. And Aragorn and ARAX meant there to be a 1-hop lookup edge. All pointing to aeolus as the knowledge source. Which makes this an odd case, but allowed and we would support it. If all 3 had just been a lookup then the display would definitely be correct. What is missing is the 'inferred' part.

The lack of support graph data is what breaks it according to Gus.

I see the following for BTE:

BTE analysis:

                            {
                                "score": 0.9504392119646858,
                                "attributes": null,
                                "resource_id": "infores:biothings-explorer",
                                "edge_bindings": {
                                    "t_edge": [
                                        {
                                            "id": "b4eb32ffb57766c71724794168601b13",
                                            "attributes": null
                                        },
                                        {
                                            "id": "inferred-UNII:OP401G7OJC-treats-MONDO:0100017",
                                            "attributes": null
                                        }
                                    ]
                                },
                                "scoring_method": null,
                                "support_graphs": null
                            }

Edge 1

                        "inferred-UNII:OP401G7OJC-treats-MONDO:0100017": {
                            "object": "MONDO:0100017",
                            "sources": [
                                {
                                    "resource_id": "infores:biothings-explorer",
                                    "resource_role": "primary_knowledge_source",
                                    "upstream_resource_ids": []
                                }
                            ],
                            "subject": "UNII:OP401G7OJC",
                            "predicate": "biolink:treats",
                            "attributes": [
                                {
                                    "value": [
                                        "inferred-UNII:OP401G7OJC-treats-MONDO:0100017-support0"
                                    ],
                                    "attribute_type_id": "biolink:support_graphs"
                                }
                            ]
                        },

I don't see any edges for the graph "inferred-UNII:OP401G7OJC-treats-MONDO:0100017-support0". So, there would not be any paths shown under the inferred edge for this. It would just be displayed like an inferred edge with BTE as the knowledge source. Or it may break it, I am not sure.

Edge 2

                        "b4eb32ffb57766c71724794168601b13": {
                            "object": "MONDO:0100017",
                            "sources": [
                                {
                                    "resource_id": "infores:aeolus",
                                    "resource_role": "primary_knowledge_source",
                                    "upstream_resource_ids": []
                                },
                                {
                                    "resource_id": "infores:mychem-info",
                                    "resource_role": "aggregator_knowledge_source",
                                    "upstream_resource_ids": [
                                        "infores:aeolus"
                                    ]
                                },
                                {
                                    "resource_id": "infores:biothings-explorer",
                                    "resource_role": "aggregator_knowledge_source",
                                    "upstream_resource_ids": [
                                        "infores:mychem-info"
                                    ]
                                }
                            ],
                            "subject": "UNII:OP401G7OJC",
                            "predicate": "biolink:treats",
                            "attributes": []
                        },

This would show a lookup.
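
As a rough illustration of the difference between the two edges above, here is a minimal Python sketch. It assumes that an edge counts as inferred when it carries a `biolink:support_graphs` attribute and as a lookup otherwise; that rule and the helper names (`primary_source`, `support_graph_ids`, `classify_edge`) are assumptions for illustration, not the UI's actual code.

```python
from typing import Any, Dict, List, Optional


def primary_source(edge: Dict[str, Any]) -> Optional[str]:
    """Return the resource_id of the edge's primary knowledge source, if any."""
    for source in edge.get("sources") or []:
        if source.get("resource_role") == "primary_knowledge_source":
            return source.get("resource_id")
    return None


def support_graph_ids(edge: Dict[str, Any]) -> List[str]:
    """Collect the aux graph ids listed in biolink:support_graphs attributes."""
    ids: List[str] = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == "biolink:support_graphs":
            ids.extend(attr.get("value") or [])
    return ids


def classify_edge(edge: Dict[str, Any]) -> str:
    """Assumed rule: an edge with support graphs is inferred, otherwise a lookup."""
    return "inferred" if support_graph_ids(edge) else "lookup"


# Trimmed copies of Edge 1 and Edge 2 from the BTE response above.
edge_1 = {
    "sources": [{"resource_id": "infores:biothings-explorer",
                 "resource_role": "primary_knowledge_source"}],
    "attributes": [{"attribute_type_id": "biolink:support_graphs",
                    "value": ["inferred-UNII:OP401G7OJC-treats-MONDO:0100017-support0"]}],
}
edge_2 = {
    "sources": [{"resource_id": "infores:aeolus",
                 "resource_role": "primary_knowledge_source"},
                {"resource_id": "infores:mychem-info",
                 "resource_role": "aggregator_knowledge_source"}],
    "attributes": [],
}

for name, edge in (("Edge 1", edge_1), ("Edge 2", edge_2)):
    print(name, classify_edge(edge), primary_source(edge))
```

Under that assumption, Edge 1 comes out as inferred with BTE as its primary knowledge source, and Edge 2 as a lookup with aeolus as its primary knowledge source.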

cbizon commented 5 months ago

I'm not sure I understand @Genomewide - the ARAGORN example uses the attribute "biolink:support_graphs" on the inferred edge, just as BTE does. That, as I understand it, is the right attribute name in both cases. And both point to auxiliary graphs that have edges in them.

The only thing I can't verify for sure from this comment is whether the BTE aux graph "inferred-UNII:OP401G7OJC-treats-MONDO:0100017-support0" has the right edge in it.

But if it does, then I'm not clear on why the UI will show the aux graph for ARAGORN's and not BTE's result.

Also, I'm not clear on what your last comment there is referring to:

Also, should this be caught in ARAX and show a warning or something?

Maybe I'm missing the point here...

Genomewide commented 5 months ago

@cbizon You are right, I had to back that out of what I put above. I edited to remove it, but you may still see the old answer. I think BTE is just missing support graphs.

Genomewide commented 5 months ago

Here is the weird kicker! And I think Gus just figured it out! ARAX displays this for the BTE result.

I only look at the merged JSON and not the individual ones. I bet it is getting cut in the merge. According to the data I see inferred-PUBCHEM.COMPOUND:3365-treats-MONDO:0100017-support0 has no edges. It is just referenced like the other answer above.

@MarkDWilliams can you check this? Here is the ref link again.
https://arax.ci.transltr.io/?r=cfcdc63b-f49f-4ebd-bda1-c2510bd353f1

cbizon commented 5 months ago

Oh ok, thanks @Genomewide !

Genomewide commented 5 months ago

I am so sorry you read all of that! I did not want to leave it in bc it was confusing so I rewrote history a bit, but you were too on top of it and got the incorrect and (what I hope are) correct parts.

gprice1129 commented 5 months ago

Just to be extra clear why the BTE edges are missing. The current UI code treats the entire analysis as invalid if it can't find any of the referenced nodes, edges, or support graphs in the analysis no matter how many levels deep in the support graphs the missing reference occurs.

I also verified that in the raw message from BTE the support graphs referenced in the missing edges on the UI do appear in the auxiliary_graph field. So this is definitely being removed somewhere in the ARS merge @MarkDWilliams.
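
To make the validity rule described above concrete, here is a minimal Python sketch, not the actual UI code. It assumes a TRAPI-style layout (`knowledge_graph` with `nodes` and `edges` dicts, and an `auxiliary_graphs` dict whose entries have an `edges` list); the function names `edge_is_resolvable` and `analysis_is_valid` are hypothetical.

```python
from typing import Any, Dict, Optional, Set


def edge_is_resolvable(edge_id: str, kg: Dict[str, Any],
                       aux_graphs: Dict[str, Any],
                       seen: Optional[Set[str]] = None) -> bool:
    """True if the edge, its nodes, and all nested support graphs can be found."""
    seen = set() if seen is None else seen
    if edge_id in seen:
        return True  # already visited in this walk; guards against cycles
    seen.add(edge_id)

    edge = kg["edges"].get(edge_id)
    if edge is None:
        return False
    if edge["subject"] not in kg["nodes"] or edge["object"] not in kg["nodes"]:
        return False

    # Follow support graphs to any depth.
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") != "biolink:support_graphs":
            continue
        for graph_id in attr.get("value") or []:
            graph = aux_graphs.get(graph_id)
            if graph is None:
                return False  # dangling reference to a missing aux graph
            for child_id in graph.get("edges") or []:
                if not edge_is_resolvable(child_id, kg, aux_graphs, seen):
                    return False
    return True


def analysis_is_valid(analysis: Dict[str, Any], kg: Dict[str, Any],
                      aux_graphs: Dict[str, Any]) -> bool:
    """An analysis is invalid if any bound edge has an unresolved reference."""
    for bindings in (analysis.get("edge_bindings") or {}).values():
        for binding in bindings:
            if not edge_is_resolvable(binding["id"], kg, aux_graphs):
                return False
    return True
```

With the merged message discussed in this thread, the inferred BTE edge points at a support graph that is absent from `auxiliary_graphs`, so a check like this would reject the whole analysis, including the lookup edge bound next to it.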

MarkDWilliams commented 5 months ago

Taking a look at this now to see what the root cause of the issue is.

MarkDWilliams commented 5 months ago

Shervin was able to dig into these results (thanks @ShervinAbd92 !) and I believe she found the issue. Reposting her comments here as she's AFK for a bit.

In the removed_block function the aux_graph "inferred-UNII:OP401G7OJC-treats-MONDO:0100017-support0" is added to the aux_graph_to_remove list since there is an overlap between "a8095addc72a5c9785059bda32cd940f" and "MONDO:0100017-has_phenotype-MONDO:0005070-via_subclass", which is in the list of edges_to_remove, which has an "object" that is among the nodes_to_remove list from the block list.

So, it looks like

the Aux graph is slated to be removed because it contained an edge to be removed (and no other edges. If it had other "legitimate" edges, it would only have the blocked edges removed but the aux graph as a whole would remain.)
The edge is on the list to remove because it contains a blocked node
The blocked node in question is MONDO:0005070 : "Tumor"

We have a few options here as I see it, and I'm happy to facilitate whatever folks want to see.

1. Software is working as intended. No fix needed
2. "Tumor" is not generic enough to warrant inclusion in the Blocklist and should be removed (which would bring back the aux graph and edge)
3. "Tumor" is a valid blocklist entry, but the behavior for blocking should be handled differently somehow (I'm open to options here)

For 3, I believe the behavior that we're seeing here is in-line with what got discussed on the TAQA breakout for how the blocking should work, but I'm happy to change that if, seeing it in action, we have different feelings about it. I'll lay out the broad strokes of what we have implemented below for clarity :

1. First, we find any overlap between the blocklist and the knowledge_graph nodes. That is, which, if any, blocked nodes actually occur in this set of results.
2. Once identified, we remove these nodes from the knowledge_graph and any edges which contain them as a subject or object.
3. We look at the auxiliary_graphs, and if one contains edges that we removed, we remove the whole aux graph. The thinking here was that aux graphs were often interconnected, and removing just one (or some subset of the total) edge would leave a graph that didn't make sense.
4. Then, we look at results. Results that have a blocked node as part of their node_bindings on the actual result object just get removed entirely. That is, if a result says "Water treats Diabetes", we remove the whole result.
5. If a result is ok at the top level, we move on to looking in the analyses.
6. If there are any edge_bindings in the analysis that are among our bad edges, we remove just those.
7. However, if all the edges in an analysis are bad, we remove the whole analysis.
8. If the support_graph in the analysis (which is basically a list of edge_bindings) has any of our edges to be removed, we remove those from the support_graph as well.
9. Finally, if in the process of removing "bad" analyses, we find ourselves with a result that now has no analyses (i.e. a result that only had "bad" evidence supporting it), we remove that result because we don't want to show something with zero evidence.
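
The steps above can be sketched roughly as follows in Python. This is an illustration written from the description only, not the ARS code: the function name `apply_blocklist` is hypothetical, the message is assumed to be a TRAPI-style dict, and the support_graph step is implemented literally as described (dropping entries that match removed edge ids).

```python
from typing import Any, Dict, Iterable


def apply_blocklist(message: Dict[str, Any], blocklist: Iterable[str]) -> Dict[str, Any]:
    """Filter a TRAPI-style message in place, following the steps described above."""
    kg = message["knowledge_graph"]
    aux_graphs = message.get("auxiliary_graphs") or {}

    # Blocked nodes that actually occur in this set of results.
    nodes_to_remove = set(blocklist) & set(kg["nodes"])

    # Edges that have a blocked node as subject or object.
    edges_to_remove = {
        edge_id for edge_id, edge in kg["edges"].items()
        if edge["subject"] in nodes_to_remove or edge["object"] in nodes_to_remove
    }
    for node_id in nodes_to_remove:
        del kg["nodes"][node_id]
    for edge_id in edges_to_remove:
        del kg["edges"][edge_id]

    # Any aux graph containing a removed edge is dropped as a whole.
    doomed = [gid for gid, graph in aux_graphs.items()
              if edges_to_remove & set(graph.get("edges") or [])]
    for graph_id in doomed:
        del aux_graphs[graph_id]

    kept_results = []
    for result in message.get("results") or []:
        # A blocked node bound at the top level removes the whole result.
        bound_nodes = {b["id"] for bs in result["node_bindings"].values() for b in bs}
        if bound_nodes & nodes_to_remove:
            continue

        original_analyses = result.get("analyses") or []
        kept_analyses = []
        for analysis in original_analyses:
            edge_bindings = analysis.get("edge_bindings") or {}
            bound_edges = [b["id"] for bs in edge_bindings.values() for b in bs]
            # If every bound edge is bad, drop the whole analysis.
            if bound_edges and all(e in edges_to_remove for e in bound_edges):
                continue
            # Otherwise drop only the bad edge bindings.
            analysis["edge_bindings"] = {
                qedge: [b for b in bs if b["id"] not in edges_to_remove]
                for qedge, bs in edge_bindings.items()
            }
            # Literal reading of the support_graph step: drop removed edge ids.
            if analysis.get("support_graphs"):
                analysis["support_graphs"] = [
                    s for s in analysis["support_graphs"] if s not in edges_to_remove
                ]
            kept_analyses.append(analysis)

        # A result whose analyses were all removed is dropped.
        if original_analyses and not kept_analyses:
            continue
        result["analyses"] = kept_analyses
        kept_results.append(result)

    message["results"] = kept_results
    return message
```

Note that nothing in this sketch touches the `biolink:support_graphs` attribute on edges that survive, so an inferred edge can keep pointing at an aux graph that has been removed, which matches what was observed for the Etanercept result.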

Apologies for the long post with a series of lists, but I just wanted to make sure everything was as clear as possible. Does anyone have any thoughts on which option we should pursue?

cbizon commented 5 months ago

This mostly makes sense, but if I understand it all, then the removal of Tumor should eventually have led to the removal of the result, but it didn't. Is that wrong?

I guess I also wonder about whether Tumor should be on the block list. For instance something like Chemical reduces Tumors and therefore treats Cancer X seems like a valid path?

Are we mixing up different uses for the blocklist i.e. is Tumor on there for another good reason that I'm not thinking of?

colleenXu commented 5 months ago

So it looks like most of the discussion is about BTE's PRP disease and Etanercept drug result.

Here's the screenshot for that (Andy's post shows a different result - Fluconazole drug):

Screen Shot 2024-01-26 at 12 11 05 PM

And the aux-graph inferred-UNII:OP401G7OJC-treats-MONDO:0100017-support0 edges: Screen Shot 2024-01-26 at 12 24 58 PM

Mark said:

the Aux graph is slated to be removed because it contained an edge to be removed (and no other edges. If it had other "legitimate" edges, it would only have the blocked edges removed but the aux graph as a whole would remain.)

But it looks like this aux-graph has many edges that don't involve tumor MONDO:0005070. So shouldn't the support graph have been kept - just with the tumor-edges + tumor-node removed from the aux-graph/knowledge-graph?

MONDO:0005070 = neoplasm here, not tumor?
this is the case for the fluconazole result as well - there's more edges in the aux-graph inferred-PUBCHEM.COMPOUND:3365-treats-MONDO:0100017-support0 than just the ones for tumor/neoplasm/MONDO:0005070

MarkDWilliams commented 5 months ago

The initial thinking with this logic was that removing edges from aux graphs would leave aux graphs that were disconnected or didn't make sense. So, we remove the whole aux_graph. If folks want this behavior changed, we could remove just the tumor edge and leave the rest. It might just leave us with some funky aux graphs in the future.
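
For comparison, the alternative mentioned above (removing only the blocked edges and keeping the rest of the aux graph) could look roughly like this Python sketch. The function name `prune_aux_graphs` and the choice to drop a graph once it has no edges left are assumptions for illustration, not existing ARS behavior.

```python
from typing import Any, Dict, Set


def prune_aux_graphs(aux_graphs: Dict[str, Any],
                     edges_to_remove: Set[str]) -> Dict[str, Any]:
    """Return a copy of aux_graphs with blocked edges removed from each graph.

    A graph is dropped only when none of its edges survive (an assumption).
    """
    pruned: Dict[str, Any] = {}
    for graph_id, graph in aux_graphs.items():
        remaining = [e for e in graph.get("edges") or [] if e not in edges_to_remove]
        if remaining:
            pruned[graph_id] = {**graph, "edges": remaining}
    return pruned
```

As noted, the surviving edges may no longer form a connected path, so anything consuming these graphs would have to tolerate that.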

MarkDWilliams commented 5 months ago

@cbizon It would only "trickle up" to remove the whole result if removing this aux graph left us with a result that had no supporting evidence.

Genomewide commented 5 months ago

@MarkDWilliams Is there a time when this would leave us with an inferred edge that has no aux graphs?

Also, would be good to have @gprice1129 look at the explanation and see when he thinks our system would just boot the result. The reason this one still showed up was because others reported it. It would disappear if not.

gprice1129 commented 5 months ago

A couple things I want to clarify:

1. @MarkDWilliams I think it should be considered a bug that there is a hanging reference to an auxiliary graph (or anything, nodes, edges, etc.) that has been removed from the response by the ARS. I would like to hear your thoughts.
2. @Genomewide I don't think there is a situation where removing the node/edges from a graph would break the UI. At worst what should happen is that the analysis is thrown out if there are dangling references (see point 1).

My main point is that if the ARS is removing anything, it needs to systematically remove it everywhere.

MarkDWilliams commented 5 months ago

Agree. Dangling references are a bug on the ARS side and should be fixed. Also agree with the overall principle that if you're removing something, you should remove all references to it.
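
One way to picture "remove all references to it" for support graphs is the Python sketch below. It is only an illustration under assumptions: the function name `strip_dangling_support_refs` is hypothetical, the data is assumed to be TRAPI-style dicts, and what should happen to an inferred edge that ends up with no support graphs at all is left open, as it is in this thread.

```python
from typing import Any, Dict, List


def strip_dangling_support_refs(kg: Dict[str, Any],
                                aux_graphs: Dict[str, Any]) -> List[str]:
    """Drop support graph ids that no longer exist in aux_graphs.

    Edits kg in place and returns the ids of edges that had support graphs
    before but have none left, so the caller can decide what to do with them.
    """
    orphaned: List[str] = []
    for edge_id, edge in kg["edges"].items():
        for attr in edge.get("attributes") or []:
            if attr.get("attribute_type_id") != "biolink:support_graphs":
                continue
            original = attr.get("value") or []
            kept = [graph_id for graph_id in original if graph_id in aux_graphs]
            attr["value"] = kept
            if original and not kept:
                orphaned.append(edge_id)
    return orphaned
```

Returning the orphaned edge ids rather than deleting those edges keeps the sketch neutral on whether such an edge should be removed or shown without paths.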

sierra-moxon commented 4 months ago

from TAQA: this is an ARS issue in progress

sierra-moxon commented 1 month ago

Noting that Etanercept has a much lower score now, in the 2's vs ~5.

No "Tretinoin" but two with that in the name, one has a score of 5. It seems reasonable to me after taking a look at the publications. But also noting that it says it has 9 publications when really it only has 2.

NCATSTranslator / Feedback

no publications, uninformative source in EPC #680