NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Adapt reasoning to deal with transitive relations #468

Open gglusman opened 12 months ago

gglusman commented 12 months ago

Upon re-re-testing what drugs may treat Familial Pityriasis Rubra Pilaris (on ci), and comparing to the previous test, I see a new result with score 100:

image

Multiple issues with this result:

In 1953, Sidney Q. Cohlan (1953) observed that high doses of vitamin A had teratogenic effects on pregnant rats. G. L. Peck was initially researching vitamin A for the treatment of Darier’s disease, ichthyosis, and pityriasis rubra pilaris and discovered its therapeutic effects on acne vulgaris. Peck et al. (1979) also counseled on contraception, [...]

This clearly does not provide confident support for the 'retinol treats PRP' assertion on which the 9-cis-Retinal one depends. Yet it gets score 100. The direct 'treats' path cites Aragorn as source. The 'retinol treats PRP' triple cites BTE and TMKP.

sandrine-m commented 11 months ago

Thank you @gglusman for pointing this out. Those are excellent points in my opinion. Here is my analysis of your ticket; please tell me if I understood properly (I would split this ticket accordingly) and feel free to correct any misunderstanding :)

  1. scoring: I am unclear on the scoring piece as my understanding is that the scoring is still in progress.
  2. reasoning: more rules are needed when 'is input of' is present
  3. EPC: add provenance on 'is input of' and 'has input' edges
  4. SEMMED-related: filter out text-mined or SEMMED 'treats' edges if supported by only one publication

gglusman commented 11 months ago

@sandrine-m I'm thinking about this a bit differently. The main points:

  1. Text mining: a 1979 paper mentions in passing that someone was initially researching using X for the treatment of Y, and this is interpreted to mean that X treats Y. This is not necessarily a SEMMED filtering issue. Of note, this is significantly weaker evidence of X actually treating Y than if there is a clinical trial testing the effectiveness of X in the treatment of Y... and we were told we cannot derive 'X treats Y' from the presence of such a clinical trial.

  2. Reasoning: it's not just the 'input of' that is suspect in reasoning. There are several predicates that are hardly transitive in nature but are being interpreted as such. Some intuitive examples:

    • drug_A binds albumin, drug_B binds albumin, drug_B treats disease_X ... can we hypothesize that drug_A treats disease_X too? Tons of stuff binds albumin, it's just not specific!
    • key_A is_in key_ring, key_ring has_key key_B, key_B opens my_house ... it's rather weak to derive that key_A opens my_house (or even that it 'may open')
    • handkerchief is_in my_pocket, my_pocket has_content key_B, key_B opens my_house ... perhaps the handkerchief will open my_house too?

Yet somehow these two extremely weak things (a text misparsing and a weak inference path) become a top-scoring result. Agreed on the missing EPC bit and the not-clear-how-scoring-works. :)
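
To make the albumin example concrete, here is a minimal sketch of the shared-neighbor ("guilt by association") pattern and one possible guard against unspecific hubs. It is not how any ARA actually reasons: the triples are toy data, and the function name `shared_neighbor_hypotheses`, the extra `receptor_R` example and the `max_hub_degree` cutoff are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not real Translator data.
TRIPLES = [
    ("drug_A", "binds", "albumin"),
    ("drug_B", "binds", "albumin"),
    ("drug_C", "binds", "albumin"),
    ("drug_D", "binds", "albumin"),
    ("drug_B", "treats", "disease_X"),
    ("drug_E", "binds", "receptor_R"),
    ("drug_F", "binds", "receptor_R"),
    ("drug_F", "treats", "disease_Y"),
]


def shared_neighbor_hypotheses(triples, max_hub_degree=None):
    """Return sorted (candidate, disease, hub) tuples for paths of the form
    candidate -p-> hub <-p- drug -treats-> disease.

    If max_hub_degree is set (an assumed, illustrative cutoff), hubs with more
    subjects than that are treated as too unspecific and skipped.
    """
    by_hub = defaultdict(set)   # (predicate, hub) -> subjects pointing at the hub
    treats = defaultdict(set)   # drug -> diseases it is asserted to treat
    for s, p, o in triples:
        if p == "treats":
            treats[s].add(o)
        else:
            by_hub[(p, o)].add(s)

    results = set()
    for (_p, hub), subjects in by_hub.items():
        if max_hub_degree is not None and len(subjects) > max_hub_degree:
            continue  # e.g. albumin: too many things bind it
        for drug in subjects:
            for disease in treats.get(drug, ()):
                for candidate in subjects:
                    already_known = disease in treats.get(candidate, ())
                    if candidate != drug and not already_known:
                        results.add((candidate, disease, hub))
    return sorted(results)


unfiltered = shared_neighbor_hypotheses(TRIPLES)
filtered = shared_neighbor_hypotheses(TRIPLES, max_hub_degree=2)
```

Without the cutoff, every albumin binder inherits 'treats disease_X' from drug_B; with it, only the hypothesis that goes through the more specific `receptor_R` survives. A degree cutoff is only one crude proxy for specificity, and even the surviving hypothesis is still a guess, not evidence.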

sandrine-m commented 11 months ago

Thank you! I'll split the ticket accordingly to make sure we treat all points.

sandrine-m commented 11 months ago

I am keeping this original ticket for the reasoning part as I find the discussion in this thread important to keep for this topic. Changing title accordingly.

gglusman commented 11 months ago

Another example from the ASM case:

image

A number of genes increase secretion of both dinoprostone and some other substance that is claimed to treat ASM. Guilt by association can only go so far...

gglusman commented 11 months ago

In #484 Sarah exposed another example of this, with two drugs being 'part of Animals'... and somehow this being sufficient support to transfer the 'treats' assertion from one to the other.

sandrine-m commented 11 months ago

Thanks @gglusman for associating both issues, that is very helpful for triage. I have created a "transitive" sublabel for all these issues to help go back to them post relay.

gglusman commented 11 months ago

Noting #465 as affected by weak transitive logic.

mbrush commented 10 months ago

@gglusman do you happen to know which ARA prediction tools reported these problematic support paths? I suspect that it is not a manually crafted or automated rule from an ARA like BTE, Unsecret, or ARAGORN, but more likely an arbitrary path through a larger explanation graph like those created by the ARAX reasoner, which are really not meant to be considered independently, outside of the context of this larger graph.

If it is indeed the case that these types of paths are from ARAX - then we will have perhaps isolated the source of the problem, and be better able to find a solution. This would also be interesting to know in upcoming EPC planning discussions about how to weight the strength of evidence provided by support paths with different topologies.

gglusman commented 10 months ago

@mbrush We collected a number of issues with the transitive label, and more than one ARA seems to be implicated.

gglusman commented 6 months ago

Testing on 2024/01/19, what drugs may treat Alzheimer Disease?

One answer is "Iron, dietary". The explaining path is: image

This would be a correct redrawing of the graph:

image

That is, SLC11A2 causes decreased uptake of both dietary iron and zinc; zinc is claimed to treat AD. There is no support for iron->AD.
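
The redrawn graph can be checked mechanically. The sketch below assumes the three edges described in the sentence above, with predicate labels paraphrased from it; the helper name `reachable` is hypothetical. It contrasts following edges in their stated direction with ignoring direction, which is one plausible way such a path could end up being read as support.

```python
from collections import defaultdict, deque

# Edges as described in the comment above; predicate labels are paraphrased.
EDGES = [
    ("SLC11A2", "causes decreased uptake of", "Iron, dietary"),
    ("SLC11A2", "causes decreased uptake of", "Zinc"),
    ("Zinc", "treats", "Alzheimer Disease"),
]


def reachable(edges, start, goal, directed=True):
    """Breadth-first search from start to goal over (subject, predicate, object) edges.

    With directed=False every edge can also be walked backwards.
    """
    adjacency = defaultdict(set)
    for s, _p, o in edges:
        adjacency[s].add(o)
        if not directed:
            adjacency[o].add(s)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


iron_directed = reachable(EDGES, "Iron, dietary", "Alzheimer Disease")
iron_undirected = reachable(EDGES, "Iron, dietary", "Alzheimer Disease", directed=False)
zinc_directed = reachable(EDGES, "Zinc", "Alzheimer Disease")
```

Here `iron_directed` is `False` while `iron_undirected` is `True`: iron connects to AD only by walking backwards through SLC11A2 and then forwards through zinc. `zinc_directed` is `True` because of the direct 'treats' edge.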

sierra-moxon commented 5 months ago

from TAQA: