Filter out known "avoid / do not use" drugs from results

khanspers commented 1 year ago

For some diseases, there are known "do not use / avoid" compounds/drugs. Can these be filtered out of results somehow? Maybe by using a "stop" list or by cross-checking against "has_adverse_event" for the same disease?

Example: NARP Syndrome has no cure and treatment is supportive to treat symptoms like seizures, headache, acidosis, dystonia etc. Known agents to avoid when treating these symptoms include sodium valproate, barbiturates, anesthesia, dichloroacetate. Two of these agents are included in Translator results, Valproic Acid (synonym of sodium valproate) and barbiturates:

https://ui.ci.transltr.io/results?l=Narp%20Syndrome&t=0&q=eb77dc97-42ad-40e8-8627-e03ad9c148dc https://arax.ncats.io/?r=eb77dc97-42ad-40e8-8627-e03ad9c148dc

gglusman commented 1 year ago

Perhaps it might be more useful to segregate such results and clearly label them as contraindicated, as opposed to filtering them out? Such information might be very valuable for anyone considering what could (or could not) treat a disease.
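A minimal sketch of the two options raised so far (a "stop" list, and labeling rather than dropping), assuming a hypothetical result record with a `name` field and a hand-curated stop list keyed by disease name. None of these names come from Translator; the entries are a subset of the NARP example above.

```python
# Hypothetical sketch: label results that appear on a curated "avoid" list
# instead of silently dropping them. The record shape and the stop list are
# illustrative assumptions, not Translator data structures.

# Curated per-disease stop list; a subset of the agents named in the NARP example.
STOP_LIST = {
    "NARP Syndrome": {
        "valproic acid",
        "sodium valproate",
        "barbiturates",
        "dichloroacetate",
    },
}


def segregate(disease, results, stop_list=STOP_LIST):
    """Split results into (candidates, contraindicated) for one disease."""
    avoid = stop_list.get(disease, set())
    candidates, contraindicated = [], []
    for result in results:
        if result["name"].lower() in avoid:
            # Keep the record, but flag it so a UI could show it separately.
            contraindicated.append({**result, "label": "contraindicated"})
        else:
            candidates.append(result)
    return candidates, contraindicated
```

Returning two lists keeps both behaviours available: a caller that wants plain filtering can ignore the second list, while a UI that wants to show contraindicated drugs in their own section can render it.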

sierra-moxon commented 1 year ago

We can take a look at the mychem.info return and see if something like this kind of attribute exists and can be implemented in the UI.

sierra-moxon commented 1 year ago

Tested on CI this morning: https://ui.ci.transltr.io/results?l=Narp%20Syndrome&i=MONDO:0010794&t=0&q=e62a0a12-a253-48ed-9d2d-5a9b71b2f78f

Still very much returning (with a score of 100 and many publications) the chemicals that are to be avoided when treating NARP Syndrome.

sierra-moxon commented 1 year ago

@newgene @Genomewide - is there a parameter in the mychem.info endpoint that can help us exclude "Valproic Acid" from the results for "what chemical may treat NARP Syndrome"?

newgene commented 1 year ago

@sierra-moxon to categorize a drug as "avoid to use", does that have to go through manual curation? If so, then we will have to put up such a "stop" list. I am not sure if any particular attributes can be reliably used to differentiate these two drugs, but Mychem.info/annotator does have the adverse events information:

https://biothings.ncats.io/annotator/PUBCHEM.COMPOUND:3121?fields=drugcentral.fda_adverse_event

or all available fields we have for "Valproic Acid":

https://biothings.ncats.io/annotator/PUBCHEM.COMPOUND:3121?fields=all
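For scripting such lookups, here is a small sketch that only rebuilds the two URLs above from a CURIE and a `fields` value. The helper name `annotator_url` is made up for illustration, and the shape of the response is not shown in this thread, so nothing is assumed about it here.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the links above.
ANNOTATOR_BASE = "https://biothings.ncats.io/annotator/"


def annotator_url(curie, fields="all"):
    """Build an annotator URL for one CURIE, restricted to `fields`.

    Hypothetical helper: it only reproduces the URL pattern shown above.
    """
    # Keep ':' unescaped so the CURIE reads the same as in the links above.
    path = quote(curie, safe=":")
    return f"{ANNOTATOR_BASE}{path}?{urlencode({'fields': fields})}"
```

Calling `annotator_url("PUBCHEM.COMPOUND:3121", "drugcentral.fda_adverse_event")` yields the first link above, and the default `fields="all"` yields the second.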

Genomewide commented 1 year ago

My hope is that, once real scoring is incorporated, this will be diminished. I think the actual O&O score may help with this, but I cannot be certain. I do think the results will be different from what is there today, but these drugs could still show up. It is very hard to know how the new scoring will change this. I do think that the initial score will promote the results that are known treatments, because two things are supposed to happen:

1. ARAs are supposed to return ALL the known, single-hop 'treats' edges, and then give their limit of creative results.
2. If a result is returned by multiple ARAs, then it gets a score boost of some kind.
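To make the second point concrete, here is a toy sketch of a multi-ARA boost. The boost rule (best score multiplied by 1 plus 0.1 per additional ARA) and the input shape are invented for illustration only; the thread does not say how the real scoring works.

```python
from collections import defaultdict


def merge_with_boost(ara_results, boost_per_extra_ara=0.1):
    """Merge per-ARA scores and boost results reported by more than one ARA.

    `ara_results` maps an ARA name to {result_id: score}. The boost rule is
    purely illustrative and is not the actual Translator scoring.
    """
    scores = defaultdict(list)
    for per_ara in ara_results.values():
        for result_id, score in per_ara.items():
            scores[result_id].append(score)
    merged = {
        result_id: max(vals) * (1 + boost_per_extra_ara * (len(vals) - 1))
        for result_id, vals in scores.items()
    }
    # Highest merged score first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Under a rule like this, a result returned by a single ARA keeps its raw score, while a known treatment returned by several ARAs moves up. Whether that is enough to push valproic acid down depends on scores that are not known here.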

If I am right, then the real problem with valproic acid is that it is the top score answer. I believe that will change with the new scoring. Fingers crossed, but there is not a great way to check until that is implemented. So, I am not sure how to handle this until then.

sierra-moxon commented 1 year ago

@Rosinaweber @sharatisrani - maybe you can use this issue to test @Genomewide's hypothesis that scoring and ordering will help at least push this record to the bottom of the list? Your feedback here is most welcome.

Rosinaweber commented 1 year ago

Based on my comprehension of ARAs and O&O, contra-indications should be covered by qualifiers. I have a different perspective from @Genomewide. ARAs are not supposed to send responses based on sources without reasoning about them. I suggest bringing in ARA champions like @dkoslicki @cbizon @suihuang-ISB @webyrd @andrewsu and asking whether they have a way to score down these results, or something else they may suggest. The role of O&O is not to use domain knowledge to bring down the order of a result but to order results based on what users would expect to find, considering aspects such as ARA confidence, clinical evidence, and novelty.

sharatisrani commented 1 year ago

There are two approaches to this. One is ARA reasoning (or even KP below, if a one-hop) to filter out contraindicated results. The other is at the O&O level, just like other actions that are being taken (e.g. drug conflation), to arrest these results and demote/block them. The right way feels like removing them as low in the stack as possible. Then O&O can add an arrest/failsafe mechanism if a contraindication table is callable, though I am unsure about September.

I also agree with @Genomewide that we will learn more about this issue at test stage.

sierra-moxon commented 1 year ago

from TAQA:

piecemeal addition of these as issues; but many of the upregulates/downregulates are systemic (can get rid of these in chunks).
the data that would exclude the adverse events is also very messy/noisy.

from Sharat: A related issue to #147 might be https://github.com/NCATSTranslator/ui-fe/issues/154. Meaning, solving 147 might solve 154 too

sandrine-m commented 1 year ago

Re: the adverse events suggestion. Following a suggestion from @cbizon, MolePro is exposing knowledge of adverse event data from SIDER. A question arose during the TAQA meeting about the data value of such a dataset (i.e. how noisy is it?). Here is what I could find about the data value of this dataset (from the literature):

not benchmarked extensively (source: papercode)
large knowledge base, NLP-based (source: sider paper)
"the obtained lists [from SIDER] typically mix rare and common ADEs alongside those caused by misuse or abuse of drugs [..] determining what constitutes a “true” effect of a drug may be unclear, and from a functional perspective – may depend on the context in which this question is asked, as well as the drug use patterns and prevalence of the adverse effect. For example, in comparing SIDER and OFFSIDES, Cheng et al.7 found relatively little overlap between them (7,741 common pairs, out of 418,532 OFFSIDES pairs and 120,236 SIDER pairs) suggesting that it is common for ADEs described in package inserts not to be reported to FAERS, and vice versa." (source: Medrxiv 2021 paper)

Conclusions on data value:

SIDER will suffer NLP biases for complex contexts (dosage, frequency, clinical context... etc.)
SIDER has poor overlap with FAERS data
SIDER provides data only on marketed medicines
current benchmarks mainly measure overlap/reproducibility between NLP-extracted knowledge bases and/or ADEs predicted by algorithms (there is no gold-standard curated list, though the Medrxiv paper could be interesting to look at when/if published).

suihuang-ISB commented 1 year ago

There is a confusion between CONTRAINDICATION ("DrugCompound C is contraindicated in Disease D") and ADVERSE EVENT ("DrugCompound C causes Adverse Event = Disease/Symptom D"). Both connect a DrugCompound C to a Disease/Symptom D with some sort of negative connotation.

Thus, we have two types of edges between DrugCompound C and a Disease/Symptom D (both are contained in the SPOKE KP): (i) C--[is contraindicated in]-->D (ii) C--[has adverse effect]-->D

Now in daily practice in medicine, both relationships come from LOOK-UP, and they are well documented and available in prescription handbooks. So that is the minimal bar we need to beat. Can you infer CONTRAINDICATION edges? Yes, to some extent. These two edges are related in the following way:

If C1 has side effect D1, then C1 is contraindicated in a patient with comorbidity D1 (being treated for something else). This is common practice. Thus, a drug with side effect "hypotension" should not be given to a patient with low blood pressure. Such an "inferred contraindication" is made by the treating physician and may or may not be listed explicitly under CONTRAINDICATION.

The question is: should the Translator creatively infer such a CONTRAINDICATION? I don't think so. Now, at a more sophisticated level, someone may suspect, based on mechanisms, that a drug may be contraindicated in some conditions. That is a different story and would represent true creative mode, but we do not have an MVP yet for the use case of CONTRAINDICATION. We are only trying to AVOID it.

Finally: SIDER is NOT a good source for adverse effects. It just collects the "side effect" list from the packaging information in all the countries in which a drug is being sold. In fact, adverse effects are generally loosely defined and are (out of fear) overreported, erring way too much on the safe side. The FDA adheres to the international definition "...reasonable possibility [of causality] cannot be ruled out". No statistical significance as rigorous as that required for the claim of a therapeutic effect is needed!

The simplest thing for us could be to check an explicit CONTRAINDICATION database, and filter those drugs out of the "WHAT TREATS .." query answers.

sandrine-m commented 1 year ago

If I understood well, the suggested action points are:

(1) For clinical KPs (Milestone Fall): review Translator knowledge in the compound-to-disease space for the contraindicated_for predicate. Do we need more data?

(2) For ARAs (Milestone Fall, because this is not on the critical path): remove response nodes that have a contraindicated_for predicate from their responses.

Do you all agree on this?

One thing I am missing is: what is the link with MVP2 here?

sandrine-m commented 1 year ago

Maybe the biolink:contraindicated_for predicate (see biolink hierarchy for predicates here) could be used as a single hop to filter those results out. At this point the easiest fix might be to implement a filter at the ARA output level, perhaps? I assigned Chris B. as I believe this issue belongs to "opposite of what I asked for", and I left it assigned to UI as well, since they are implementing a few filters, so that everyone concerned stays in the loop.

sandrine-m commented 1 year ago

Your comment, cross-linked with this issue, made me dig a little bit. Here is what I found: (0) biolink modeling level: 'biolink:contraindicated_for' is present, suggesting we have data about it. (1) KP level: As Andrew is mentioning, and given that we have the biolink model to support it, theoretically we should have the data. I have also checked that NARP has a MONDO term, which is the case. I ran a ["contraindicated_for NARP" query](https://arax.ci.transltr.io/?r=f5b546ec-dba5-415e-9aa6-86c01837b264) through ARAX, which led to no results (although MolePro has 16 nodes but no edges, and BTE returns no results):

{
   "edges": {
      "e00": {
         "subject":   "n01",
         "object":    "n00",
         "predicates": ["biolink:contraindicated_for"]
      }
   },
   "nodes": {
      "n00": {
         "ids":        ["MONDO:0010794"]
      },
      "n01": {
         "categories":  ["biolink:ChemicalEntity"]
      }
   }
}

Using the tree visualization of Biolink model I selected higher up classes to query a broader scope to assess whether we have any data on contraindication:

{
   "edges": {
      "e00": {
         "subject":   "n01",
         "object":    "n00",
         "predicates": ["biolink:contraindicated_for"]
      }
   },
   "nodes": {
      "n00": {
         "categories":        ["biolink:DiseaseOrPhenotypicFeature"]
      },
      "n01": {
         "categories":  ["biolink:ChemicalEntity"]
      }
   }
}

Translator has knowledge of biolink:contraindicated_for (in general, but apparently not for NARP) only through improving agent (see results here).
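The two query graphs above differ only in the disease node (a specific MONDO id versus a broad category). A small Python sketch that generates either variant; the function name `contraindication_query` is made up for illustration.

```python
def contraindication_query(disease_id=None):
    """Build the query graph shown above.

    With `disease_id` the disease node is pinned to that identifier;
    without it, the node falls back to the broad category used in the
    second query. Hypothetical helper, not part of any Translator client.
    """
    if disease_id is not None:
        disease_node = {"ids": [disease_id]}
    else:
        disease_node = {"categories": ["biolink:DiseaseOrPhenotypicFeature"]}
    return {
        "edges": {
            "e00": {
                "subject": "n01",
                "object": "n00",
                "predicates": ["biolink:contraindicated_for"],
            }
        },
        "nodes": {
            "n00": disease_node,
            "n01": {"categories": ["biolink:ChemicalEntity"]},
        },
    }
```

`contraindication_query("MONDO:0010794")` reproduces the NARP query and `contraindication_query()` the broader one; `json.dumps(..., indent=3)` prints them in the layout used above. To post one to a TRAPI endpoint directly, it would presumably need to be wrapped as the `query_graph` of a `message` object.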

I dug further and looked at the provenance of improving agent info and found indeed DrugCentral as the only primary source.

My understanding is that DrugCentral gets adverse events from FAERS (FDA), which are, as Sui mentions, different from contraindicated_for. Note that FAERS reports reactions and not diseases, so DrugCentral ingests and interprets diseases from reactions, which might lead to incomplete mapping.

To complete the digging (and I dug a lot on that one due to issues in name mapping), betahistine and nelarabine are reported in DrugCentral as an adverse event for NARP, which Translator does not seem to be aware of. This may suggest an outdated resource. I retested today.

So this issue discussion plus the Slack discussion IMO suggest a stop-gap measure issue with several layers of priority:

(1) shorter term, post relay:

- biolink: It seems from a quick search that DrugCentral reports FDA adverse events and not contraindications (which are modeled as contraindication). DrugCentral is reporting adverse events from FDA FAERS, which is the primary source. It would be good to make sure Sui's comments are addressed in the biolink model, mainly whether we have data on contraindications in the form of attributes.
- ARA: make sure that ARAs take into account biolink:contraindicated_for in reasoning, to filter from their creative responses compounds that might lead to the opposite of what we ask for.

(2) longer term:

- KP: we need data on contraindication; update DrugCentral adverse events data (or figure out why data is missing); ingest adverse events.
- ARA: incorporation of "adverse events for" in reasoning, to filter from their creative responses compounds that might lead to the opposite of what we ask for.

suihuang-ISB commented 1 year ago

Thanks @Sandrine. First, please, as indicated above, let's not mix up HAS_ADVERSE_EVENT ("side effect" in some KPs) with CONTRAINDICATED_FOR. Thus, let's skip FAERS for now. But the former is commonly used to predict the latter, as done in clinical practice (see below). The latter, CONTRAINDICATION, is thus regarded not so much as an inherent property of a drug as a situation-dependent factor influencing care decisions. Hence, not many databases have CONTRAINDICATION as an attribute. The information is mostly from textbooks and drug prescription labels, thus more a regulatory than a scientific issue. DrugCentral has it manually curated from these sources, and that is indeed where SPOKE gets it from, to add a CONTRAINDICATION edge to its ChEMBL-derived compound nodes. By contrast, DrugBank has such prescription information behind a paywall.

For Translator I see two ways out [(1) is what you suggest]. Given a query "What treats disease X":

(1) FILTERING: As you said, encourage KPs to provide the CONTRAINDICATION edge that points from a drug to disease/phenotype X, and eliminate accordingly.

(2) "CREATIVE MODE FILTERING" by actually inferring a CONTRAINDICATION, thus mimicking the care provider's thought process: if the returned drug "D" has one of the following edges, "biolink: has_adverse_event" or "biolink: has_side_effect" or "biolink: predisposes" or "biolink: exacerbates", to the query term X or to any of its children terms ("biolink: phenotypic feature", "biolink: clinical finding", etc.), then eliminate D from the results.
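Both ways out can be sketched with the same filter, differing only in which predicates count as disqualifying. This is a rough illustration under stated assumptions: edges are plain `(subject, predicate, object)` tuples, the child-term hierarchy is a dict, and all identifiers in the usage note are made up. The predicate strings are the ones named in this comment.

```python
# (1) FILTERING: rely on an explicit contraindication edge.
EXPLICIT = {"biolink:contraindicated_for"}

# (2) "CREATIVE MODE FILTERING": infer a contraindication from these edges.
INFERRED = {
    "biolink:has_adverse_event",
    "biolink:has_side_effect",
    "biolink:predisposes",
    "biolink:exacerbates",
}


def descendants(term, children):
    """Return `term` plus every term reachable through the `children` map."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def filter_results(drugs, query_term, edges, children, predicates):
    """Drop drugs with a disqualifying edge to the query term or a child term.

    `edges` is an iterable of (subject, predicate, object) tuples; this
    shape is an assumption made for the sketch.
    """
    targets = descendants(query_term, children)
    flagged = {s for s, p, o in edges if p in predicates and o in targets}
    return [drug for drug in drugs if drug not in flagged]
```

With hypothetical edges `("DRUG:A", "biolink:contraindicated_for", "DISEASE:X")` and `("DRUG:B", "biolink:has_side_effect", "PHENO:1")`, where `PHENO:1` is a child of `DISEASE:X`, passing `EXPLICIT` removes only `DRUG:A`, `INFERRED` removes only `DRUG:B`, and `EXPLICIT | INFERRED` removes both.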

sandrine-m commented 1 year ago

After meeting at the DM call today: these types of data gap/reasoning issues need more work and will be fixed post September.

gprice1129 commented 11 months ago

Is this still a feature request for the UI? We are not clear on what is being asked for if it is.

sandrine-m commented 11 months ago

I'd like to reroute this issue as a test case for a benchmark

cbizon commented 11 months ago

Maybe it would be good to make a label for that?

sandrine-muller-research commented 11 months ago

On top of the "opposite of what I asked for" label, I created the contraindications label with an associated description. I will separately maintain, in a GSheet, all metadata related to each issue instance of that label.

sierra-moxon commented 4 months ago

retested the original ticket and can not reproduce; calling this fixed! :)

NCATSTranslator / Feedback

Filter out known "avoid / do not use" drugs from results #147