NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

what may treat "semicircular canal dehiscence syndrome" instead returns what may _cause_ "semicircular canal dehiscence syndrome" #9

Closed: TranslatorIssueCreator closed this issue 1 year ago

TranslatorIssueCreator commented 1 year ago

Type: Bug Report

URL: http://transltr-bma-ui-dev.ncats.io/results?loading=true

ARS PK: 41a62364-63f7-4b86-885e-6a75935b7901

Steps to reproduce:

1. Search for "semicircular canal dehiscence syndrome".
2. Click on one result.
3. Read the contradictory statement in the abstract.

Screenshots:

kevinschaper commented 1 year ago

The abstract says:

> One female patient found to have inflammatory pseudotumor of the temporal bone. After treatment with mastoidectomy and steroids, she subsequently developed superior semicircular canal dehiscence syndrome.

It feels like whatever the source of this edge is, it might be good to adjust the predicate assignment knobs to lean less specific.

At the UI level, if there is a way to know that it's a lower confidence result (based on provenance / evidence codes?), it might be good to actually give something more like "zero results, but click to see 1 result from automated text mining" or something along those lines to avoid eroding user confidence.

sierra-moxon commented 1 year ago
[Screenshot attached: "Screen Shot 2022-09-29 at 10.44.17 AM"]
sierra-moxon commented 1 year ago

We think this goes to the DM team for discussion. Key discussion notes:

- The SEMMED definition of "treats" is not the same as the Biolink definition of "treats".
- Should we use less specific predicates for SEMMED? (See the sketch after this list.)
- Should we rank SEMMED lower in the results than other sources? (This is an EPC knowledge-category issue.)
- Do the ARA rankings get taken into account? Is this ranked lower in the ARA than in the UI?
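A minimal sketch of what "leaning less specific" could look like, assuming a simple lookup table from SEMMED predicates to Biolink predicates. The specific mappings below (e.g. demoting SEMMED's TREATS to `biolink:related_to`) are illustrative assumptions, not an agreed-upon design:

```python
# Hypothetical remapping of SEMMED predicates to less specific Biolink
# predicates. The table is illustrative only; the real mapping would need
# to be agreed on by the Data Modeling team.
SEMMED_TO_BIOLINK = {
    # SEMMED's TREATS is weaker than Biolink's treats, so demote it to a
    # broader predicate rather than asserting a treatment claim.
    "TREATS": "biolink:related_to",
    "CAUSES": "biolink:related_to",
    # Predicates we trust could pass through at full specificity.
    "INTERACTS_WITH": "biolink:interacts_with",
}

def remap_predicate(semmed_predicate: str) -> str:
    """Return a (possibly less specific) Biolink predicate for a SEMMED one."""
    return SEMMED_TO_BIOLINK.get(semmed_predicate, "biolink:related_to")

print(remap_predicate("TREATS"))  # -> biolink:related_to
```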

sierra-moxon commented 1 year ago

From the DM call:

kevinschaper commented 1 year ago

We ended up titling this issue as a directionality problem, but after re-reading the sentence, I don't think that it's asserting any meaningful connection between SCDS and steroids. The patient had surgery and was treated with steroids; it feels like a big stretch to say that it was the steroids, rather than the surgery, that produced the extra skull-cavity opening connecting to the inner ear.

andrewsu commented 1 year ago

this is no longer being returned in the UI, possibly because of a BTE update

I just reran the query and confirmed that the steroids - treats - SCDS edge is still being returned by the semmeddb api (https://biothings.ncats.io/semmeddb/query?q=object.umls:C3275929%20AND%20subject.umls:C0038317), and the corresponding result is still being returned by the UI: http://transltr-bma-ui-dev.ncats.io/results?q=9c3d5cbd-222a-4e74-adf6-bbcb43682d36. (It looked like there might have been a brief delay in the UI rendering when it showed zero results, but then it was subsequently updated.)
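For reference, a minimal sketch of rerunning that check programmatically against the BioThings SemMedDB API. The query string is taken directly from the URL above; the shape of the response (`hits` entries carrying a `predicate` field) follows the usual BioThings convention and should be verified against the live service:

```python
import requests

# Ask the BioThings SemMedDB API whether a steroids (UMLS C0038317) ->
# SCDS (UMLS C3275929) edge is still being returned.
url = "https://biothings.ncats.io/semmeddb/query"
params = {"q": "object.umls:C3275929 AND subject.umls:C0038317"}

resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
hits = resp.json().get("hits", [])

print(f"{len(hits)} edge(s) returned")
for hit in hits:
    # Each hit should carry the SEMMED predicate, e.g. "TREATS".
    print(hit.get("predicate"))
```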

> after re-reading the sentence, I don't think that it's asserting any meaningful connection between SCDS and steroids

I don't claim any relevant domain expertise here, but in my reading of the abstract being cited by semmeddb (https://pubmed.ncbi.nlm.nih.gov/34149029/), I think the text mining isn't horrible here (at least we've certainly seen worse). I believe it incorrectly used the "treats" predicate, but I think it did capture a relationship that the authors wanted to suggest was a real-world possibility. ("After treatment with mastoidectomy and steroids, [the patient] subsequently developed superior semicircular canal dehiscence syndrome.")

Regardless of the quality of the NLP, I still think the UI is right to show this result, but it should have a clear visual indicator that all the supporting evidence is from text-mined resources.
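A minimal sketch of how such an indicator could be driven, assuming TRAPI 1.4-style edge provenance (a `sources` list with `resource_role` entries) and a curated set of text-mining infores identifiers; both the field names and the membership of the set are assumptions to check:

```python
# Infores IDs considered "text-mined". Illustrative; the real set would
# need curating (SemMedDB and the Text Mining Provider are obvious members).
TEXT_MINED_SOURCES = {
    "infores:semmeddb",
    "infores:text-mining-provider-targeted",
}

def text_mined_only(edge: dict) -> bool:
    """True if every primary knowledge source on a TRAPI edge is text-mined."""
    primaries = [
        s["resource_id"]
        for s in edge.get("sources", [])
        if s.get("resource_role") == "primary_knowledge_source"
    ]
    return bool(primaries) and all(p in TEXT_MINED_SOURCES for p in primaries)

edge = {
    "sources": [
        {"resource_role": "primary_knowledge_source",
         "resource_id": "infores:semmeddb"},
    ]
}
print(text_mined_only(edge))  # True -> UI could flag or restyle this edge
```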

sierra-moxon commented 1 year ago

@andrewsu - when you say that it incorrectly assigns the treats predicate, does that conflict with the UI showing it as an answer to the "may treat" question?

sierra-moxon commented 1 year ago

Related: there are two autocomplete terms for Semicircular Canal Dehiscence Syndrome returned in the UI. If you select the first one, no results are returned; if you select the second one, we see the steroid result as before. This may explain the difference that @andrewsu and I found. Documented in another ticket: https://github.com/NCATSTranslator/Feedback/issues/51

andrewsu commented 1 year ago

I think in an ideal world, the NLP would be perfect, the semmeddb edge would have used a predicate other than "treats", and then the result would not show up in the UI. (Unless an ARA added it as a creative mode result, in which case it would show up.) But if we accept that NLP isn't perfect but decide we're going to use NLP resources anyway, then I think the UI is right to show all the results, rather than trying to add any clever logic to try to improve what the NLP resources have done. IMHO...

cbizon commented 1 year ago

I see where you're coming from @andrewsu, but I'm also worried that this version of NLP may not be good enough if it is frequently leading to bad results. My suspicion is that we are going to end up with some special-casing for things somewhere. Google started with PageRank but ended up building all sorts of stuff around that core (I don't even know if it's the core any more).

My own 2 cents is that we should at least consider what we might do to mitigate crummy answers coming from (specifically) semmeddb. A non-exclusive, incomplete list of possibilities:

I think that all of these are more-or-less reasonable, and the question comes down to what kinds of type 1 vs. type 2 error tradeoffs they imply.

andrewsu commented 1 year ago

Great point, Chris. I was going on the assumption that we'd made the decision to include semmed even with all of its warts, largely based on my impression that the UAB PMI team found it useful (and I think they have the most real-world experience using translator for "treats" predictions). But if I'm wrong or that perception has changed, I'm certainly not wedded to the idea that semmeddb needs to be in there. (But if it is in there, I think the UI should use a prominent visual indicator for text-mining-only edges.)

cbizon commented 1 year ago

So what do you think is the right way to move forward on what we want to do here? I seem to recall that we wanted to do a comparison between semmed treats and TMKP treats, but maybe I just made that up?

andrewsu commented 1 year ago

> So what do you think is the right way to move forward on what we want to do here?

Recognizing there are several reasonable paths forward, my personal view would be to defer to the UI / PMI folks. If their collective gut says the results based on semmeddb edges aren't valuable, then I think we should figure out a way to exclude them. If they want to keep them, then I think it should be a high priority for the UI to create a visual indicator when an edge is based only on text-mined resources. (Maybe a dashed line for the edge?) I also think the ongoing O&O work will greatly help if it pushes these results very far down in score (since I agree with David's comment that the current ranking based only on total # of paths is far from ideal).
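For the ranking side of that suggestion, a minimal sketch of demoting text-mined-only results in score, reusing a flag like the `text_mined_only` check sketched earlier. The `score` and `text_mined_only` field names and the penalty factor are all hypothetical:

```python
TEXT_MINING_PENALTY = 0.1  # arbitrary illustrative factor

def demote_text_mined(results: list[dict]) -> list[dict]:
    """Re-rank results, pushing text-mined-only answers far down in score."""
    for r in results:
        if r.get("text_mined_only"):
            r["score"] *= TEXT_MINING_PENALTY
    return sorted(results, key=lambda r: r["score"], reverse=True)

results = [
    {"name": "steroids", "score": 0.8, "text_mined_only": True},
    {"name": "other drug", "score": 0.5, "text_mined_only": False},
]
print([r["name"] for r in demote_text_mined(results)])
# -> ['other drug', 'steroids']
```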

> I seem to recall that we wanted to do a comparison between semmed treats and TMKP treats, but maybe I just made that up?

I don't recall this idea, but seems reasonable. And while we're thinking of general strategies for reduction in semmed crumminess, I still think that the idea you proposed to compare semmeddb's NER to pubtator (https://github.com/biothings/BioThings_Explorer_TRAPI/issues/501) could be very useful (acknowledging of course that it wouldn't have helped in the case described in this issue, which is not NER-related).

cbizon commented 1 year ago

It sounds like TMKP is already doing some comparisons like this? Is that right, @mikebada?

sierra-moxon commented 1 year ago

@bill-baumgartner - do you know if we have some statistics that we can use to help us decide where to swap SEMMED results for TMKP results?

sierra-moxon commented 1 year ago

From TAQA: TMKP has not done direct comparisons to SEMMED, but they do add a flag when their results also match something that SEMMED has found. (The next funding period might bring this as a random-sampling study comparing precision vs. recall.) TMKP doesn't have SEMMED's breadth of predicates, but for treats this analysis would be helpful.

From DM: we can curate a list of these, and see how appropriate the lists are from SEMMED and TMKP.

Jeff H from Will's group is also looking into this.
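A minimal sketch of what the SEMMED-vs-TMKP treats comparison could start from, assuming each resource's treats edges have already been exported as (subject, object) CURIE pairs; the example pairs below are placeholders, and the actual precision/recall numbers would come from the curated evaluation, not from this snippet:

```python
# Overlap statistics over two exported "treats" edge sets, each a set of
# (subject_curie, object_curie) pairs. How the edges were exported is out
# of scope here.
def overlap_stats(semmed: set[tuple[str, str]],
                  tmkp: set[tuple[str, str]]) -> dict[str, float]:
    shared = semmed & tmkp
    union = semmed | tmkp
    return {
        "semmed_edges": len(semmed),
        "tmkp_edges": len(tmkp),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

semmed = {("UMLS:C0038317", "UMLS:C3275929"), ("CHEBI:0000001", "MONDO:0000001")}
tmkp = {("CHEBI:0000001", "MONDO:0000001")}
print(overlap_stats(semmed, tmkp))
```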

- Can we use a different predicate for SEMMED? (SEMMED shouldn't use "treats"; treat treats as special.)
- "Creative" mode queries using text-mining results are good, but the paths and reasoning need to be shown so we can see why these would be returned (e.g. cyclic vomiting, which is difficult to find unless you know the answer).
- This one isn't a hypothesis; it is the opposite. There are pros and cons here.

sierra-moxon commented 1 year ago

Authoritative sources do exist for treats (e.g., Pharos).

mikebada commented 1 year ago

@cbizon I believe this was mentioned in one of the recent weekly meetings, but just to restate in this thread, one of our explicit tasks in this new Translator funding year is to evaluate TMKP vs SEMMED results, and we have already begun discussing options for this.

sierra-moxon commented 1 year ago

From TAQA, w/r/t the SEMMEDDB hackathon summary given there:

- We can clean up some of the SEMMEDDB results, but not all of them; hesitate to go down the rabbit hole of fixing an unmaintained resource. Two more things can be tried: Andrew has tickets (link to UMLS, look at the scores provided by SEMMEDDB).
- From Bill B and Andrew S: in general, we agree that text-mined data should be treated differently, with the goal of increasing user trust. Is this a plan for the UI?
- From Andy: if the UI gets 3 PMIDs from 3 sources (one is DrugBank, 3 are SEMMEDDB), do we treat those differently? In discussion.
- From Sui and Tyler: papers are evidence, but not the only kind; numbers of pubs aren't the only factor that provides evidence.
- Andrew: the key evidence here is that SEMMEDDB has paper references and so does DrugBank. We don't have the knowledge level to disambiguate the difference in confidence between these two sources. If the evidence is only SEMMEDDB, then maybe a visual cue, like a dotted line, would be helpful.

Please see the notes from the TAQA meeting for more discussion here. There is nothing more specific to this issue, but the issue is a use case for the broader discussion of how to show users that SEMMEDDB is being used.

andrewsu commented 1 year ago

I'm going to suggest closing this issue. This specific example would be touched by several planned, completed and/or ongoing activities:

In short, I think this example/issue has served its purpose and can be closed (but will leave it to someone else to second my proposal and actually close it)...

kevinschaper commented 1 year ago

I agree with @andrewsu