Open kvnthomas98 opened 1 year ago
Slotting this on the agenda for tomorrow's AHM. Good topic for group discussion.
I am intrigued with the "interleave" idea. While xDTD is awesome, I have concerns about a modification such that creative mode results are always (and only) at the top; I can expound on that tomorrow at the meeting.
Thoughts from the AHM:
Different possible approaches include:
- n from the former and m from the latter, where n+m < 500 (or whatever the specified cutoff is)
- N lookups, where N is small (e.g. 25)
A nuanced approach would be to see what's causing the explosion of lookup results, and downrank them in the ranker.
E.g. of a bazillion lookups: https://arax.ncats.io/?r=174764 (any common(ish) disease will do).
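To make the n+m idea above concrete, here is a minimal sketch. It assumes "the former" means creative-mode results and "the latter" means lookup results, and that each result is a plain dict with a "score" key. The function name `cap_results` and the dict layout are hypothetical, not ARAX's actual result objects.

```python
from typing import Dict, List


def cap_results(
    creative: List[Dict],
    lookup: List[Dict],
    n: int,
    m: int,
    cutoff: int = 500,
) -> List[Dict]:
    """Keep the top n creative and top m lookup results, merged by score.

    Hypothetical helper: the dict layout ({"id": ..., "score": ...}) is an
    assumption for illustration only.
    """
    if n + m >= cutoff:
        raise ValueError("n + m must be below the cutoff")

    def by_score(result: Dict) -> float:
        return result["score"]

    # Take the best n and m from each pool independently, so that a flood
    # of lookup results cannot crowd out the creative ones.
    top_creative = sorted(creative, key=by_score, reverse=True)[:n]
    top_lookup = sorted(lookup, key=by_score, reverse=True)[:m]

    # Merge the two capped pools and order the final list by score.
    return sorted(top_creative + top_lookup, key=by_score, reverse=True)
```

Setting m to a small N (e.g. 25) gives the second approach in the list, where lookups are capped regardless of how many come back.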
@dkoslicki As a test (and since there was a TRAPI query for it in #2187) I ran "what drugs treat multiple sclerosis" through ARAX (the arax.ncats.io/beta endpoint) using knowledge_type="inferred". I got 500 results. The first 50 results look pretty reasonable, with a minority of experimental/investigational treatments in there (vitamin D, epigallocatechin, cannabidiol, estriol, ibudilast, melatonin, biotin, etc.). Below the first 50, we start to get some really broad categories like "Antibodies", "Interferons", "Vaccines", "Vitamins", or "immunomodulators". We also start to get some puzzling results like "ethylene glycol" (which may reflect text-mining getting confused by text about PEGylation of some other therapeutic agent). Below the first 150 results, we do start to see increased frequency of crazy stuff like "caffeine", "fish oils", "ketamine", "nicotine", "tadalafil", and so forth.
I think there are four things driving such a large number of lookup results:
I think our scores are, overall, a bit too high for the drugs that are not indicated for MS (e.g., the investigational treatments). For the drugs that are indicated for MS, the scores are fine.
Our scores are way too high for the overly general stuff like "Vaccines" and stuff like that. Ideally, those should be either filtered out or have their score reduced due to the concepts' generality. I know we've talked about this a lot, I guess I'm just echoing the feeling here that it would be good if we weren't seeing "antibodies" and "vaccines" and "vitamins" in the results.
So in conclusion, I concur, there really aren't 500 different treatments for M.S. But there are probably at least 60-70 that are used to manage M.S. (remember it's a complex multi-faceted disease for which there is AFAIK no cure), plus another 100 to 150 being actively investigated.
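One rough way to picture the "reduce the score due to generality" idea for the overly general concepts is a multiplicative penalty applied before the final sort. This is only a sketch: the term list, the penalty value, and the dict layout are all assumptions made for illustration, and a real fix would presumably live in the ranker and use something better than a hand-written list.

```python
from typing import Dict, List

# Hypothetical list of overly general concepts, taken from the examples
# mentioned in this thread; not an actual ARAX configuration.
GENERIC_TERMS = {"antibodies", "interferons", "vaccines", "vitamins", "immunomodulators"}


def downrank_generic(results: List[Dict], penalty: float = 0.5) -> List[Dict]:
    """Scale down the score of overly general concepts, then re-sort.

    Each result is assumed to be a dict with "name" and "score" keys.
    """
    adjusted = []
    for result in results:
        is_generic = result["name"].lower() in GENERIC_TERMS
        factor = penalty if is_generic else 1.0
        # Copy the result so the caller's list is left untouched.
        adjusted.append({**result, "score": result["score"] * factor})
    return sorted(adjusted, key=lambda r: r["score"], reverse=True)
```

Setting the penalty to 0 would push these terms to the bottom; dropping them from the list entirely would be the "filtered out" variant.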
@saramsey do we have a KP or edge property that we can use explicitly for "indicated for"? IIRC, when we ask for treats edges, KPs don't distinguish between investigational and indicated-for. Perhaps there's something in KG2 we could use to cross-check?
@dkoslicki I am not sure. It is a problem that the biolink:treats predicate is being used for investigational/experimental therapies like vitamin D. I think the Biolink people and the Predicates WG people are working on "refactoring" the biolink:treats predicate to allow more precise statements for such cases.
In the meantime, I like the idea of trying to pull in that information from somewhere. I am not sure about where we could get it, though. I guess if someone were to go through all 500 results and label them as "indicated", "investigational", and "neither" (this would take an afternoon though!), we could try to find which sources are contributing to the "indicated" vs. "investigational". I suspect there will be a bias towards certain sources.
https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files perhaps? It doesn't cover biologics and the like, though.
While I'm not aware of something like indicated_for edges that capture which drugs are FDA approved to treat which conditions (that seems very useful), we do have the ability to constrain queries on FDA approval status (#1599, which makes use of KG2 data (#1497))... it wouldn't let us filter down the result set to drugs approved specifically for MS, but maybe it would at least get rid of general terms and drugs not yet approved for anything?
Ah, I wasn't aware of that. Should be a good first pass, so @kvnthomas98 please do make note of Amy's comment once you start working on this.
Thank you @amykglen, good suggestion
Related: #2327
Currently Lookup Results may dominate the results. If we have a creative query, we need to ensure that creative results don't get filtered out.
Suggestions proposed by @dkoslicki: i) Manually place creative results on top. ii) Interleave creative results between lookup results.
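For reference, option (ii) could look roughly like the round-robin sketch below. It assumes both lists are already sorted best-first and that the overall cutoff (500 here) is applied after merging; the function name `interleave` and the use of plain lists are illustrative assumptions, not the actual ARAX implementation.

```python
from itertools import zip_longest
from typing import Any, List

# Sentinel marking "no more items" in the shorter list.
_MISSING = object()


def interleave(creative: List[Any], lookup: List[Any], cutoff: int = 500) -> List[Any]:
    """Alternate creative and lookup results, creative first, up to cutoff."""
    merged: List[Any] = []
    for creative_item, lookup_item in zip_longest(creative, lookup, fillvalue=_MISSING):
        if creative_item is not _MISSING:
            merged.append(creative_item)
        if lookup_item is not _MISSING:
            merged.append(lookup_item)
    # Truncate only after merging, so creative results are not filtered out
    # just because the lookup list is much longer.
    return merged[:cutoff]
```

For example, `interleave(["c1", "c2"], ["l1", "l2", "l3", "l4"], cutoff=5)` gives `["c1", "l1", "c2", "l2", "l3"]`. Option (i) is the degenerate case `(creative + lookup)[:cutoff]`.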