NCIOCPL / drug-dictionary-api

NCI Drug Dictionary API

"Contains" autocomplete results include "extra" characters between the parts that match #43

Closed: blairlearn closed this issue 3 years ago

blairlearn commented 3 years ago

Issue description

Autocomplete queries for drug terms containing a string return "unusual" matches. For example, a search for drug terms containing "anti-c" should only return terms where the string "anti-c" matches starting on a word boundary. An expected result would be Term 802323: "allogeneic anti-CD20 CAR T cells LUCAR-20S".

Instead, the results also include inexact matches. For example, Term 764075: "anti c-KIT antibody-drug conjugate LOP628" matches on "anti c" with a space instead of a dash, and Term 794677: "anti-EGFR monoclonal antibody CPGJ 602" has the text "EGFR monoclonal antibody " between the "anti" and the "c".

Term 764075 might be expected if punctuation is ignored, but 794677 doesn't fit expectations.
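The difference between the expected and observed behavior can be sketched in Python. This is not the API's implementation: it assumes matching is case-insensitive and that hyphens are treated like spaces, and the function names `normalize`, `strict_contains`, and `loose_contains` are hypothetical. `strict_contains` encodes the expected rule (the whole search string, contiguous, starting on a word boundary), while `loose_contains` is one guess at token-by-token matching that would reproduce the Term 794677 result.

```python
import re


def normalize(text: str) -> str:
    # Assumption: case-insensitive, hyphens and whitespace runs collapse to one space.
    return re.sub(r"[\s\-]+", " ", text.lower()).strip()


def strict_contains(search: str, name: str) -> bool:
    # Expected rule: the normalized search text appears contiguously,
    # and the match starts on a word boundary (not mid-word).
    pattern = r"(?<!\w)" + re.escape(normalize(search))
    return re.search(pattern, normalize(name)) is not None


def loose_contains(search: str, name: str) -> bool:
    # Hypothetical "buggy" rule: each search token starts a word, in order,
    # but any text may appear between the tokens.
    tokens = normalize(search).split()
    pattern = ".*".join(r"(?<!\w)" + re.escape(t) for t in tokens)
    return re.search(pattern, normalize(name)) is not None


if __name__ == "__main__":
    names = [
        "allogeneic anti-CD20 CAR T cells LUCAR-20S",  # Term 802323
        "anti c-KIT antibody-drug conjugate LOP628",   # Term 764075
        "anti-EGFR monoclonal antibody CPGJ 602",      # Term 794677
    ]
    for name in names:
        print(name, strict_contains("anti-c", name), loose_contains("anti-c", name))
```

Under these assumptions the strict rule accepts Terms 802323 and 764075 (the latter only because the dash is ignored) and rejects 794677, while the loose rule accepts all three.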

Similar unexpected results occur when searching for drug alias records.

ESTIMATE TBD

Steps to reproduce the issue

  1. Do a contains autosuggest query for drug terms containing the string "anti-c"

https://webapis.cancer.gov/drugdictionary/v1/Autosuggest?searchText=anti-c&matchType=contains&size=20&includeResourceTypes=DrugTerm
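To rerun the reproduction step for other search strings, the query URL can be built from the same parameters. This is a minimal sketch: `autosuggest_url` and `fetch_autosuggest` are hypothetical helpers, the defaults simply mirror the URL above, and `fetch_autosuggest` assumes the endpoint returns JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the reproduction URL above.
BASE_URL = "https://webapis.cancer.gov/drugdictionary/v1/Autosuggest"


def autosuggest_url(search_text: str, match_type: str = "contains",
                    size: int = 20, resource_type: str = "DrugTerm") -> str:
    # Query parameters mirror the ones used in the reproduction step.
    params = {
        "searchText": search_text,
        "matchType": match_type,
        "size": size,
        "includeResourceTypes": resource_type,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_autosuggest(search_text: str) -> object:
    # Assumption: the response body is JSON; the exact shape is not documented here.
    with urlopen(autosuggest_url(search_text), timeout=10) as resp:
        return json.load(resp)
```

With the defaults, `autosuggest_url("anti-c")` yields exactly the URL above, and the same call can be reused for the search terms in the test cases.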


blairlearn commented 3 years ago

Test cases:

NLM generic drug name stems

Top 50 drug search terms

| Search Term | Searches |
|---|---|
| nivolumab | 718 |
| pembrolizumab | 669 |
| bevacizumab | 532 |
| atezolizumab | 511 |
| durvalumab | 444 |
| all | 427 |
| trastuzumab | 411 |
| avelumab | 383 |
| ipilimumab | 364 |
| olaparib | 353 |
| cetuximab | 340 |
| trametinib | 334 |
| palbociclib | 313 |
| paclitaxel | 293 |
| cabozantinib | 291 |
| cyclophosphamide | 274 |
| alpelisib | 272 |
| carboplatin | 267 |
| cisplatin | 263 |
| abemaciclib | 261 |
| selumetinib | 259 |
| venetoclax | 256 |
| regorafenib | 256 |
| ramucirumab | 245 |
| sorafenib | 243 |
| tremelimumab | 241 |
| rituximab | 238 |
| everolimus | 236 |
| cobimetinib | 235 |
| lenvatinib | 234 |
| gemcitabine | 231 |
| capecitabine | 229 |
| niraparib | 227 |
| doxorubicin | 226 |
| sunitinib | 225 |
| lenalidomide | 221 |
| bortezomib | 214 |
| folfox | 213 |
| binimetinib | 211 |
| crizotinib | 211 |
| daratumumab | 203 |
| docetaxel | 203 |
| lapatinib | 202 |
| irinotecan | 198 |
| selinexor | 196 |
| afatinib | 195 |
| pdr001 | 194 |
| erlotinib | 191 |
| ibrutinib | 187 |
| entrectinib | 185 |
| erdafitinib | 185 |