NCIOCPL / drug-dictionary-app

NCI Drug Dictionary Application

Search result is not returned for some terms with Contains search #70

Closed alinai closed 3 years ago

alinai commented 3 years ago

Issue description

Description of the issue: On react-dev, running a Contains search for "len", selecting "anti-CD22 scFv TCRz:41BB-CAR lentiviral vector-transduced autologous T lymphocytes" from the autosuggest, and submitting that term returns no search results.

ESTIMATE TBD

Steps to reproduce the issue

  1. Select the "Contains" radio button on https://react-app-dev.cancer.gov/drug-dictionary-app/pr-68
  2. Type "len" and then select the term "anti-CD22 scFv TCRz:41BB-CAR lentiviral vector-transduced autologous T lymphocytes" from the autosuggest drop-down
  3. Click the Search button.
  4. No search result is returned.

What's the expected result?

-

What's the actual result?

-

Additional details / screenshot


Related Tickets

blairlearn commented 3 years ago

@alinai There are steps missing. At what point in the sequence do you select the "Contains" button?

Do not answer here. Please update the ticket.

alinai commented 3 years ago

The ticket has been updated!

kate-mashkina commented 3 years ago

Digging more into this issue:

Example where Contains does not return anything:

  1. Select 'Contains'.
  2. Type 'can' and select 'allogeneic large multivalent immunogen breast cancer vaccine' (this term does not have any special characters).
  3. Hit Search and see 'No results found'.

If you now switch back to 'Starts with', the search returns a result, which is the exact term. Prod returns any term selected from the 'Contains' autosuggest list, and so does the Glossary. (Note: for the Glossary, the API returns a result, whereas for the drug dictionary it comes back empty.)

Example where Contains returns one result:

  1. Select 'Contains'.
  2. Type 'glo' and select 'horse anti-thymocyte globulin'.
  3. Hit Search and see that the term is returned.

blairlearn commented 3 years ago

This occurs because the ES mapping document imposes a limit of 30 characters on the ngram_filter which is used for contains searches.
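To illustrate why the limit matters, here is a minimal Python sketch of n-gram matching. It is not the actual Elasticsearch configuration: it assumes the contains field indexes every substring of the lowercased term name up to max_gram characters, and that the submitted text is matched as one whole token. The function names and the min_gram value of 1 are hypothetical.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> set:
    """Return every substring of text whose length is in [min_gram, max_gram]."""
    text = text.lower()
    grams = set()
    # A gram can never be longer than the text itself.
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


def contains_match(indexed_name: str, query: str,
                   min_gram: int = 1, max_gram: int = 30) -> bool:
    """Assumed matching rule: the whole query must equal one indexed n-gram."""
    return query.lower() in ngrams(indexed_name, min_gram, max_gram)


# Term names taken from the examples in this ticket.
SHORT_NAME = "horse anti-thymocyte globulin"
LONG_NAME = "allogeneic large multivalent immunogen breast cancer vaccine"

if __name__ == "__main__":
    for name in (SHORT_NAME, LONG_NAME):
        print(len(name), contains_match(name, name))
```

Under these assumptions, a short fragment such as 'can' matches the long name, and the full 'horse anti-thymocyte globulin' name still fits within the 30-character limit, but the full 'allogeneic large multivalent immunogen breast cancer vaccine' name is longer than any indexed n-gram and so matches nothing. This is consistent with the behavior reported above.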

Solution one: Increase the max_gram value to accommodate the longest names. The current maximum term name length is 685 characters. Setting ngram_filter to allow that would increase the size of the index from 71 MB to 510 MB, so that's not a good solution.

Solution two: Adjust the ngram_filter to use a larger max_gram value (tentatively, 100 characters). If the user enters a string longer than that, search on only the first X characters. Ideally, this would apply only to the contains search, as the begins (starts with) search is not affected by this issue. (This value would need to be configurable, but with a default so we don't have to remember to configure it for each deployment.)
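A minimal Python sketch of the truncation step in solution two, assuming the limit is passed in as a setting with a default of 100. The function name, the parameter names, and the "contains"/"begins" labels are hypothetical and are not taken from the API code.

```python
# Tentative default from the discussion; assumed to be overridable per deployment.
DEFAULT_MAX_QUERY_LENGTH = 100


def prepare_query(query: str, match_type: str,
                  max_length: int = DEFAULT_MAX_QUERY_LENGTH) -> str:
    """Truncate over-long search text, but only for contains searches."""
    if match_type == "contains" and len(query) > max_length:
        return query[:max_length]
    # Begins searches and short queries pass through unchanged.
    return query
```

Keeping the default in code means a deployment that never sets the value still gets the truncation behavior.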

Solution three: Adjust the Elasticsearch search to also return any exact matches that the autosuggest's query would match. This would fix the issue for the specific case of a term selected from autosuggest, but would still fail for any search text longer than the max_gram value that is not an exact match.

After discussion with @blairlearn, @VictoriaSunNIH, @zhuomingao, @blilianyu, and @mworrest, the decision was to go with solution two.