NCIOCPL / glossary-api

API for Dictionary of Cancer Terms, Dictionary of Genetics Terms, and other Glossary documents.

Autosuggest\Search endpoints using contains do not return results for some searches #118

Closed seyilonge-nci closed 4 years ago

seyilonge-nci commented 4 years ago

Issue description

The Autosuggest and Search endpoints do not return results with matchType=Contains when the search string contains spaces.

ESTIMATE

Steps to reproduce the issue

  1. Make a call to https://webapis-dev.cancer.gov/glossary/v1/Autosuggest/Cancer.gov/Patient/en/node%20biopsy?matchType=Contains&size=1000 or https://webapis-dev.cancer.gov/glossary/v1/Terms/search/Cancer.gov/Patient/en/node%20biopsy?matchType=Contains&size=1000
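For anyone scripting the reproduction, here is a minimal Python sketch that rebuilds both request URLs from their parts. The helper name `build_url` and its parameter names are hypothetical, chosen for illustration. The base URL, path segments, and query parameters are taken from the step above. The sketch only constructs the URLs and sends no request.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the reproduction step above.
BASE = "https://webapis-dev.cancer.gov/glossary/v1"


def build_url(endpoint, search_text, dictionary="Cancer.gov",
              audience="Patient", language="en",
              match_type="Contains", size=1000):
    """Hypothetical helper: assemble a glossary API URL for a search string."""
    # Percent-encode the search text so a space becomes %20 in the path.
    encoded_text = quote(search_text, safe="")
    path = "/".join([endpoint, dictionary, audience, language, encoded_text])
    query = urlencode({"matchType": match_type, "size": size})
    return f"{BASE}/{path}?{query}"


autosuggest_url = build_url("Autosuggest", "node biopsy")
search_url = build_url("Terms/search", "node biopsy")

print(autosuggest_url)
print(search_url)
```

Both URLs encode the space in "node biopsy" as %20, which is the case that returns no results.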

What's the expected result?

Should return results matching search text

What's the actual result?

{
  "meta": {
    "totalResults": 0,
    "from": 0
  },
  "results": [],
  "links": null
}

Additional details / screenshot

Result from swagger

Results showing same search conducted on Cancer.gov

bryanpizzillo commented 4 years ago

@zhuomingao - Can you triage this please? Make sure the index will handle this, then check the query.

zhuomingao commented 4 years ago

After changes to the index mapping and loader, search with Contains now works as it does in production.

For autosuggest with Contains, the ES query needs to be:

curl -XPOST http://SERVER_NAME/glossaryv1/terms/_search -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        {"term": {"language": "es"}},
        {"term": {"dictionary": "Cancer.gov"}},
        {"term": {"audience": "Patient"}},
        {"match_phrase": {"term_name._autocomplete": "cutáneo"}}
      ],
      "must_not": {"prefix": {"term_name": "cutáneo"}}
    }
  },
  "sort": ["term_name"],
  "_source": ["term_id", "term_name"],
  "from": 0,
  "size": 10
}'

as in https://github.com/NCIOCPL/glossary-api/wiki/Elastic-search-Query
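To show how the query is parameterized, here is a rough Python sketch that builds the same request body as a dictionary. The function name `autosuggest_contains_query` and its argument names are hypothetical. The field names and structure are copied from the query above. This is an illustration and not the API's actual implementation.

```python
import json


def autosuggest_contains_query(dictionary, audience, language, text,
                               start=0, size=10):
    """Hypothetical helper: build the 'contains' autosuggest query body."""
    return {
        "query": {
            "bool": {
                # Every clause in "must" has to match.
                "must": [
                    {"term": {"language": language}},
                    {"term": {"dictionary": dictionary}},
                    {"term": {"audience": audience}},
                    {"match_phrase": {"term_name._autocomplete": text}},
                ],
                # Exclude terms whose name starts with the search text,
                # leaving only matches elsewhere in the name.
                "must_not": {"prefix": {"term_name": text}},
            }
        },
        "sort": ["term_name"],
        "_source": ["term_id", "term_name"],
        "from": start,
        "size": size,
    }


body = autosuggest_contains_query("Cancer.gov", "Patient", "es", "cutáneo")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be passed as the -d payload of the curl command above.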

blairlearn commented 4 years ago

We're getting closer. "node biopsy" works, but searching for "cel" doesn't match production. In production, autosuggest for terms containing "cell" matches both "adoptive cell transfer" and "adult T-cell leukemia/lymphoma".

The API does not match the latter.
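One quick way to pin down the difference is to diff the two sample lists quoted in this comment. This is a small Python sketch in which the variable names are mine and the data is only the sample shown here, not the full result sets.

```python
# Sample autosuggest results copied from this comment (not the full lists).
current_system = [
    "acquired pure red cell aplasia",
    "adoptive cell therapy",
    "adoptive cell transfer",
    "adult T-cell leukemia/lymphoma",
    "allogeneic stem cell transplant",
    "anaplastic large cell lymphoma",
]
new_system = [
    "acquired pure red cell aplasia",
    "adoptive cell therapy",
    "adoptive cell transfer",
    "allogeneic stem cell transplantation",
    "anaplastic large cell lymphoma",
]

new_set = set(new_system)
current_set = set(current_system)

# Terms production returns that the new API does not, in original order.
missing = [term for term in current_system if term not in new_set]
# Terms the new API returns that production does not.
extra = [term for term in new_system if term not in current_set]

print("missing from new system:", missing)
print("only in new system:", extra)
```

On this sample, the diff flags "adult T-cell leukemia/lymphoma" as missing. It also flags the "transplant" / "transplantation" pair, which looks like a difference in the term name itself and not a matching problem, though that is an assumption on my part.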

Sample results for current system:

"acquired pure red cell aplasia"
"adoptive cell therapy"
"adoptive cell transfer"
"adult T-cell leukemia/lymphoma"
"allogeneic stem cell transplant"
"anaplastic large cell lymphoma"

Sample results for new system:

"acquired pure red cell aplasia"
"adoptive cell therapy"
"adoptive cell transfer"
"allogeneic stem cell transplantation"
"anaplastic large cell lymphoma"