Closed by KateMashkinaNIH 4 years ago
@KateMashkinaNIH - Please verify that this was addressed by PR #30.
Per @zhuomingao (via Slack):
We need to change the drug index mapping to use the classic tokenizer instead of the standard tokenizer. The new mapping is below.
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "analysis": {
        "filter": {
          "autocomplete_filter": {
            "type": "edge_ngram",
            "min_gram": 1,
            "max_gram": 30,
            "token_chars": [
              "letter",
              "digit",
              "punctuation",
              "symbol"
            ]
          },
          "ngram_filter": {
            "type": "ngram",
            "min_gram": 1,
            "max_gram": 30,
            "token_chars": [
              "letter",
              "digit",
              "punctuation",
              "symbol"
            ]
          }
        },
        "analyzer": {
          "autocomplete_index": {
            "type": "custom",
            "tokenizer": "classic",
            "filter": [
              "lowercase",
              "autocomplete_filter",
              "asciifolding"
            ]
          },
          "lowercase_search": {
            "filter": [
              "lowercase",
              "asciifolding"
            ],
            "tokenizer": "keyword"
          },
          "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "ngram_filter",
              "asciifolding"
            ]
          },
          "autocomplete_search": {
            "type": "custom",
            "tokenizer": "classic",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        },
        "normalizer": {
          "caseinsensitive_normalizer": {
            "type": "custom",
            "char_filter": [],
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "terms": {
      "dynamic": "strict",
      "properties": {
        "name": {
          "type": "keyword",
          "normalizer": "caseinsensitive_normalizer",
          "fields": {
            "_autocomplete": {
              "type": "text",
              "analyzer": "autocomplete_index",
              "search_analyzer": "autocomplete_search"
            },
            "_contain": {
              "type": "text",
              "analyzer": "ngram_analyzer",
              "search_analyzer": "lowercase_search"
            }
          }
        },
        "type": {
          "type": "keyword"
        },
        "term_name_type": {
          "type": "keyword"
        },
        "first_letter": {
          "type": "keyword",
          "normalizer": "caseinsensitive_normalizer"
        },
        "preferred_name": {
          "type": "keyword",
          "normalizer": "caseinsensitive_normalizer"
        },
        "aliases": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "type": {
              "type": "keyword"
            },
            "name": {
              "type": "keyword",
              "normalizer": "caseinsensitive_normalizer",
              "fields": {
                "_contain": {
                  "type": "text",
                  "analyzer": "ngram_analyzer",
                  "search_analyzer": "lowercase_search"
                }
              }
            }
          }
        },
        "definition": {
          "properties": {
            "html": {
              "type": "keyword"
            },
            "text": {
              "type": "keyword"
            }
          }
        },
        "term_id": {
          "type": "long"
        },
        "pretty_url_name": {
          "type": "keyword"
        },
        "nci_concept_id": {
          "type": "keyword"
        },
        "nci_concept_name": {
          "type": "keyword"
        },
        "drug_info_summary_link": {
          "properties": {
            "text": {
              "type": "keyword"
            },
            "url": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
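To make the roles of `autocomplete_filter` and `ngram_filter` easier to picture, here is a minimal pure-Python sketch of what edge n-gram and n-gram expansion produce for a single token with `min_gram` 1 and `max_gram` 30. The function names are hypothetical and this is not Elasticsearch code; it only illustrates the set of grams, not their order or positions in the real token stream.

```python
def edge_ngrams(token, min_gram=1, max_gram=30):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngrams(token, min_gram=1, max_gram=30):
    """Return every substring of `token` with length min_gram..max_gram."""
    grams = []
    for start in range(len(token)):
        for n in range(min_gram, max_gram + 1):
            if start + n > len(token):
                break  # substring would run past the end of the token
            grams.append(token[start:start + n])
    return grams


if __name__ == "__main__":
    # Prefixes support "starts with" style autocomplete matching.
    print(edge_ngrams("cd123"))
    # All substrings support "contains" style matching.
    print(ngrams("cd1"))
```

Under this illustration, the `_autocomplete` sub-field would index prefixes of each token, while the `_contain` sub-field would index every substring of the whole lowercased name, because `ngram_analyzer` uses the `keyword` tokenizer.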
@zhuomingao - there's still a problem: a "contains" search whose text starts with a / matches terms that don't contain one.
Example:
A "contains" search for /cd matches (among other things) the drug term "allogeneic CD123-specific universal CAR123-expressing T lymphocytes".
Neither the term nor any of its aliases contains a / character.
"aliases": [
  {
    "type": "CodeName",
    "name": "UCART123"
  },
  {
    "type": "Synonym",
    "name": "UCART123 T cells"
  },
  {
    "type": "Synonym",
    "name": "universal chimeric antigen receptor T cell 123"
  },
  {
    "type": "Synonym",
    "name": "universal TALEN gene-edited CART123 cells"
  },
  {
    "type": "Synonym",
    "name": "allogeneic engineered T cells expressing anti-CD123 chimeric antigen receptor"
  },
  {
    "type": "Synonym",
    "name": "universal chimeric antigen receptor T cells targeting CD123"
  }
],
Per conversation with @zhuomingao, this is because we are treating / as a delimiter.
The current production system includes the / in the search criteria; the glossary API, however, also treats / as a delimiter rather than as part of the search text.
After conversation with @lburack, we have decided to accept this difference from the current system.
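The behaviour described above can be reproduced with a small Python sketch. It assumes, purely for illustration, that the query is split on / and that a name matches when it contains every remaining piece, case-insensitively. The helper names are hypothetical, and the real API and analyzers may differ in other respects.

```python
def split_on_slash(text):
    """Lowercase the text and drop '/' by treating it as a delimiter."""
    return [part for part in text.lower().split("/") if part]


def contains_match(query, name):
    """True if `name` contains every '/'-separated piece of `query`."""
    parts = split_on_slash(query)
    lowered = name.lower()
    return bool(parts) and all(part in lowered for part in parts)


TERM = "allogeneic CD123-specific universal CAR123-expressing T lymphocytes"

if __name__ == "__main__":
    print(split_on_slash("/cd"))         # the slash is discarded
    print(contains_match("/cd", TERM))   # matches although TERM has no '/'
    print("/" in TERM)
```

With the / discarded, the query /cd reduces to cd, which is a substring of "CD123", so the term matches even though it contains no / character.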
Issue description
The Autosuggest API is not returning results that contain 2/3 or /cd; instead it matches against 2, 3, and just cd.
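One way to check how a tokenizer handles these inputs is Elasticsearch's `_analyze` API, which accepts a `tokenizer` and a `text` in the request body. The sketch below only builds and prints the request bodies for the two sample inputs; it does not contact a cluster, and the helper name is hypothetical.

```python
import json


def analyze_request(tokenizer, text):
    """Build an _analyze request body for a built-in tokenizer."""
    return {"tokenizer": tokenizer, "text": text}


if __name__ == "__main__":
    for tokenizer in ("standard", "classic"):
        for text in ("2/3", "/cd"):
            # Send each body as POST /_analyze to inspect the emitted tokens.
            print("POST /_analyze")
            print(json.dumps(analyze_request(tokenizer, text)))
```

Comparing the tokens returned for each tokenizer shows whether the / survives tokenization or is dropped as a delimiter.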