Pinging @elastic/es-search
Thanks for reporting, @drake-jin. This is clearly a regression caused by https://issues.apache.org/jira/browse/LUCENE-8548. I opened https://issues.apache.org/jira/browse/LUCENE-8966 to fix this, since digits should not be grouped with other types of characters.
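To illustrate the expected behavior (digits kept apart from other character types), here is a minimal Python sketch. It is not Lucene's or Nori's actual implementation: the helper names `char_class` and `split_by_class` are hypothetical, and the character classes are a coarse simplification chosen for this example.

```python
import unicodedata


def char_class(ch: str) -> str:
    """Return a coarse character class (an assumption for illustration,
    not the character definitions Lucene uses)."""
    if ch.isdigit():
        return "digit"
    if "HANGUL" in unicodedata.name(ch, ""):
        return "hangul"
    if ch.isalpha():
        return "alpha"
    return "other"


def split_by_class(text: str) -> list[str]:
    """Split text into runs of characters that share the same class."""
    runs: list[str] = []
    for ch in text:
        # Extend the current run only if the class is unchanged.
        if runs and char_class(runs[-1][-1]) == char_class(ch):
            runs[-1] += ch
        else:
            runs.append(ch)
    return runs


print(split_by_class("44사이즈비키니"))  # ['44', '사이즈비키니']
print(split_by_class("foo3"))           # ['foo', '3']
```

The sketch only separates digits from letters and Hangul; splitting 사이즈비키니 into 사이즈 and 비키니 requires the dictionary-based analysis that the tokenizer performs.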
@jimczi Could I ask what version is patched?
Thanks, as always.
Could I ask what version is patched?
The fix will be released in Lucene 8.3, so it should be available in a 7.x version of Elasticsearch. The earliest would be Elasticsearch 7.6, but there is no guarantee here.
It also doesn't split English letters (alphabetic characters) from digits...
PUT /test
{
  "number_of_shards" : "5",
  "analysis" : {
    "analyzer" : {
      "korean" : {
        "type" : "custom",
        "tokenizer" : "nori_user_dict_tokenizer"
      }
    },
    "tokenizer" : {
      "nori_user_dict_tokenizer" : {
        "mode" : "mixed",
        "type" : "nori_tokenizer"
      }
    }
  }
}
GET /test/_analyze
{
  "text": ["foo3", "Foo3", "FOO3"],
  "tokenizer": "nori_user_dict_tokenizer"
}
@jimczi
It seems this issue is resolved! So you can close this issue. Good job.
# Tested on Elasticsearch 7.7.1
curl -X POST "http://localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "nori",
"text": "44사이즈비키니"
}
'
{
  "tokens" : [
    {
      "token" : "44",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "사이즈",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "비키니",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "word",
      "position" : 2
    }
  ]
}
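As a quick sanity check on a response like the one above, the offsets can be sliced back out of the input text and compared with each token. This is only a sketch: `mismatched_tokens` is a hypothetical helper, and it assumes every character is in the Basic Multilingual Plane (true for the Hangul here), so that the reported offsets line up with Python string indices.

```python
def mismatched_tokens(text: str, tokens: list[dict]) -> list[dict]:
    """Return the tokens whose offsets do not slice back to the token text."""
    return [
        t for t in tokens
        if text[t["start_offset"]:t["end_offset"]] != t["token"]
    ]


text = "44사이즈비키니"
# Token fields copied from the _analyze response above.
tokens = [
    {"token": "44", "start_offset": 0, "end_offset": 2},
    {"token": "사이즈", "start_offset": 2, "end_offset": 5},
    {"token": "비키니", "start_offset": 5, "end_offset": 8},
]

print(mismatched_tokens(text, tokens))  # []
```

An empty list means each token matches the slice of the input that its offsets point to.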
Thanks @drake-jin
Describe the feature
Elasticsearch version (bin/elasticsearch --version): 6.7.2
Plugins installed: []
JVM version (java -version): jvm 1.8
OS version (uname -a if on a Unix-like system): ubuntu 16.04
Description of the problem including expected versus actual behavior:
I wanted to analyze 44사이즈비키니 (44 size bikini :) ), so I executed the following script.
Steps to reproduce
사이즈 (size) and 비키니 (bikini). I saw that these nouns are NNP and NNG.
Elastic Settings And Analysis Result
Mapping Result
input
Output