elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other

Wildcard search case insensitivity on accented characters #109385

Open feherbbj opened 5 months ago

feherbbj commented 5 months ago

Elasticsearch Version

7.16.3

Installed Plugins

ICU plugin

Java Version

bundled

OS Version

Win 10

Problem Description

I have an index where a text field contains text in numerous languages (French, Hungarian, etc.), in which accented characters are quite common. The field type is wildcard. If the field contains an uppercase accented character and a case-insensitive wildcard search is performed on the field, the document is not found, although it should be. See the sample under Steps to Reproduce: searching for "tést" or "á" returns no results even though case_insensitive is set to true, while searching for "tÉst" or "Á" does return the documents.
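The thread does not identify a root cause. One hypothesis consistent with the reported results (unconfirmed here) is that case-insensitive matching folds case only for ASCII characters. Under that rule `E`/`e` would match while `É`/`é` and `Á`/`á` would not. The Python sketch below is purely illustrative and is not Elasticsearch or Lucene code. It contrasts an ASCII-only comparison with Python's Unicode-aware `str.casefold`. The function names are hypothetical, and the characters are the precomposed code points U+00C9/U+00E9 and U+00C1/U+00E1.

```python
def ascii_fold_equal(a: str, b: str) -> bool:
    """Hypothetical comparison that folds case only for ASCII characters."""
    def fold(c: str) -> str:
        # For ASCII input, lower() only changes A-Z; non-ASCII code points are left as-is.
        return c.lower() if c.isascii() else c
    return fold(a) == fold(b)


def unicode_fold_equal(a: str, b: str) -> bool:
    """Comparison using Python's full Unicode case folding."""
    return a.casefold() == b.casefold()


# Pairs taken from the reproduction: E/e, É/é (U+00C9/U+00E9), Á/á (U+00C1/U+00E1).
for upper, lower in [("E", "e"), ("\u00c9", "\u00e9"), ("\u00c1", "\u00e1")]:
    print(upper, lower, ascii_fold_equal(upper, lower), unicode_fold_equal(upper, lower))
```

With ASCII-only folding, only the `E`/`e` pair compares equal. This mirrors the symptom in the issue, where only the exact-case accented queries find the documents.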

Steps to Reproduce

DELETE test-accent

PUT test-accent
{
  "mappings": {
    "properties": {
      "text": { "type": "wildcard" }
    }
  }
}

POST test-accent/_doc/
{ "text": "tÉst" }

POST test-accent/_doc/
{ "text": "Á" }

POST test-accent/_doc/
{ "text": "E" }

GET test-accent/_search
{ "query": { "match_all": {} } }

GET test-accent/_search
{
  "query": {
    "wildcard": {
      "text": { "value": "á", "case_insensitive": true }
    }
  }
}

GET test-accent/_search
{
  "query": {
    "wildcard": {
      "text": { "value": "tést", "case_insensitive": true }
    }
  }
}
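The console steps above can also be replayed outside Kibana. The following is a minimal sketch using only the Python standard library. It assumes a hypothetical local cluster at `http://localhost:9200` with security disabled; a default 8.x install uses HTTPS and authentication, so adjust as needed. The `refresh=true` parameter on the index requests is an addition not present in the original steps. It makes each document searchable before the queries run. The helper names `build_requests`, `send` and `run_reproduction` are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Hypothetical local, unsecured cluster; change host, scheme, and auth for your setup.
BASE_URL = "http://localhost:9200"
INDEX = "test-accent"


def build_requests():
    """Return the reproduction steps as (method, path, body) tuples."""
    steps = [
        ("DELETE", f"/{INDEX}", None),
        ("PUT", f"/{INDEX}", {"mappings": {"properties": {"text": {"type": "wildcard"}}}}),
    ]
    # refresh=true is an addition: it makes each document visible to search immediately.
    for text in ["tÉst", "Á", "E"]:
        steps.append(("POST", f"/{INDEX}/_doc?refresh=true", {"text": text}))
    steps.append(("GET", f"/{INDEX}/_search", {"query": {"match_all": {}}}))
    for value in ["á", "tést"]:
        query = {"wildcard": {"text": {"value": value, "case_insensitive": True}}}
        steps.append(("GET", f"/{INDEX}/_search", {"query": query}))
    return steps


def send(method, path, body):
    """Send one JSON request and return (status, parsed JSON response)."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # For example, DELETE on a missing index returns 404; report it and continue.
        return err.code, json.loads(err.read())


def run_reproduction():
    """Run every step and print the status code and hit count (for searches)."""
    for method, path, body in build_requests():
        status, result = send(method, path, body)
        print(method, path, status, result.get("hits", {}).get("total"))
```

Calling `run_reproduction()` against a running cluster prints one line per step. The two `case_insensitive` searches are the ones whose hit counts the issue reports as zero.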

Logs (if relevant)

No response

elasticsearchmachine commented 5 months ago

Pinging @elastic/es-search (Team:Search)

feherbbj commented 5 months ago

PS: I tested with 8.12.2 and it is reproducible there as well.

benwtrent commented 5 months ago

FYI, I tested with a newer Lucene version and it is still reproducible. That newer Lucene version also comes with an updated ICU version.

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)

feherbbj commented 2 months ago

Is there any news on this topic?

benwtrent commented 2 months ago

No, other than it seems like a bug :)