elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Lowercase normalizer is used for wildcard queries #28894

Open dadoonet opened 6 years ago

dadoonet commented 6 years ago

Elasticsearch version (bin/elasticsearch --version): 6.2.2

Description of the problem including expected versus actual behavior:

Say you index the value Aa into a text field whose analyzer lowercases tokens. Searching for aa* matches. Searching for Aa* does not match, which is expected because wildcard queries are not analyzed.

Now say you index the value Aa into a keyword field with a lowercase normalizer. Searching for aa* matches. Searching for Aa* matches as well, even though wildcard queries are not supposed to be analyzed.
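To make the difference concrete, here is a toy Python model of the two cases. It is only a sketch and not how Elasticsearch or Lucene implement this: the helper names index_terms and wildcard_matches are hypothetical, whitespace splitting stands in for the analyzer, and fnmatchcase stands in for a wildcard query (it also accepts [seq] patterns, which Elasticsearch wildcards do not). The normalize_pattern flag models whether the query pattern is lowercased before matching.

```python
from fnmatch import fnmatchcase


def index_terms(value, field_type):
    """Return the terms a toy index would store for one field value."""
    if field_type == "text":
        # Stand-in for a lowercasing analyzer: split on whitespace, lowercase tokens.
        return [token.lower() for token in value.split()]
    # Stand-in for a keyword field with a lowercase normalizer: one lowercased term.
    return [value.lower()]


def wildcard_matches(pattern, value, field_type, normalize_pattern):
    """Case-sensitively match a wildcard pattern against the indexed terms."""
    if normalize_pattern:
        # Models the normalizer being applied to the query pattern.
        pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in index_terms(value, field_type))


doc = "Bbb Aaa"

# Text field, pattern left untouched: only the lowercase pattern matches.
print(wildcard_matches("Bb*", doc, "text", normalize_pattern=False))
print(wildcard_matches("bb*", doc, "text", normalize_pattern=False))

# Keyword field, pattern normalized: models the behavior reported here.
print(wildcard_matches("Bb*", doc, "keyword", normalize_pattern=True))

# Keyword field, pattern left untouched: models the behavior I expected.
print(wildcard_matches("Bb*", doc, "keyword", normalize_pattern=False))
```

In this model, the only difference between the reported and the expected keyword behavior is whether the pattern itself goes through the normalizer.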

Steps to reproduce:

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "foo": {
          "type": "text",
          "analyzer": "simple",
          "fields": {
            "keyword": {
              "type": "keyword",
              "normalizer": "lowercase_normalizer"
            }
          }
        }
      }
    }
  }
}
PUT test/doc/1?refresh
{
  "foo": "Bbb Aaa"
}

# Does not match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo": "Bb*"
    }
  }
}
# Match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo": "bb*"
    }
  }
}
# Match but should not -> KO
GET test/_search
{
  "query": {
    "wildcard": {
      "foo.keyword": "Bb*"
    }
  }
}
# Match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo.keyword": "bb*"
    }
  }
}

I spoke with @jpountz who thinks it might be related to https://issues.apache.org/jira/browse/LUCENE-8186

Opening the issue so we can track it.

javanna commented 6 years ago

cc @elastic/es-search-aggs

jpountz commented 6 years ago

Fixed on the Lucene side via https://issues.apache.org/jira/browse/LUCENE-8186, but this will need a change on the Elasticsearch side as well.

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)