elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Phonetic filter on the search_as_you_type field can cause "Too many cached tokens" errors #61491

Open imotov opened 4 years ago

imotov commented 4 years ago

When a search_as_you_type field is combined with a token filter that produces multiple tokens at the same position, indexing can fail with a "Too many cached tokens (> 100)" exception.

To reproduce:

DELETE test

PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_phonetic": {
          "type": "phonetic",
          "encoder": "beider_morse",
          "rule_type": "approx",
          "name_type": "generic",
          "languageset": [
            "english"
          ]
        }
      },
      "analyzer": {
        "my_phonetic": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_phonetic"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "search_as_you_type",
        "analyzer": "my_phonetic"
      }
    }
  }
}

PUT test/_bulk
{"index":{}}
{"field1": ["xao_xao_xao zao_zao_zao yao_yao_yao"]}

The error:

{
  "took" : 8,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "jhctIXQBVdWEidty8xiT",
        "status" : 500,
        "error" : {
          "type" : "illegal_state_exception",
          "reason" : "Too many cached tokens (> 100)"
        }
      }
    }
  ]
}
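
To see how many tokens the analyzer stacks at each position, the _analyze API can be run against the index created above. This is a diagnostic sketch rather than part of the original reproduction: it assumes the analysis-phonetic plugin is installed and that the test index still exists, and the exact number of tokens emitted per position is something to read from the response, not a figure stated here.

GET test/_analyze
{
  "analyzer": "my_phonetic",
  "text": "xao_xao_xao zao_zao_zao yao_yao_yao"
}

Tokens in the response that share the same "position" value are the alternatives produced by the phonetic filter for a single input term. The shingle subfields of search_as_you_type have to combine these alternatives across adjacent positions, which is presumably where the cached-token limit is reached.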

elasticmachine commented 4 years ago

Pinging @elastic/es-search (:Search/Search)

navratan-ch commented 3 years ago

@imotov Have you solved this error? I am facing the same issue.

benwtrent commented 2 months ago

I don't know what the expected behavior is here.

This limit exists to protect against exceptionally weird and expensive situations in tokenization.

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)