elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

array_index_out_of_bounds_exception when using two synonym_graph filters #74118

Open dtrieschnigg opened 3 years ago

dtrieschnigg commented 3 years ago

Elasticsearch version (bin/elasticsearch --version): 7.9.3 (lucene: 8.6.2)

Description of the problem including expected versus actual behavior:

When an analyzer chains two synonym_graph filters, searching for a compound results in an array_index_out_of_bounds_exception. My use case: the first filter does decompounding and the second filter expands synonyms. For instance, "cellphone" is decompounded into "cell" and "phone", and "phone" is expanded with "telephone".

Steps to reproduce:

DELETE testindex
PUT testindex
{
    "mappings" : {
      "properties" : {
        "body" : {
          "type" : "text",
          "analyzer" : "my_analyzer"
        }
      }
    },
    "settings" : {
      "index" : {
        "analysis" : {
          "filter" : {
            "synonym1" : {
              "type" : "synonym_graph",
              "synonyms": [ "cell phone, cellphone" ]
            },
            "synonym2" : {
              "type" : "synonym_graph",
              "synonyms": [ 
                "cell, cells"
              ]
            }
          },
          "analyzer" : {
            "my_analyzer" : {
              "filter" : [
                "synonym1",
                "synonym2"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            }
          }
        }
      }
    }
}
POST testindex/_search
{
  "query": {
    "match": {
      "body": "cell phone"
    }
  }
}

results in:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: Index 0 out of bounds for length 0",
        "index_uuid" : "Wjc4TvHQQfmOgy2c66PHJg",
        "index" : "testindex"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "testindex",
        "node" : "JdOTt9JWR0WeBkMOiC4nIQ",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: Index 0 out of bounds for length 0",
          "index_uuid" : "Wjc4TvHQQfmOgy2c66PHJg",
          "index" : "testindex",
          "caused_by" : {
            "type" : "array_index_out_of_bounds_exception",
            "reason" : "Index 0 out of bounds for length 0"
          }
        }
      }
    ]
  },
  "status" : 400
}

dtrieschnigg commented 3 years ago

A workaround is to use a synonym rather than a synonym_graph filter.

elasticmachine commented 3 years ago

Pinging @elastic/es-search (Team:Search)

dtrieschnigg commented 3 years ago

Any update on this issue?

amitmbm commented 2 years ago

I am also facing this issue. I was able to reproduce it with the master branch code, where the impact is even worse: it kills the ES process on my local machine.

romseygeek commented 2 years ago

This is a long-standing problem in Lucene itself: the synonym graph filter can't accept graphs as input (https://issues.apache.org/jira/browse/LUCENE-9966). But we shouldn't be throwing errors to the user here, and in particular we shouldn't be allowing processes to die.

There are a few things we can do to fix this, I think. Firstly we should be able to detect in ES when you have an analysis chain that pipes a graph into a filter that doesn't accept one, and throw an error (or at least emit a warning). Secondly I think we can improve our synonym filter definitions to allow grouping of inputs to make it easier to categorise synonyms without having to specify multiple filters.
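The first idea, detecting a chain that pipes a graph into a filter that doesn't accept one, could look roughly like the sketch below. It is written in Python for brevity and is not Elasticsearch code: `find_graph_conflicts`, `GRAPH_PRODUCERS`, and `GRAPH_INTOLERANT` are hypothetical names, and the two sets of filter types are assumptions for illustration (only synonym_graph is covered by the Lucene issue above).

```python
# Hypothetical checker: illustrative only, not part of Elasticsearch or Lucene.

# ASSUMPTION: filter types that may emit a token graph.
GRAPH_PRODUCERS = {"synonym_graph", "word_delimiter_graph"}
# ASSUMPTION: filter types that cannot consume a graph as input
# (synonym_graph per LUCENE-9966; other types are not considered here).
GRAPH_INTOLERANT = {"synonym_graph"}


def find_graph_conflicts(index_settings):
    """Return (analyzer, producer, consumer) triples where a graph-producing
    filter appears earlier in a chain than a filter that cannot accept a graph."""
    analysis = index_settings.get("index", {}).get("analysis", {})
    filter_defs = analysis.get("filter", {})
    conflicts = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        graph_source = None  # first upstream filter that may emit a graph
        for filter_name in analyzer.get("filter", []):
            # Custom filters resolve to their configured type; filters that are
            # referenced directly fall back to their own name as the type.
            filter_type = filter_defs.get(filter_name, {}).get("type", filter_name)
            if graph_source is not None and filter_type in GRAPH_INTOLERANT:
                conflicts.append((analyzer_name, graph_source, filter_name))
            if graph_source is None and filter_type in GRAPH_PRODUCERS:
                graph_source = filter_name
    return conflicts


# The "settings" object from the reproduction in this issue.
settings = {
    "index": {
        "analysis": {
            "filter": {
                "synonym1": {"type": "synonym_graph",
                             "synonyms": ["cell phone, cellphone"]},
                "synonym2": {"type": "synonym_graph",
                             "synonyms": ["cell, cells"]},
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["synonym1", "synonym2"],
                }
            },
        }
    }
}

conflicts = find_graph_conflicts(settings)
for analyzer, producer, consumer in conflicts:
    print(f"analyzer [{analyzer}]: filter [{consumer}] cannot accept "
          f"the graph produced by [{producer}]")
```

For the settings in this issue the check flags synonym2 as receiving a graph from synonym1. The sketch ignores tokenizers and any filter types outside the two assumed sets, so a real check would need the actual list of graph-producing and graph-intolerant components.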

amitmbm commented 2 years ago

@romseygeek, thanks for your input. Let me work on the first part of it. Would you suggest throwing an exception, and if so, which exception should be thrown here? IllegalArgumentException?

amitmbm commented 2 years ago

@romseygeek, please let me know your suggestions so that I can fix this issue.

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)