elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Missing Completion suggestions when multiple suggestion paths lead to same document #46445

Open cbuescher opened 5 years ago

cbuescher commented 5 years ago

It was observed that when completion suggestions are restricted to a certain size n, the top suggestions returned can miss suggestions that do appear in the top n results when querying with a larger return window (by increasing size). This was observed in particular when there was more than one shard and the analyzer on the suggest field produced multiple tokens at the same position. That can lead to multiple paths in the Lucene suggester data structure leading to the same doc id.

elasticmachine commented 5 years ago

Pinging @elastic/es-search

cbuescher commented 5 years ago

As a simple recreation, consider this example:

PUT test_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "name_completion": {
        "type": "completion",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "2",
      "analysis": {
        "filter": {
          "my_synonym": {
            "type": "synonym",
            "synonyms": [
              "meyer => meyer, m",
              "mueller => mueller, m",
              "mann => mann, m",
              "meier => meier, m",
              "murnau => murnau, m",
              "munch => munch, m",
              "myerz => myerz, m",
              "mohn => mohn, m",
              "mahler => mahler, m"
            ]
          }
        },
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "my_synonym"
            ],
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}

PUT /_bulk
{ "index" : { "_index" : "test_index", "_id" : "1" } }
{"name": "anna meyer", "name_completion": {"input": "anna meyer", "weight": 1}}
{ "index" : { "_index" : "test_index", "_id" : "2" } }
{"name": "anna mueller", "name_completion": {"input": "anna mueller", "weight": 2}}
{ "index" : { "_index" : "test_index", "_id" : "3" } }
{"name": "anna mann", "name_completion": {"input": "anna mann", "weight": 3}}
{ "index" : { "_index" : "test_index", "_id" : "4" } }
{"name": "anna murnau", "name_completion": {"input": "anna murnau", "weight": 4}}
{ "index" : { "_index" : "test_index", "_id" : "5" } }
{"name": "anna munch", "name_completion": {"input": "anna munch", "weight": 5}}
{ "index" : { "_index" : "test_index", "_id" : "6" } }
{"name": "anna myerz", "name_completion": {"input": "anna myerz", "weight": 6}}
{ "index" : { "_index" : "test_index", "_id" : "7" } }
{"name": "anna mohn", "name_completion": {"input": "anna mohn", "weight": 7}}
{ "index" : { "_index" : "test_index", "_id" : "8" } }
{"name": "anna mahler", "name_completion": {"input": "anna mahler", "weight": 8}}
{ "index" : { "_index" : "test_index", "_id" : "9" } }
{"name": "anna meier", "name_completion": {"input": "anna meier", "weight": 9}}
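
To check that the analyzer really produces several tokens at the same position, the `_analyze` API can be run against the index. This is only a sketch that assumes the index and analyzer defined above; the text "anna meyer" is taken from doc 1:

```
GET /test_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "anna meyer"
}
```

With the synonym rules above, one would expect the response to list `m` at the same position as `meyer`, which is what gives each document more than one path in the suggester.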

On at least 7.3, getting the top 5 suggestions for "anna":

POST /test_index/_search
{
    "suggest": {
        "test-suggest" : {
            "prefix" : "anna", 
            "completion" : { 
                "field" : "name_completion",
                "size": 5
            }
        }
    }
}

returns five suggestions for _id 9, 8, 7, 6, 4, in descending order of weight. However, doc 5 (weight 5) should appear before doc 4. Increasing "size" to 10 returns document ids 9 to 2 in the correct order but still misses doc 1, which only appears when querying for more than 12 suggestions.
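
For comparison, this is the same request with a larger window. Only "size" changes; 13 is an arbitrary value above 12, chosen here on the assumption that it is large enough for all nine documents to show up:

```
POST /test_index/_search
{
    "suggest": {
        "test-suggest" : {
            "prefix" : "anna",
            "completion" : {
                "field" : "name_completion",
                "size": 13
            }
        }
    }
}
```

Comparing the first five entries of this response with the "size": 5 response is a quick way to see which documents were dropped.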

cbuescher commented 5 years ago

I opened https://github.com/apache/lucene-solr/pull/913 to fix some of the underlying issues in Lucene. We'd also need to change our own TopSuggestGroupDocsCollector#collect method to correctly signal document rejections after that change.

kunisen commented 4 years ago

Hi team, can we have this fix backported to 6.x (e.g. 6.8)?

elasticsearchmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)