elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Phrase suggester provides incorrect results #62677

Open smuthuka opened 4 years ago

smuthuka commented 4 years ago

Elasticsearch version (bin/elasticsearch --version): 7.8

Plugins installed: []

JVM version (java -version): openjdk version "14.0.1" 2020-04-14

OS version (uname -a if on a Unix-like system): Linux d59b172dd674 4.9.184-linuxkit #1 SMP Tue Jul 2 22:58:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior: The phrase suggester does not return the expected suggestions even though the index contains the matching term. This usually happens after documents have been added and then deleted.

Steps to reproduce:

Index mapping

PUT sugg_test
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "trigram": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "shingle"
            ]
          },
          "reverse": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "reverse"
            ]
          }
        },
        "normalizer": {
          "lowercase_normalizer": {
            "type": "custom",
            "filter": [
              "lowercase"
            ]
          }
        },
        "filter": {
          "shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 3
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          },
          "trigram": {
            "type": "text",
            "analyzer": "trigram"
          },
          "reverse": {
            "type": "text",
            "analyzer": "reverse"
          }
        }
      },
      "suggest": {
        "type": "completion"
      }
    }
  }
}
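To make it easier to see which terms the two sub-fields hold, here is a rough pure-Python approximation of the `trigram` and `reverse` analyzers defined above. It is only a sketch: it does not call Elasticsearch or Lucene, the function names are made up for illustration, and it assumes the shingle filter keeps its default behaviour of also emitting single words.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return unigrams plus word n-grams of min_size..max_size, in position order."""
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])  # assumes the filter also outputs unigrams (its default)
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def trigram_analyze(text):
    """Approximate the 'trigram' analyzer: whitespace split, lowercase, shingle."""
    return shingles(text.lower().split())


def reverse_analyze(text):
    """Approximate the 'reverse' analyzer: whitespace split, lowercase, reverse each token."""
    return [token[::-1] for token in text.lower().split()]


if __name__ == "__main__":
    print(trigram_analyze("Elk Stacks"))   # ['elk', 'elk stacks', 'stacks']
    print(reverse_analyze("Elk Stacks"))   # ['kle', 'skcats']
```

The `_analyze` API on the real index remains the authoritative way to check the emitted tokens.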

Suggest query

GET sugg_test/_search
{
  "suggest": {
    "text": "SEARCH TERM",
    "name_phrase": {
      "phrase": {
        "field": "name.trigram",
        "size": 3,
        "confidence": 1,
        "direct_generator": [
          {
            "field": "name.trigram",
            "suggest_mode": "always"
          },
          {
            "field": "name.reverse",
            "suggest_mode": "always",
            "pre_filter": "reverse",
            "post_filter": "reverse"
          }
        ],
        "collate": {
          "query": {
            "source": {
              "match_phrase": {
                "name": "{{suggestion}}"
              }
            }
          },
          "prune": false
        }
      }
    }
  }
}
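For scripted reproduction, the same suggest request can be built per search term. The helper below is a minimal sketch that mirrors the query above; `phrase_suggest_body` is a hypothetical name, not part of any Elasticsearch client.

```python
import json


def phrase_suggest_body(text, size=3, confidence=1):
    """Build the phrase-suggest request body from this report for a given search text."""
    return {
        "suggest": {
            "text": text,
            "name_phrase": {
                "phrase": {
                    "field": "name.trigram",
                    "size": size,
                    "confidence": confidence,
                    "direct_generator": [
                        {"field": "name.trigram", "suggest_mode": "always"},
                        {
                            "field": "name.reverse",
                            "suggest_mode": "always",
                            "pre_filter": "reverse",
                            "post_filter": "reverse",
                        },
                    ],
                    "collate": {
                        # {{suggestion}} is filled in by Elasticsearch, not by Python
                        "query": {"source": {"match_phrase": {"name": "{{suggestion}}"}}},
                        "prune": False,
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(phrase_suggest_body("elk stack"), indent=2))
```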

I add a document for "elk stack", and the suggest query returns it as a suggestion for "elk stacks".

PUT sugg_test/_doc/1
{
  "school_id": 71848,
  "name": "elk stack",
  "suggest": [
    {
      "input": [
        "elk stack"
      ],
      "weight": 2
    }
  ]
}

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "name_phrase" : [
      {
        "text" : "elk stacks",
        "offset" : 0,
        "length" : 10,
        "options" : [
          {
            "text" : "elk stack",
            "score" : 0.5508418
          }
        ]
      }
    ]
  }
}
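When comparing runs, only the `options` list of the suggest response matters. A small sketch for pulling the suggested texts out of a response that has already been parsed into a dict (`suggestion_texts` is a hypothetical helper name):

```python
def suggestion_texts(response, name="name_phrase"):
    """Return the suggested texts for one named suggester, in response order."""
    texts = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


if __name__ == "__main__":
    # Trimmed copy of the response shown above
    response = {
        "suggest": {
            "name_phrase": [
                {
                    "text": "elk stacks",
                    "offset": 0,
                    "length": 10,
                    "options": [{"text": "elk stack", "score": 0.5508418}],
                }
            ]
        }
    }
    print(suggestion_texts(response))  # ['elk stack']
```

An empty list from this helper corresponds to the failing case described in this report.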

The reverse works as well. I delete and recreate the index, add a document for "elk stacks", and get it back as a suggestion for "elk stack".

PUT sugg_test/_doc/1
{
  "school_id": 71848,
  "name": "elk stacks",
  "suggest": [
    {
      "input": [
        "elk stacks"
      ],
      "weight": 2
    }
  ]
}

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "name_phrase" : [
      {
        "text" : "elk stack",
        "offset" : 0,
        "length" : 9,
        "options" : [
          {
            "text" : "elk stacks",
            "score" : 0.5508418
          }
        ]
      }
    ]
  }
}

This all works as expected. Now I delete and recreate the index, then run the following steps:

  1. Add "elk stack" (this gets correctly suggested for "elk stacks")
  2. Remove "elk stack" (with a _delete_by_query)
  3. Add "elk stacks"
    PUT sugg_test/_doc/1
    {
      "school_id": 71848,
      "name": "elk stacks",
      "suggest": [
        {
          "input": [
            "elk stacks"
          ],
          "weight": 2
        }
      ]
    }
  4. Query for "elk stack", which returns no suggestions
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "suggest" : {
        "name_phrase" : [
          {
            "text" : "elk stack",
            "offset" : 0,
            "length" : 9,
            "options" : [ ]
          }
        ]
      }
    }
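The four steps above can be written down as an ordered list of requests, which makes the sequence easier to replay. This is only a sketch under several assumptions that are not in the report: the `_delete_by_query` body is taken to be `match_all` (the actual query was not shown), explicit `_refresh` calls are added between steps, `_search` is sent as POST, and `repro_steps` and `send` are hypothetical helper names. The optional `_forcemerge?only_expunge_deletes=true` step is a possible diagnostic for whether leftover deleted documents play a role; whether it changes the outcome is not established here.

```python
import json
import urllib.request

INDEX = "sugg_test"


def doc(name):
    """Document body used throughout this report."""
    return {"school_id": 71848, "name": name, "suggest": [{"input": [name], "weight": 2}]}


def repro_steps(suggest_body, expunge_deletes=False):
    """Return (method, path, body) tuples for the failing sequence.

    suggest_body: callable mapping a search text to a suggest request body.
    expunge_deletes: if True, add a force merge after the delete (diagnostic assumption).
    """
    steps = [
        ("PUT", f"{INDEX}/_doc/1", doc("elk stack")),                # step 1
        ("POST", f"{INDEX}/_refresh", None),
        ("POST", f"{INDEX}/_search", suggest_body("elk stacks")),
        # step 2: the real query was not given in the report; match_all is assumed
        ("POST", f"{INDEX}/_delete_by_query", {"query": {"match_all": {}}}),
        ("POST", f"{INDEX}/_refresh", None),
    ]
    if expunge_deletes:
        steps.append(("POST", f"{INDEX}/_forcemerge?only_expunge_deletes=true", None))
    steps += [
        ("PUT", f"{INDEX}/_doc/1", doc("elk stacks")),               # step 3
        ("POST", f"{INDEX}/_refresh", None),
        ("POST", f"{INDEX}/_search", suggest_body("elk stack")),     # step 4
    ]
    return steps


def send(base_url, method, path, body=None):
    """Send one step to a cluster and return the parsed JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        f"{base_url}/{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To replay against a test cluster, create the index with the mapping above first, then call `send("http://localhost:9200", *step)` for each step (the URL is an assumption), passing a function that builds the suggest query from this report as `suggest_body`.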

Provide logs (if relevant):

elasticmachine commented 4 years ago

Pinging @elastic/es-search (:Search/Suggesters)

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)