elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Unexpected behaviour with search_as_you_type field indexed with multiple values #64394

Open serkanozer opened 3 years ago

serkanozer commented 3 years ago

Elasticsearch version (bin/elasticsearch --version): 7.9.3

Steps to reproduce:

curl -X PUT "localhost:9200/test_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "search_as_you_type_field": {
        "type": "search_as_you_type"
      }
    }
  }
}
'

curl -X PUT "localhost:9200/test_index/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "search_as_you_type_field": ["owl", "quick brown fox dog"]
}
'

curl -X PUT "localhost:9200/test_index/_doc/2?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "search_as_you_type_field": ["quick brown fox dog", "owl"]
}
'

curl -X GET "localhost:9200/test_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase_prefix": {
      "search_as_you_type_field": {"query": "quick brown fox d"}
    }
  }
}
'
This returns the second document but not the first.

The match_phrase_prefix query on a search_as_you_type field doesn't seem to work as expected. In the example above, the first document is indexed with ["owl", "quick brown fox dog"]. Querying q, qu, qui, .. quick b.., quick brown f.. works, but quick brown fox d, quick brown fox do, and quick brown fox dog don't. However, all the possible prefix queries (for "quick brown fox dog") work for document 2.

I'm not sure whether this is expected behavior, but it seems pretty strange and it is not documented anywhere.

Provide logs (if relevant):

elasticmachine commented 3 years ago

Pinging @elastic/es-search (:Search/Suggesters)

jimczi commented 3 years ago

@romseygeek can you take a look?

mushao999 commented 3 years ago

Here is my analysis:

1. Query Parsing

A match_phrase_prefix query on a search_as_you_type field is parsed into a SpanNearQuery with several sub-clauses. The last clause is a FieldMaskingSpanQuery masked as the _3gram field (it actually wraps a SpanTermQuery on the _index_prefix field); the remaining clauses are SpanTermQuery instances on the _3gram field. For example, the query

{
  "query": {
    "match_phrase_prefix": {
      "search_as_you_type_field": {
        "query": "quick brown fox dog c"
      }
    }
  }
}

is parsed into: SpanNearQuery(SpanTermQuery:_3gram:quick brown fox + SpanTermQuery:_3gram:brown fox dog + FieldMaskingSpanQuery(SpanTermQuery:_index_prefix:fox dog c))
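
The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the clause layout, not Elasticsearch's actual parser: it assumes plain whitespace tokenization and a query of at least three terms, and the function name `decompose` is made up for this example.

```python
def decompose(query, shingle_size=3):
    """Split a phrase-prefix query into (field, term) clauses.

    Hypothetical helper: mirrors the clause layout described above,
    assuming whitespace tokenization and at least `shingle_size` terms.
    """
    tokens = query.lower().split()
    if len(tokens) < shingle_size:
        raise ValueError("sketch only covers queries with 3 or more terms")
    # Every run of `shingle_size` consecutive tokens becomes one shingle.
    shingles = [
        " ".join(tokens[i:i + shingle_size])
        for i in range(len(tokens) - shingle_size + 1)
    ]
    # All shingles but the last are exact terms on the _3gram subfield.
    clauses = [("_3gram", s) for s in shingles[:-1]]
    # The last shingle ends in a partial word, so it is looked up in the
    # _index_prefix subfield (wrapped in a FieldMaskingSpanQuery).
    clauses.append(("_index_prefix", shingles[-1]))
    return clauses


print(decompose("quick brown fox dog c"))
```

For the query from the original report, `quick brown fox d`, the same sketch yields one `_3gram` clause followed by one `_index_prefix` clause.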

2. Query Execution

3. Inconsistent positions

If a search_as_you_type field is given multiple values, such as

{
  "search_as_you_type_field": [
    "owl",
    "quick brown fox dog"
  ]
}

and the query is

  {
    "query": {
      "match_phrase_prefix": {
        "search_as_you_type_field": {
          "query": "quick brown fox d"
        }
      }
    }
  }

The term vectors from test_index/_doc/1/_termvectors?fields=search_as_you_type_field._3gram show:

"quick brown fox": { "term_freq": 1, "tokens": [{ "position": 0, "start_offset": 0, "end_offset": 15 }] }


- `quick brown fox` has a different position in the _index_prefix field than in the _3gram field
- so matchWidth = 2 - (0 + 1) = 1 > allowedSlop (0), and the doc will not show up in the query hits
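
The width check in the second bullet can be written out as a small sketch. It assumes, as the arithmetic above does, that each clause matches a single-position span (end = start + 1) and that the width of an ordered span-near match is the sum of the gaps between consecutive clauses. `match_width` and `is_hit` are illustrative names, not Lucene APIs.

```python
def match_width(start_positions):
    """Sum of gaps between consecutive single-position spans."""
    width = 0
    for prev, cur in zip(start_positions, start_positions[1:]):
        prev_end = prev + 1  # a one-term span ends right after its start
        width += cur - prev_end
    return width


def is_hit(start_positions, allowed_slop=0):
    """An ordered match needs increasing positions and width <= slop."""
    ordered = all(a < b for a, b in zip(start_positions, start_positions[1:]))
    return ordered and match_width(start_positions) <= allowed_slop


# Document 1 as analysed above: _3gram term at 0, prefix term at 2.
print(match_width([0, 2]), is_hit([0, 2]))
# Adjacent positions, which is what a slop of 0 requires.
print(match_width([0, 1]), is_hit([0, 1]))
```

With a slop of 0, any extra position between the two clauses is enough to drop the document from the hits.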

### Any good ideas on how to fix this problem?

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)