Open serkanozer opened 3 years ago
Pinging @elastic/es-search (:Search/Suggesters)
@romseygeek can you take a look ?
Following is my analysis:

A `match_phrase_prefix` query on a `search_as_you_type` field is parsed into a `SpanNearQuery` with several sub-clauses. The last clause is a `FieldMaskingSpanQuery` that presents itself as the `_3gram` field (it actually wraps a `SpanTermQuery` on the `_index_prefix` field); the remaining clauses are `SpanTermQuery` clauses on the `_3gram` field. For example, the query
{
  "query": {
    "match_phrase_prefix": {
      "search_as_you_type_field": {
        "query": "quick brown fox dog c"
      }
    }
  }
}
will be parsed into:

SpanNearQuery(
  SpanTermQuery:_3gram:quick brown fox
  + SpanTermQuery:_3gram:brown fox dog
  + FieldMaskingSpanQuery(SpanTermQuery:_index_prefix:fox dog c)
)
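To make the decomposition concrete, here is a rough Python sketch of it. `span_clauses` is a hypothetical helper, not an Elasticsearch or Lucene API, and it assumes plain whitespace tokenization and a query of at least three terms (the case where the `_3gram` subfield is involved).

```python
SHINGLE_SIZE = 3  # the _3gram subfield holds three-word shingles


def span_clauses(query):
    """Model the span clauses as (subfield, term) pairs, in clause order."""
    tokens = query.split()
    if len(tokens) < SHINGLE_SIZE:
        raise ValueError("this sketch only models queries with at least 3 terms")
    # Slide a three-word window over the query terms.
    shingles = [
        " ".join(tokens[i:i + SHINGLE_SIZE])
        for i in range(len(tokens) - SHINGLE_SIZE + 1)
    ]
    # Every shingle but the last is looked up in _3gram;
    # the last (incomplete) one is looked up in _index_prefix.
    clauses = [("_3gram", s) for s in shingles[:-1]]
    clauses.append(("_index_prefix", shingles[-1]))
    return clauses


for subfield, term in span_clauses("quick brown fox dog c"):
    print(subfield, "->", term)
```

For `quick brown fox dog c` this yields two `_3gram` clauses followed by one `_index_prefix` clause, matching the parsed query above.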
When executing this query, Lucene checks the `matchWidth` between each pair of adjacent sub-clauses to make sure it is not larger than the slop:
org.apache.lucene.search.spans.NearSpansOrdered.java
...
matchWidth += (spans.startPosition() - prevSpans.endPosition());
...
...
if (stretchToOrder() && matchWidth <= allowedSlop) {
return atFirstInCurrentDoc = true;
}
...
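As a rough illustration of that check, the following Python sketch accumulates the gaps between adjacent spans and compares the total against the slop. It is a toy model, not the Lucene implementation: `ordered_match_width` and `is_match` are hypothetical names, spans are given as `(start, end)` pairs with an exclusive end, and the sketch assumes the spans are already in clause order and do not overlap.

```python
def ordered_match_width(spans):
    """Sum the gaps between each span's start and the previous span's end."""
    width = 0
    for (_, prev_end), (start, _) in zip(spans, spans[1:]):
        width += start - prev_end
    return width


def is_match(spans, allowed_slop=0):
    """A document matches only if the accumulated width fits within the slop."""
    return ordered_match_width(spans) <= allowed_slop


# Three single-term spans at consecutive positions: no gaps.
consistent = [(0, 1), (1, 2), (2, 3)]
# Same clauses, but the last span's position is shifted by one.
shifted = [(0, 1), (1, 2), (3, 4)]

print(is_match(consistent))  # True
print(is_match(shifted))     # False
```

With a slop of 0, a shift of a single position in any one clause is enough to reject the document.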
`matchWidth` will be incorrect if positions in the `_3gram` field are inconsistent with positions in the `_index_prefix` field. For example, if the `search_as_you_type` field is given multiple values, such as
{
  "search_as_you_type_field": [
    "owl",
    "quick brown fox dog"
  ]
}
and the query is
{
  "query": {
    "match_phrase_prefix": {
      "search_as_you_type_field": {
        "query": "quick brown fox d"
      }
    }
  }
}
then the span clauses look up these terms:

- `search_as_you_type_field._index_prefix`: `brown fox d`
- `search_as_you_type_field._3gram`: `quick brown fox`

Using the term vectors API, we can find the following information:
test_index/_doc/1/_termvectors?fields=search_as_you_type_field._index_prefix
"quick brown fox": {
  "term_freq": 1,
  "tokens": [{
    "position": 1,
    "start_offset": 4,
    "end_offset": 19
  }]
}
"brown fox d": {
  "term_freq": 1,
  "tokens": [{
    "position": 2,
    "start_offset": 10,
    "end_offset": 23
  }]
}
test_index/_doc/1/_termvectors?fields=search_as_you_type_field._3gram
"quick brown fox": { "term_freq": 1, "tokens": [{ "position": 0, "start_offset": 0, "end_offset": 15 }] }
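- `quick brown fox` has a different position in the `_index_prefix` field (1) than in the `_3gram` field (0)
- so `matchWidth` is the start position of `brown fox d` in `_index_prefix` (2) minus the end position of `quick brown fox` in `_3gram` (0 + 1): 2 - (0 + 1) = 1 > `allowedSlop` (0), and the document will not show up in the query hits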
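The arithmetic can be replayed with the positions from the term vectors output. This is only a sketch: `term_span` is a hypothetical helper, and the end position of a single-term span is taken to be its position plus one, as in the calculation above. The second computation is purely hypothetical and shows what the width would be if both terms were read from `_index_prefix`.

```python
# Positions copied from the term vectors output above.
index_prefix_positions = {"quick brown fox": 1, "brown fox d": 2}
three_gram_positions = {"quick brown fox": 0}


def term_span(positions, term):
    """Return (start, end) for a single-term span; the end is start + 1."""
    start = positions[term]
    return start, start + 1


# What the query actually does: first clause from _3gram, last from _index_prefix.
_, prev_end = term_span(three_gram_positions, "quick brown fox")
start, _ = term_span(index_prefix_positions, "brown fox d")
mixed_width = start - prev_end

# Hypothetical comparison: both positions read from _index_prefix.
_, prev_end_same = term_span(index_prefix_positions, "quick brown fox")
same_field_width = start - prev_end_same

print(mixed_width, same_field_width)  # 1 0
```

Within `_index_prefix` alone the two terms are adjacent; the gap only appears when positions from the two subfields are mixed.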
### Any good idea to fix this problem?
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Elasticsearch version (`bin/elasticsearch --version`): 7.9.3

Steps to reproduce:
A `match_phrase_prefix` query on a `search_as_you_type` field doesn't seem to work as expected. In the example above, the first document is indexed with `["owl", "quick brown fox dog"]`. Querying `q`, `qu`, `qui`, .. `quick b`.., `quick brown f`.. works, but `quick brown fox d`, `quick brown fox do`, and `quick brown fox dog` don't. However, all the possible prefix queries (for "quick brown fox dog") work for document 2.

I'm not sure whether this is expected behavior, but it seems pretty strange and it is not documented anywhere.
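For anyone who wants to check every prefix systematically, a small Python sketch like the one below can generate the request bodies. `prefixes` and `match_phrase_prefix_body` are hypothetical helpers; the field name is taken from the examples above, and sending the bodies to the `_search` endpoint of your index is left out on purpose.

```python
import json

FIELD = "search_as_you_type_field"


def prefixes(text):
    """All non-empty prefixes of text, skipping those that end in a space."""
    return [
        text[:i]
        for i in range(1, len(text) + 1)
        if not text[:i].endswith(" ")
    ]


def match_phrase_prefix_body(prefix):
    """Build a match_phrase_prefix request body for one prefix."""
    return {"query": {"match_phrase_prefix": {FIELD: {"query": prefix}}}}


bodies = [match_phrase_prefix_body(p) for p in prefixes("quick brown fox dog")]

# Print the body for one of the prefixes reported as failing.
failing = match_phrase_prefix_body("quick brown fox d")
print(json.dumps(failing, indent=2))
```

Running each body against both documents should make it easy to see exactly at which prefix the first document drops out of the hits.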
Provide logs (if relevant):