Open mayya-sharipova opened 9 months ago
Pinging @elastic/es-search (Team:Search)
This happens because of the way we rewrite multi-phrase queries in CustomFieldQuery. When the wildcard part ("screen") expands to more than 16 terms, we rewrite the phrase query as individual term queries. This is an old change that was made to protect a node from running out of memory on huge phrase queries.
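To make the fallback concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual CustomFieldQuery code: the function name, the tuple representation, and the choice to count terms across all phrase positions (rather than only the wildcard position) are assumptions for illustration.

```python
from itertools import product

# Limit mentioned in the comment above. Counting terms over all phrase
# positions is an assumption of this sketch, not confirmed by the thread.
MAX_PHRASE_TERMS = 16


def flatten_multi_phrase(positions):
    """Flatten a multi-phrase query for highlighting (hypothetical helper).

    positions: one list of candidate terms per phrase position, e.g.
    [["mental"], ["health"], ["screen", "screened", ...]].
    """
    total_terms = sum(len(terms) for terms in positions)
    if total_terms > MAX_PHRASE_TERMS:
        # Too many terms: fall back to independent term queries.
        # The phrase structure is lost, so each term is highlighted on its own.
        return [("term", term) for terms in positions for term in terms]
    # Small enough: keep one phrase per combination of terms.
    return [("phrase", combo) for combo in product(*positions)]
```

With 13 expansions of "screen" the sketch keeps 13 phrases; with 20 expansions it returns 22 unrelated term queries, which matches the kind of per-term highlighting reported in this issue.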
Workaround 1: use the unified highlighter instead, which produces the expected output. Limitation: the unified highlighter doesn't work with the matched_fields option.
"highlight": {
  "fields": {
    "content": {
      "matched_fields": ["content"],
      "type": "unified"
    }
  }
}
Workaround 2: keep using the fvh highlighter, but with a smaller max_expansions value. Limitation: because the terms for expansion are chosen alphabetically, the query only matches those first terms.
GET index1/_search
{
  "query": {
    "match_phrase_prefix": {
      "content": {
        "query": "mental health screen",
        "slop": 2,
        "max_expansions": 13
      }
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "matched_fields": [
          "content"
        ],
        "type": "fvh"
      }
    }
  }
}
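The limitation of lowering max_expansions can be illustrated with a small Python sketch. It only models the idea that expansion terms are taken in alphabetical order up to the limit; the helper name and the sample terms are hypothetical, and this is not how Elasticsearch implements prefix expansion internally.

```python
def expand_prefix(indexed_terms, prefix, max_expansions):
    """Return the first max_expansions indexed terms that start with prefix,
    in alphabetical order (hypothetical helper for illustration)."""
    matches = sorted(t for t in set(indexed_terms) if t.startswith(prefix))
    return matches[:max_expansions]


# Sample terms are made up for this example.
terms = ["screens", "screen", "screening", "screened", "screenshot"]
kept = expand_prefix(terms, "screen", 3)
```

Here `kept` contains "screen", "screened" and "screening", so a document that only contains "screens" or "screenshot" would no longer match the query, let alone be highlighted.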
Pinging @elastic/es-search-relevance (Team:Search Relevance)
FVH highlighter fails to properly highlight multi phrase query with many terms.
Elasticsearch Version
V8.12
Steps to reproduce
We get the expected output for highlight:
But when many docs are indexed:
the output is incorrect: