Open ahoogol opened 11 months ago
Pinging @elastic/es-search (Team:Search)
This is due to highlight.weight_matches_mode.enabled. I am not 100% sure why we are trying to get the offsets here.
But, to get around this bug, disable that setting on the index:
PUT test_mask/_settings
{
  "index" : {
    "highlight.weight_matches_mode.enabled" : "false"
  }
}
Still need to dig into the correct fix here.
error-trace:
java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
at org.apache.lucene.highlighter@9.8.0/org.apache.lucene.search.uhighlight.FieldHighlighter.highlightOffsetsEnums(FieldHighlighter.java:157)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.lucene.search.uhighlight.CustomFieldHighlighter.highlightOffsetsEnums(CustomFieldHighlighter.java:106)
at org.apache.lucene.highlighter@9.8.0/org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:83)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.lucene.search.uhighlight.CustomFieldHighlighter.highlightFieldForDoc(CustomFieldHighlighter.java:63)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.lucene.search.uhighlight.CustomUnifiedHighlighter.highlightField(CustomUnifiedHighlighter.java:148)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.fetch.subphase.highlight.DefaultHighlighter.highlight(DefaultHighlighter.java:81)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.fetch.subphase.highlight.HighlightPhase$1.process(HighlightPhase.java:69)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.fetch.FetchPhase$1.nextDoc(FetchPhase.java:163)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.fetch.FetchPhaseDocsIterator.iterate(FetchPhaseDocsIterator.java:70)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:169)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:78)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:711)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:682)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:543)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:51)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:48)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:73)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
@benwtrent Thank you for your suggestion. After running your suggested command, the error no longer occurs. However, I've noticed that the generated highlight doesn't match my expected output.
With your command:
"highlight": {
  "text": [
    "(a) _ (a) b"
  ],
  "stem": [
    "_ (b) _ _"
  ]
}
I was expecting the highlight to look like this:
"highlight": {
  "text": [
    "(a) (_) a b"
  ]
}
Is there a way to achieve this expected result while avoiding the error?
@ahoogol turn on offsets for the fields and use "highlight.weight_matches_mode.enabled" : "true"
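For illustration, here is a minimal sketch of what "turn on offsets for the fields" could look like as a mapping body. The field names `text` and `stem` come from this thread; the plain `text` field type and the omission of analyzers are assumptions, since the original mapping is not shown here. As far as I know, `index_options` cannot be changed on an existing field, so this would likely mean creating a new index and reindexing.

```python
import json


def offsets_mapping(field_names):
    """Build a mappings body in which each listed field stores offsets in its postings.

    Assumption: every field is a plain `text` field; analyzers are left out
    because the real mapping is not part of this thread.
    """
    return {
        "mappings": {
            "properties": {
                name: {"type": "text", "index_options": "offsets"}
                for name in field_names
            }
        }
    }


# Field names taken from the thread; the body would be sent when creating the index.
body = offsets_mapping(["text", "stem"])
print(json.dumps(body, indent=2))
```

The trade-off is the one raised in this issue: storing offsets makes the index larger.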
Thank you for your suggestion, @benwtrent. Yes, it highlights correctly when enabling offsets. But, my concern remains about the increase in index size. I'm still exploring alternative approaches to achieve the desired highlight without the need to turn on offsets to keep the index size manageable. If you have any further insights or suggestions, they would be greatly appreciated.
@ahoogol If you use "require_field_match" : false as a highlighter option, you will get expected results without enabling offsets.
"highlight": {
  "require_field_match" : false,
  "pre_tags": "(",
  "post_tags": ")",
  "fields": {
    "*": {}
  },
  "type": "unified"
}
It breaks because internally we check that the field we highlight on ("text") is the same as the field that has matches ("stem"), but in this case they are different. That's the failure.
I will add this to the documentation for the span_field_masking query and will close this issue.
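As a rough sketch of how those highlighter options fit into a complete search request: the reproduction query is not included in this thread, so the `span_near` query below, its terms, and the `slop` value are hypothetical. Only the `highlight` section and the field names `text` and `stem` are taken from the comments above.

```python
import json


def build_search_body(term_on_text, term_on_stem, slop=5):
    """Combine a span query that masks 'stem' as 'text' with the highlighter options above.

    Assumption: the query shape, terms, and slop are illustrative only.
    """
    query = {
        "span_near": {
            "clauses": [
                {"span_term": {"text": term_on_text}},
                {
                    # Lets a span on 'stem' take part in a span query on 'text'.
                    "span_field_masking": {
                        "query": {"span_term": {"stem": term_on_stem}},
                        "field": "text",
                    }
                },
            ],
            "slop": slop,
            "in_order": False,
        }
    }
    highlight = {
        # Highlight fields even when the matching terms come from another field.
        "require_field_match": False,
        "pre_tags": "(",
        "post_tags": ")",
        "fields": {"*": {}},
        "type": "unified",
    }
    return {"query": query, "highlight": highlight}


request = build_search_body("a", "b")
print(json.dumps(request, indent=2))
```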
@mayya-sharipova I included "require_field_match": false in the highlighter options, but the resulting output is still different from what I expected:
Output with your suggestion (I tested it on 8.10.0 and 8.11.3):
"highlight": {
  "text": [
    "a _ (a) (b)"
  ]
}
Expected output:
"highlight": {
  "text": [
    "(a) (_) a b"
  ]
}
@ahoogol Indeed, you are right about the expected behaviour, but it is not supported for the span_field_masking query. And it would not be easy to support it (without indexing with offsets).
The highlighting behaviour that you expect is based on Matches and was added in 8.10. But it relies on the highlighted field containing the query terms, which is not the case here.
I have added documentation clarifying that the span_field_masking query has unexpected highlighting behaviour and should be used with require_field_match = false.
I also changed the type of this issue to "feature", which we may tackle sometime in the future.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Elasticsearch Version
8.10.4
Installed Plugins
No response
Java Version
bundled
OS Version
Elastic Cloud - GCP - Iowa (us-central1)
Problem Description
I encountered an issue when using the span_field_masking feature in Elasticsearch. When attempting to use the highlighter with this feature, the following error is thrown:
If I set "index_options": "offsets" in the mapping of the masked field 'stem', highlighting works as expected. However, I'm puzzled as to why the highlighter requires indexing offsets. I'd like to understand why the highlighter doesn't re-analyze the text to calculate offsets dynamically. My concern is that indexing offsets increases the index size, which I want to avoid.
Steps to Reproduce
Expected result
I was expecting the highlight to look like this: