mayya-sharipova opened this issue 8 months ago (status: open)
ExtendedIntervalsSource explicitly returns -1 offsets, and this was done in https://github.com/apache/lucene/pull/803 (ticket LUCENE-10229).
From the ticket:
The reason extend does not work for highlighting is that, quite reasonably, it can only return the offsets delegated from the source interval. Once you shift left or right from the source interval's position, the offset information cannot be retrieved (because this would require per-document, random-access position-offset map to be present somewhere).
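To make the ticket's point concrete, here is a minimal Java sketch under simplifying assumptions. The `Match` class and `extend` method are hypothetical and are not Lucene's API, and the sample sentence is made up rather than taken from the reproduction test. Shifting the positions is trivial, but the new boundary positions have no known character offsets, so the extended match reports -1, which is the behavior described for ExtendedIntervalsSource.

```java
// Hypothetical, simplified model of an interval match; these names are NOT Lucene's API.
final class ExtendOffsetsSketch {

    /** A match covering positions [startPos, endPos] and characters [startOffset, endOffset). */
    static final class Match {
        final int startPos, endPos, startOffset, endOffset;

        Match(int startPos, int endPos, int startOffset, int endOffset) {
            this.startPos = startPos;
            this.endPos = endPos;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Widens a match by `before` positions on the left and `after` positions on the right.
     * Without a per-document position-to-offset map there is no way to know where the new
     * boundary positions sit in the text, so the sketch reports -1 for both offsets.
     */
    static Match extend(Match source, int before, int after) {
        int start = Math.max(0, source.startPos - before);
        int end = source.endPos + after;
        return new Match(start, end, -1, -1);
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        // "lazy" is the token at position 7 and spans characters [35, 39).
        Match lazy = new Match(7, 7, 35, 39);
        Match extended = extend(lazy, 1, 0);
        System.out.println(text.substring(lazy.startOffset, lazy.endOffset)); // lazy
        System.out.println(extended.startPos + ".." + extended.endPos
                + " offsets " + extended.startOffset + "," + extended.endOffset); // 6..7 offsets -1,-1
    }
}
```

The source match keeps its offsets; only the widened match loses them, because nothing in it records where position 6 begins in the text.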
Therefore, is it expected that your example fails, or is it an edge case that the ticket did not cover? What would be the expected output?
This happens because ClassicAnalyzer removes stop words, which leads to the use of ExtendedIntervalsSource, which returns -1 offsets.
Just for clarity, it fails when highlighting "lazy": an ExtendedIntervalsSource is created to account for the preceding stop word that the analyzer removed, and that source then returns -1 offsets during highlighting.
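The position gap that triggers this can be illustrated with a toy analyzer. This is a hypothetical Java sketch, not Lucene code: it lowercases, splits on whitespace, and drops words from a small made-up stop list while still consuming their positions, similar in spirit to how a stop filter leaves position gaps. It only computes the size of the gap before each kept token; per the comment above, that gap is what the ExtendedIntervalsSource is created to cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy analyzer; the stop list and all names here are assumptions, not Lucene API.
final class StopWordGapSketch {

    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");

    static final class Token {
        final String term;
        final int position;
        final int positionIncrement;

        Token(String term, int position, int positionIncrement) {
            this.term = term;
            this.position = position;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Emits kept tokens; a removed stop word still consumes a position, leaving a gap. */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = -1; // position of the current word, whether kept or removed
        int lastKept = -1; // position of the last emitted token
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            position++;
            if (STOP_WORDS.contains(word)) continue; // dropped, but its position is used up
            tokens.add(new Token(word, position, position - lastKept));
            lastKept = position;
        }
        return tokens;
    }

    /** Number of removed positions immediately before this token. */
    static int leadingGap(Token t) {
        return t.positionIncrement - 1;
    }

    public static void main(String[] args) {
        for (Token t : analyze("the lazy dog")) {
            System.out.println(t.term + " pos=" + t.position + " gap=" + leadingGap(t));
        }
        // lazy pos=1 gap=1
        // dog pos=2 gap=0
    }
}
```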
OffsetsFromPositions contains logic to derive offsets from positions. Would it make sense to apply similar logic in FieldHighlighter when offsets are missing because an ExtendedIntervalsSource is used?
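One possible shape of such a fallback, as a hypothetical Java sketch only: it re-tokenizes the stored text with a toy whitespace tokenizer and a made-up stop list (standing in for re-running the field's analyzer), builds a position-to-offset map, and resolves an interval's positions through it. None of these names are Lucene API, and this is not how OffsetsFromPositions is actually implemented. Because the removed stop word has no token in the re-analyzed stream either, the sketch narrows the interval to the first and last positions that do have offsets.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of recovering offsets from positions by re-analysis; not Lucene API.
final class PositionOffsetFallbackSketch {

    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");

    /** Re-tokenizes the text and maps each kept token's position to {startOffset, endOffset}. */
    static TreeMap<Integer, int[]> positionToOffsets(String text) {
        TreeMap<Integer, int[]> map = new TreeMap<>();
        int position = -1;
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= text.length()) break;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            position++; // stop words consume a position but get no map entry
            if (!STOP_WORDS.contains(text.substring(start, i).toLowerCase())) {
                map.put(position, new int[] {start, i});
            }
        }
        return map;
    }

    /**
     * Resolves offsets for positions [startPos, endPos] using the first and last mapped
     * positions inside the interval; returns null if no position in it has offsets.
     */
    static int[] resolve(TreeMap<Integer, int[]> map, int startPos, int endPos) {
        Map.Entry<Integer, int[]> first = map.ceilingEntry(startPos);
        Map.Entry<Integer, int[]> last = map.floorEntry(endPos);
        if (first == null || last == null || first.getKey() > endPos) return null;
        return new int[] {first.getValue()[0], last.getValue()[1]};
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        TreeMap<Integer, int[]> map = positionToOffsets(text);
        // "lazy" extended one position to the left covers positions 6..7,
        // but position 6 ("the") was removed as a stop word.
        int[] offsets = resolve(map, 6, 7);
        System.out.println(offsets[0] + "-" + offsets[1] + " " + text.substring(offsets[0], offsets[1]));
        // prints: 35-39 lazy
    }
}
```

Narrowing to mapped positions highlights only "lazy". Re-analysis with the same analyzer cannot recover offsets for the removed "the", so expanding the highlight to cover it would need a different source of offsets.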
Description
When highlighting based on matches, UnifiedHighlighter incorrectly fails with: field 'X' was indexed without offsets, cannot highlight
Test to reproduce:
This produces an error:
A workaround is to disable highlighting based on matches:
This happens because ClassicAnalyzer removes stop words, which leads to the use of ExtendedIntervalsSource, which returns -1 offsets.
Version and environment details
Lucene 9.9.1