The `Grok.match(bytes, offset, length, extracter)` API appears to fail when a positive offset is combined with a pattern that matches only a suffix of the input (i.e. the match does not start at the beginning of the input, and a prefix is discarded).
E.g. with a pattern like `%{WORD:a} %{WORD:b} %{NUMBER:c:int}` and an input string like `x1 a1 b1 12`, the pattern is supposed to discard `x1` and match the rest of the input.
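To make the expected behaviour concrete, here is a minimal sketch that uses only `java.util.regex`, not Grok or Joni. The regular expression is a simplified stand-in for `%{WORD:a} %{WORD:b} %{NUMBER:c:int}` (the real Grok definitions of `WORD` and `NUMBER` are more elaborate), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixMatchSketch {
    // Assumption: a simplified approximation of the Grok pattern,
    // WORD ~ \b\w+\b and NUMBER ~ \d+ (integers only).
    static final Pattern PATTERN =
        Pattern.compile("\\b(?<a>\\w+)\\b \\b(?<b>\\w+)\\b (?<c>\\d+)");

    // Returns the captured fields {a, b, c}, or null when nothing matches.
    static String[] extract(String input) {
        Matcher m = PATTERN.matcher(input);
        // find() searches for a match anywhere in the input,
        // so a leading prefix such as "x1 " is skipped.
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("a"), m.group("b"), m.group("c") };
    }

    public static void main(String[] args) {
        String[] fields = extract("x1 a1 b1 12");
        System.out.println(String.join(",", fields)); // prints a1,b1,12
    }
}
```

The match cannot start at `x1`, because the third token would then be `b1`, which is not a number; the search therefore moves on and matches `a1 b1 12`.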
Everything works fine as long as the string is stored in a `byte[]` on its own and the offset used is `0`.
If the input of `match()` is a bigger array, where the string sits in the middle and an offset > 0 is used, then the match fails.
Here is a code snippet that reproduces the problem.
Checking the library code, the problem could depend on the usage of `Matcher.search()`, where we are passing `matcher.search(offset, length, Option.DEFAULT)`.
Since for `Regex.matcher()` we are passing `offset` and `offset + length` (instead of `offset`, `length`), it's possible that `matcher.search()` should also receive `offset + length` as its second argument.
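The suspected mix-up between a length and an absolute end index can be illustrated without Joni. The sketch below is only an analogy: it uses `java.util.regex.Matcher.region(start, end)`, whose second argument is an absolute end index, and assumes that `matcher.search()` treats its second argument the same way, which is the hypothesis above rather than a confirmed fact. The class name, method name and padding are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionEndSketch {
    // Assumption: simplified stand-in for the Grok pattern in this report.
    static final Pattern PATTERN = Pattern.compile("(\\w+) (\\w+) (\\d+)");

    // Searches inside [offset, end) of the buffer; `end` is an absolute index.
    static boolean findInWindow(String buffer, int offset, int end) {
        if (end < offset) {
            return false; // region() would throw for an inverted window
        }
        Matcher m = PATTERN.matcher(buffer).region(offset, end);
        return m.find();
    }

    public static void main(String[] args) {
        String payload = "x1 a1 b1 12";
        String buffer = "## " + payload + " ##"; // payload in the middle
        int offset = buffer.indexOf(payload);    // 3
        int length = payload.length();           // 11

        // Passing the length as the end index truncates the window to "x1 a1 b1".
        System.out.println(findInWindow(buffer, offset, length));          // prints false
        // Passing offset + length covers the whole payload.
        System.out.println(findInWindow(buffer, offset, offset + length)); // prints true
    }
}
```

With `offset = 0` both calls receive the same window, which would be consistent with the problem staying hidden whenever the input starts at the beginning of the array.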
The Joni API documentation is sparse and does not make the semantics of this specific method clear. The problem never surfaced at a higher level because we always use that API with `offset = 0`.
The fix proposed above solves the problem and passes all the tests, but it would be good to have a confirmation from someone with more expertise on this lib.
[edit] added the same test in https://github.com/elastic/elasticsearch/pull/95003
A PR with the proposed fix will follow.