Closed nfrmtkr closed 1 year ago
You have misunderstood the purpose of start_offset, which is to set a starting offset for a search while still allowing access to the earlier part of the subject string. This matters if the pattern starts with a lookbehind, as yours does. There is explicit documentation about this in the pcre2api man page, which includes this: "Setting startoffset differs from passing over a shortened string and setting PCRE2_NOTBOL in the case of a pattern that begins with any kind of lookbehind." Following this sentence there is an example that demonstrates this in more detail; in particular, it discusses the case where the pattern matches an empty string, as yours does. There is also discussion of this in the pcre2test man page under "Finding all matches in a string", and there is even some sample code in src/pcre2demo.c (or the pcre2demo man page).
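For readers who want to see the idea in runnable form, here is a minimal sketch of an empty-match-safe "find all" loop. It is not PCRE2 code: Python's `re` module is used as a stand-in, on the assumption that the `pos` argument of `Pattern.search` is a fair analogue of start_offset (text before `pos` stays visible to a lookbehind). The helper name `all_spans` is hypothetical, and the subject is assumed to have one key/value pair per line, as the report's reference to "line 4" suggests.

```python
import re

# Stand-in for the reported pattern and subject (line layout is an assumption).
PATTERN = re.compile(r'(?<="tangany-client-id": ").*(?=")')
SUBJECT = "\n".join([
    '"tangany-client-id": ""',
    '"tangany-client-secret": ""',
    '"tangany-subscription": ""',
] * 2)


def all_spans(pattern, subject):
    """Collect (start, end) for every match, guarding against empty matches."""
    spans, pos = [], 0
    while pos <= len(subject):
        # Searching with pos keeps the text before pos visible to lookbehind.
        m = pattern.search(subject, pos)
        if m is None:
            break
        spans.append(m.span())
        if m.end() == m.start():
            # Empty match: step one character forward, otherwise the next
            # search would return the same empty match again.
            pos = m.end() + 1
        else:
            pos = m.end()
    return spans


print(all_spans(PATTERN, SUBJECT))
```

This is a simplification. The loop in src/pcre2demo.c first retries at the same offset with PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED and only advances when that retry fails, and it also takes care of CRLF newlines and multi-unit UTF characters when advancing.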
Thank you for the clarification. We'll study the sources you've mentioned.
With pcre2-10.42 and pcre-8.43 (pcre_exec) we use the following regex:
(?<="tangany-client-id": ").*(?=")
on the following string:
"tangany-client-id": ""
"tangany-client-secret": ""
"tangany-subscription": ""
"tangany-client-id": ""
"tangany-client-secret": ""
"tangany-subscription": ""
On the first call to pcre2_match we use start_offset = 0. On the second call we set start_offset to 22, which is both the start and end position of the first match. This second call and all subsequent calls return 22 as the start and end position, so we get stuck in a loop. I expected to get the second hit in line 4, because with start_offset I have skipped the first hit. If I use 23, I do get the second hit. It may be related to the fact that the start and end positions of the match are the same. The pcre2_compile and pcre2_match functions are called with flags = 0.
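The behavior described above can be reproduced outside PCRE2. The sketch below uses Python's `re` module as a stand-in, assuming that the `pos` argument of `Pattern.search` behaves like start_offset with respect to lookbehind. The helper `naive_spans` is hypothetical, the loop is capped so that it terminates, and the subject is assumed to have one pair per line.

```python
import re

# Stand-in for the reported pattern and subject (line layout is an assumption).
PATTERN = re.compile(r'(?<="tangany-client-id": ").*(?=")')
SUBJECT = "\n".join([
    '"tangany-client-id": ""',
    '"tangany-client-secret": ""',
    '"tangany-subscription": ""',
] * 2)


def naive_spans(pattern, subject, max_calls=5):
    """Restart each search at the end of the previous match (the stuck loop)."""
    spans, pos = [], 0
    for _ in range(max_calls):  # capped so the demonstration terminates
        m = pattern.search(subject, pos)
        if m is None:
            break
        spans.append(m.span())
        pos = m.end()  # after an empty match, pos does not move
    return spans


stuck = naive_spans(PATTERN, SUBJECT)
print(stuck)  # the same empty span on every call

# Starting one character past the empty match finds the hit in line 4.
next_hit = PATTERN.search(SUBJECT, 23)
print(next_hit.span())
```

The empty match at offset 22 is still a valid match when the search starts at 22, because the lookbehind can see the text before that offset, so the same result comes back on every call.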
I have worked around this by always using start_offset=0 and "moving" the position in the source string instead.
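A rough sketch of that workaround, again with Python's `re` as a stand-in for PCRE2 rather than the actual calls: each search runs on a shortened string from offset 0, and the offsets are rebased afterwards. The helper `spans_by_slicing` is hypothetical, the one-character step after an empty match at the slice start is an added safeguard not mentioned above, and the subject is assumed to have one pair per line.

```python
import re

# Stand-in for the reported pattern and subject (line layout is an assumption).
PATTERN = re.compile(r'(?<="tangany-client-id": ").*(?=")')
SUBJECT = "\n".join([
    '"tangany-client-id": ""',
    '"tangany-client-secret": ""',
    '"tangany-subscription": ""',
] * 2)


def spans_by_slicing(pattern, subject):
    """Search a shortened string from offset 0 and rebase the offsets."""
    spans, base = [], 0
    while base <= len(subject):
        m = pattern.search(subject[base:])  # text before base is cut away
        if m is None:
            break
        start, end = base + m.start(), base + m.end()
        spans.append((start, end))
        # Safeguard: an empty match at the very start of the slice would
        # otherwise leave base unchanged.
        base = end + 1 if end == base else end
    return spans


print(spans_by_slicing(PATTERN, SUBJECT))

# Same offset, two different questions: with pos the lookbehind still sees
# the earlier text; with a slice that text no longer exists.
with_offset = PATTERN.search(SUBJECT, 22).span()
with_slice = PATTERN.search(SUBJECT[22:]).span()
print(with_offset, with_slice)
```

This terminates here only because the cut removes the text the lookbehind needs, so the empty match at the cut point is no longer found. The same cut would also hide a genuine match whose lookbehind context lies before the cut.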
I think this is a bug in pcre2; I get the same behavior with pcre1.