PCRE2Project / pcre2

PCRE2 development is now based here.

Allow to replace replacement (for recursive grammars data) #337

Open mvorisek opened 10 months ago

mvorisek commented 10 months ago

This is a feature request to allow replacing inside a replacement, i.e. to resume the replace one character after the start of the match instead of at the first character after the end of the match.

Currently, replacements with recursive regexes must be manually restarted to match inner matches, which implies some unneeded CPU overhead, especially in non-compiled programming languages.

I propose a new PCRE flag that forces the engine's replace process to continue at +1 character from the start of the match instead of +N characters (where N is the number of matched characters).

PhilipHazel commented 9 months ago

I have just had a look at this, and what you suggest is not something that can easily be done, because the code works by creating its output in a different buffer. When the global option is set, the scan continues in the old (input) buffer. I think you could implement what you want externally fairly efficiently by having two buffers. Start with buffer 1 holding the input, call pcre2_match(), remember the offset where it matched, then call pcre2_substitute() with your match_data block and PCRE2_SUBSTITUTE_MATCHED but NOT the global option, and with buffer 2 as the output. Then make the next call to pcre2_match() with buffer 2 as the input and the appropriate offset, and use buffer 1 as the output of the following substitution. And so on.
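
The loop described above can be sketched as follows. This is only an illustration under stated assumptions, not PCRE2 code: Python's `re.search` stands in for pcre2_match(), and the string splice stands in for a single, non-global pcre2_substitute() call. The function name `substitute_rescanning` and the `max_steps` guard are hypothetical additions; the restart offset of match start + 1 is taken from the request, and the guard is there because a replacement that keeps producing new matches would otherwise never terminate.

```python
import re


def substitute_rescanning(pattern, replacement, subject, max_steps=10_000):
    """Replace one match at a time, resuming the scan inside the replacement.

    Hypothetical helper: after each single substitution the next search
    starts one character past the START of the previous match, in the
    freshly written output rather than in the old input.
    """
    regex = re.compile(pattern)
    src = subject  # plays the role of the current input buffer
    offset = 0
    for _ in range(max_steps):
        match = regex.search(src, offset)  # stand-in for pcre2_match()
        if match is None:
            return src
        # Single, non-global substitution written to a new "buffer".
        dst = src[:match.start()] + match.expand(replacement) + src[match.end():]
        # Swap buffers and restart one character after the match start.
        src, offset = dst, match.start() + 1
    raise RuntimeError("step limit reached; the replacement may keep producing new matches")


if __name__ == "__main__":
    pattern, replacement, subject = r"\{(.*)\}", r"[\1]", "{a{b}c}"
    print(re.sub(pattern, replacement, subject))                 # ordinary global replace
    print(substitute_rescanning(pattern, replacement, subject))  # rescans the replacement
```

With this input, the ordinary global replace rewrites only the outer braces, while the rescanning loop also rewrites the inner pair, because the second search starts inside the text produced by the first substitution.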