paragikjain closed 1 year ago
Handling that many capture groups takes time. If you set the PCRE2_NO_AUTO_CAPTURE option, which turns the capture groups into non-capture groups, the program completes fairly quickly. I also tried your pattern and subject with JIT and the same is true - it takes a very long time when the groups are all captures.
Thank you @PhilipHazel !!
Does PCRE2 provide any API that can rewrite a pattern into a more efficient form? For example, could (d)(d)(d)(d) be rewritten as (d){4}?
I made a project for that purpose: https://github.com/zherczeg/repan It can, for example, remove capturing groups that are not used by the pattern.
There is actually a difference between (d)(d)(d)(d) and (d){4} because the first one has four different capturing groups but the second one has only one, repeated of course. If you change your pattern to be (d){3999}e the match completes much more quickly, though with JIT it hits the JIT stack limit.
Thanks @PhilipHazel, I have a few final questions:
Thanks in advance.
1) Yes, see https://www.pcre.org/current/doc/html/pcre2jit.html#SEC7
2) A match-try counter is provided by the API. It is not based on seconds, though: it decreases for every match attempt, and the match fails if it reaches 0. What counts as a "match attempt" is not precisely defined.
Check out the various pcre2set.... functions for various limits that can be set.
Thank you @zherczeg and @PhilipHazel for your support!
Below is a basic C code snippet (Link). When the input contains 100,000 elements and there are 3,999 capture groups, the pcre_match function runs extremely slowly. I'm unsure whether it's stuck in an infinite loop. How can I improve the performance of my code given this large number of capture groups? Note that I cannot control the input or the pattern, as they will be provided by the user.
Code for the above-mentioned issue
Please help!!! Thanks,