Open Blacksmoke16 opened 1 year ago
I looked into what PHP does and they apparently use this by default that is if pcre.jit option is enabled, which it is by default, and the pattern was compiled successfully with a non-zero JIT size:
I was a bit confused on how they got the PCRE2_ANCHORED
option to work given the docs say its not supported at match time. I then noticed they're ONLY passing the NO_UTF_CHECK
to jit_match
and realized I think it works for them because they don't have to deal with the notion of match time options since you have to pass the whole pattern which would have all the options defined at compile time.
So in short, it may be enough to just check the passed options in match_data
and if they do not contain any unsupported options, use the fast path API, otherwise use the normal flow.
Another more extreme option would be to just deprecate the ANCHORED
option in favor of an in-pattern approach as mentioned in the API docs:
This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
Tho IMO, prob not worth the effort.
That could be a viable path to explore. However I'm not sure how much such an external match-time check differs from the checks in pcre2_match
itself. But maybe there is something to gain. It certainly wouldn't hurt to try.
Under the premise that such a change can have subtle intricacies, I'd suggest to defer that until after the initial transition to PCRE2 is carried out (i.e. provisionally targeting the 1.9 or 1.10 release).
However I'm not sure how much such an external match-time check differs from the checks in
pcre2_match
itself.
In a simple benchmark I put together, quite a lot. Like up to 30% in some cases. The main reasoning is we're already passing the subject size, Crystal strings are already UTF-8, and Crystal strings are \0
terminated when passed to C libs. The latter two are things it checks for, probably among others. In the end its a drop in replacement to pcre2_match
, bar needing to support the handful of unsupported match time options.
But yea, waiting a bit sounds good to me :+1:.
Feature Request
Follow up to https://github.com/crystal-lang/crystal/issues/3852#issuecomment-943968239, currently regexes are compiled with JIT mode enabled, which provides a good boost in performance. However PCRE(2) also has another fast path API that avoids some sanity checks and can offer an additional 10%+ boost. E.g.
The catch is not all regexes can be used in this context, or may even benefit from it, so it doesn't make sense to globally enable it like we did JIT mode.
It would be nice if there was some way to force a particular regex pattern to use this "fast path" API for cases where you know that pattern will be used a lot, or within a hot path. The first thought that come to mind is just some extra boolean ivar used to determine which method to call. Alternatively it could be added into the
Regex::Options
enum to allow regex literals to also be used. However, given this is somewhat related to #11320 in that it's PCRE(2) specific it may not ultimately be worth implementing, tho at least worth discussing.WDYT?