firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

International Components for Unicode #2332

Open Calorion opened 3 months ago

Calorion commented 3 months ago

Flavor Request

Please support the ICU regex flavor. This is the flavor that Apple devices support natively, and it is used in, e.g., Siri Shortcuts.

Calorion commented 2 months ago

I see that this has been requested before.

Here are the differences from PCRE2 that I've run into:

Operators

No support for \K.

No support for conditionals.

Does support bounded quantifiers (such as ? and {2,5}) in lookbehind. Unbounded ones (* and +) are still not allowed there.

Does not support recursion, e.g. (?R) (I haven't run into this one, but Wikipedia lists it).
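
To make the list above concrete, here is a rough sketch in Java. Big caveat: java.util.regex is not ICU. I'm using it only as a stand-in because it happens to behave the same way on these four points (it rejects \K, conditionals, and recursion, and accepts bounded quantifiers in lookbehind). The class and method names are made up for illustration, and the ICU behavior itself should still be checked against a real ICU build. The last line shows one possible way to rewrite a PCRE2 pattern that relies on \K.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Stand-in sketch: this exercises java.util.regex, NOT ICU itself.
// IcuDifferencesSketch, compiles, and firstMatch are hypothetical names.
final class IcuDifferencesSketch {

    // True if the engine accepts the pattern, false if it rejects it as a syntax error.
    static boolean compiles(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Text of the first match of the pattern in the input, or null if there is none.
    static String firstMatch(String pattern, String input) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // PCRE2-only constructs: each of these is rejected when the pattern is compiled.
        System.out.println(compiles("foo\\Kbar"));      // \K (match reset)
        System.out.println(compiles("(a)?(?(1)b|c)"));  // conditional
        System.out.println(compiles("\\((?R)*\\)"));    // recursion

        // A bounded quantifier inside lookbehind is accepted.
        System.out.println(compiles("(?<=a{2,5})b"));

        // Possible rewrite of PCRE2's foo\Kbar: the lookbehind requires "foo"
        // before the match without including it in the matched text.
        System.out.println(firstMatch("(?<=foo)bar", "foobar"));
    }
}
```

The lookbehind rewrite only works when the part before \K has a bounded length. Otherwise a capture group around the part you want to keep is the usual fallback.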

Flags

These haven't caused issues for me, but they are differences.

Doesn't support the g flag, because there is no non-global mode. The same goes for u: there is no non-Unicode mode.

Doesn't support the U, A, J, or D flags.

Supports the w flag:

UREGEX_UWORD Controls the behavior of \b in a pattern. If set, word boundaries are found according to the definitions of word found in Unicode UAX 29, Text Boundaries. By default, word boundaries are identified by means of a simple classification of characters as either “word” or “non-word”, which approximates traditional regular expression behavior. The results obtained with the two options can be quite different in runs of spaces and other non-word characters.

Differences with Java Regular Expressions