nwellnhof closed 3 months ago
There's still quite a bit of bloat in the re2c-generated code, but that's hard to fix. The main issue is that re2c seems to handle `{m}`-style quantifiers by creating m copies of the subregex. Regex engines like RE2 (unrelated to re2c) take this approach as well, but it isn't well suited to ahead-of-time compilation, since every copy ends up in the generated source.
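To illustrate why counted quantifiers inflate the output, here is a minimal sketch of the expansion at the pattern-text level. It is not re2c's actual implementation (re2c works on its internal automaton, not on strings), and `expand_repeat` is a hypothetical helper invented for this example. It only shows that the expanded form grows linearly with m.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of re2c): rewrite sub{m} as m
 * concatenated copies of "(sub)".
 *
 * Returns the length of the expanded pattern, excluding the
 * terminating NUL. The result is written to `out` only if it fits
 * in `out_size` bytes, similar to the snprintf convention.
 */
size_t expand_repeat(const char *sub, unsigned m, char *out, size_t out_size)
{
    size_t sub_len = strlen(sub);
    size_t copy_len = sub_len + 2;      /* '(' + sub + ')' */
    size_t total = copy_len * m;        /* grows linearly with m */
    size_t pos = 0;
    unsigned i;

    if (out == NULL || total + 1 > out_size)
        return total;                   /* report the size, write nothing */

    for (i = 0; i < m; i++) {
        out[pos++] = '(';
        memcpy(out + pos, sub, sub_len);
        pos += sub_len;
        out[pos++] = ')';
    }
    out[pos] = '\0';

    return total;
}
```

For example, expanding `[a-z]` with m = 3 yields `([a-z])([a-z])([a-z])`. In an ahead-of-time scanner generator, each of those copies would turn into its own states in the generated code, which is presumably where the size goes.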
The regexes don't require UTF-8 features and work in ASCII mode as well. Disabling UTF-8 reduces the size of the code generated by re2c by a couple of KBs.
I regenerated the regex code with re2c 3.0 because that's what I have on Ubuntu 22.04, and I had to add a `(void) marker` line to suppress an unused-variable warning. Feel free to regenerate with your version of re2c.
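For readers unfamiliar with the idiom, the following is a small sketch of how a `(void)` cast silences an unused-variable warning. The function `count_leading_digits` is a hypothetical stand-in and not the real generated scanner; only the `marker` name and the cast are taken from the description above.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a generated scanner function. The
 * variable `marker` is declared because the surrounding template
 * expects it, but this particular matcher never needs it.
 */
size_t count_leading_digits(const unsigned char *p)
{
    const unsigned char *start = p;
    const unsigned char *marker = NULL;

    /* Casting to void counts as a use, so the compiler no longer
     * warns that `marker` is unused. It generates no code. */
    (void) marker;

    while (*p >= '0' && *p <= '9')
        p++;

    return (size_t)(p - start);
}
```

Whether the cast is needed likely depends on which re2c version generated the code, so regenerating with a different version may make the line unnecessary.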