Closed nalundgaard closed 2 years ago
Nicholas, you are right across the board. Thanks for the detailed comments.
Regarding intention - I believe the initial author didn't think of it as a must at that time. If memory serves me well I enabled unicode matching, but didn't go for the ucp
option. There's a question around performance and I guess in an extended setup with many schemata, every millisecond adds up. I remember vaguely that in my setup I was experiencing a big hit at the time. Possibly/probably it's all gone by now.
One idea might be to add an option re_options
to jesse to pass on re:compile/2 options (actually re:run/2 just to be pedantic, since that's what gets called from jesse) and default to unicode
and ucp
. This way there are less surprises, and those who are impacted by performance loss, can configure jesse to do less and be faster.
Would you be up for filing a PR? 🙏
Yeah, sure! I'll try to have a look at the requisite change (and figuring out how to add a test for it) when I have some free time. Thanks for the explanation, and totally agree about the performance cost issues if you don't need it, the re
module is quite... slow.
I have noticed that when putting regex patterns in
pattern
andpatternProperties
,\w
does not match unicode letters (likeū
,ス
,п
) as expected. You have to use[\p{L}\p{N}_]
instead. It looks like this is because the patterns are not compiled/executed with theucp
inre:compile/2
:Later it explains:
Is this intended to not use this behavior? It seems unexpected to me (having this behavior off by default is not generally expected). Obviously I have a workaround just using the pattern I mentioned (
[\p{L}\p{N}_]
), so not a big deal if this isn't the behavior you all want.