Open spershin opened 5 months ago
I've done some prototyping, so far implementing only the simplest function, regexp_like() (also the most used one). While comparing Oni with JONI, I found a few discrepancies as well.
SELECT regexp_like('澳門WS團(Weiẞ Schwarz Of Macau)', '(?i)Weiß');
Oni: true, JONI: false, RE2: false.
SELECT regexp_like('csa-arch.co.uk', '^[a-z-.]+$');
Incorrectly returns FALSE in JONI, but works properly in Oni and RE2.
SELECT regexp_like('0 0 * * 2,6', '^? ? \* \* [0-6]|@weekly');
This one is probably JONI's issue, because the pattern '^?' is not valid and is useless: it just skips an arbitrary number of characters from the very beginning and is equivalent to not specifying anything.
\u: select regexp_like('claim_text', '^([\u0020-\u02AF\u2000-\u20CF])*$');
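For the character-class case, one possible workaround is to move the hyphen to the end of the class so that it cannot be read as part of a range. The sketch below uses Python's `re` module purely as a stand-in engine; it is not Oni, JONI, or RE2, and whether JONI accepts the rewritten pattern is an assumption that has not been checked here. The names `AMBIGUOUS`, `EXPLICIT`, and `matches` are hypothetical.

```python
import re

HOST = "csa-arch.co.uk"

# Pattern from the report: the hyphen sits between a range and a literal dot.
AMBIGUOUS = r"^[a-z-.]+$"
# Same intent, with the hyphen moved last so it can only be a literal.
EXPLICIT = r"^[a-z.-]+$"


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in text (regexp_like style)."""
    return re.search(pattern, text) is not None


# Python's re treats both spellings as the same character class.
print(matches(AMBIGUOUS, HOST), matches(EXPLICIT, HOST))
```

If both spellings agree in every engine under comparison, the explicit form could be used to tell a parsing difference apart from a matching difference.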
I am starting to work on implementing regexp_replace() (the 2nd most used function) to see how it fares. It is harder than with RE2, because RE2 has a replace function that we call and Oni does not, so we need to implement the replace code ourselves.
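To illustrate what "implementing replace ourselves" involves, here is a minimal sketch of a replace loop built only on a search primitive. It uses Python's `re` as a stand-in for the engine's search call and is not the Velox or Oniguruma code. The helper names `expand` and `regexp_replace` are hypothetical, and the template syntax is a simplified assumption (single-digit `$N` group references and backslash escapes only).

```python
import re

DIGITS = "0123456789"


def expand(template: str, m: re.Match) -> str:
    """Expand $N (single digit) and backslash escapes in a replacement template."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            out.append(template[i + 1])  # escaped character is copied literally
            i += 2
        elif ch == "$" and i + 1 < len(template) and template[i + 1] in DIGITS:
            out.append(m.group(int(template[i + 1])) or "")  # unmatched group -> ""
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def regexp_replace(subject: str, pattern: str, template: str) -> str:
    """Replace every match of pattern in subject, using search calls only."""
    regex = re.compile(pattern)
    out = []
    pos = 0   # where the next search starts
    last = 0  # end of the text already copied to the output
    while pos <= len(subject):
        m = regex.search(subject, pos)
        if m is None:
            break
        out.append(subject[last:m.start()])  # text between matches
        out.append(expand(template, m))
        last = m.end()
        # An empty match must advance by one character to avoid looping forever.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    out.append(subject[last:])  # trailing text after the final match
    return "".join(out)


print(regexp_replace("2024-01-15", r"(\d+)-(\d+)-(\d+)", "$3/$2/$1"))  # 15/01/2024
```

The parts that need care in a real implementation are the same ones shown here: copying the unmatched text between matches, expanding group references, and advancing past empty matches. A UTF-8 engine would also need to advance by a whole code point rather than one byte.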
Description
Proposition
This is a proposal, for discussion, to introduce a set of regex functions based on the Oniguruma library.
The main reason is to use these functions in Presto. Presto mainly uses JONI, a Java port of Oniguruma, to implement its regex functions. Note that Presto supports RE2J as well, but JONI is very well established and used at large companies like Meta.
We have investigated the main differences in the production workload that currently stop us from migrating from JONI to RE2J, and then to RE2 in Prestissimo. Note that we are far from covering the whole workload.
So far we have found 8 discrepancies:
Different Results Returned
Unsupported Features
More Information