Open tripleee opened 3 years ago
Just https://metasmoke.erwaysoftware.com/search?utf8=%E2%9C%93&body_is_regex=1&body=%F0%9F%98%8D on its own crashes with "nothing to repeat", so it's the emoji itself which produces the error.
This appears to be a limitation of the regex implementation used by the database. It doesn't accept, or ignores, characters above 0xFFFF (either as literal characters or as Unicode escapes, e.g. \x{0b03}, which can have at most 4 hex digits), so a lot of emoji just won't be recognized.
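To make the 0xFFFF point concrete, here is a minimal Python sketch (not metasmoke code; the helper name `astral_chars` is made up for illustration) that decodes the `body` parameter from the URL above and lists the characters that fall outside the range a 4-hex-digit escape can express:

```python
from urllib.parse import unquote


def astral_chars(pattern):
    """Return (char, code point) pairs for characters above U+FFFF.

    Hypothetical helper for illustration only; it inspects the pattern
    text and knows nothing about the database's regex engine.
    """
    return [(ch, f"U+{ord(ch):04X}") for ch in pattern if ord(ch) > 0xFFFF]


# The percent-encoded body from the failing search URL.
body = unquote("%F0%9F%98%8D")

print(astral_chars(body))   # the emoji is U+1F60D, i.e. above 0xFFFF
print(astral_chars("abc"))  # plain ASCII yields an empty list
```

U+1F60D needs five hex digits, so it cannot be written as a 4-digit escape either.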
What problem has occurred? What issues has it caused?
Charcoal-SE/SmokeDetector#5550 links to https://metasmoke.erwaysoftware.com/search?utf8=%E2%9C%93&body_is_regex=1&body=%28%3Fs%3A%5Cb%5B%5Cs.%3E%5D%2A%F0%9F%98%8D%F0%9F%98%8D%2B%5CW%2A%5Cb%29 which, however, produces a Ruby traceback for me.
What would you like to happen/not happen?
The regex is not really wrong; the search should run and show the hits instead of crashing.
Looks like the regex engine in MariaDB doesn't think an emoji is something you can repeat? Dunno if we can devise a workaround or should just defer this upstream.
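One possible stopgap, sketched here in Python purely as an illustration (metasmoke itself is a Ruby application, and `check_search_pattern` is a hypothetical name, not an existing function), would be to inspect the pattern before it reaches the database and return a readable message instead of letting the query raise:

```python
def check_search_pattern(pattern):
    """Return an error message if the pattern contains characters above
    U+FFFF, else None.

    Assumption: the database regex engine cannot handle such characters,
    as described in this issue. This only pre-screens the pattern; it
    does not make the search itself work.
    """
    offenders = sorted({f"U+{ord(ch):04X}" for ch in pattern if ord(ch) > 0xFFFF})
    if offenders:
        return ("Regex search does not support characters above U+FFFF: "
                + ", ".join(offenders))
    return None


# The pattern from the linked SmokeDetector issue, decoded.
print(check_search_pattern("(?s:\\b[\\s.>]*\U0001F60D\U0001F60D+\\W*\\b)"))
print(check_search_pattern("(?s:\\bfoo+\\b)"))
```

This would only replace the traceback with a friendlier error; actually matching emoji would still need a fix or a different approach on the database side.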