Genivia / RE-flex

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers.
https://www.genivia.com/doc/reflex/html
BSD 3-Clause "New" or "Revised" License

The documentation for the `\0` escape doesn't match the implementation #216

Closed: tlemo closed this issue 1 day ago.

tlemo commented 2 weeks ago

The documentation says that `\0` matches the NUL character, but, at least with the default matcher, `\0` appears to be treated as the prefix of an octal character escape.

genivia-inc commented 2 weeks ago

Thank you for your feedback! Will be fixed.

genivia-inc commented 2 weeks ago

The problem is actually in the regex converter, which takes the "signature" of a regex engine to check that a regex complies with it, and converts unsupported regex syntax to syntax the regex engine supports.

Line 53 in include/reflex/matcher.h should be updated to add a `0` after the `W` in the signature string, so that `\0` escapes are supported natively:

static std::string convert(T regex, convert_flag_type flags = convert_flag::none, bool *multiline = NULL)
{
  return reflex::convert(regex, "imsx#=^:abcdefhijklnrstuvwxzABDHLNQSUW0<>?", flags, multiline);
}
genivia-inc commented 1 day ago

I've committed a minor update that will be included in the next official release 5.1.