Genivia / RE-flex

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers.
https://www.genivia.com/doc/reflex/html
BSD 3-Clause "New" or "Revised" License

Can't parse integers with 3.2.0 #126

Closed by gahr 2 years ago

gahr commented 2 years ago

Here's a stripped-down version of a grammar I'm using:

    whitespace  [\x00\x09\x0A\x0C\x0D\x20]+
    digit       [0-9]
    integer     -?{digit}+
    atom        [^][()<>{}/%\x00\x09\x0A\x0C\x0D\x20]+

    %%

    {whitespace}  { /* ignore white space */ }
    {integer}     { std::cout << "int: " << str() << "\n"; }
    {atom}        { std::cout << "atom: " << str() << "\n"; }
    .             { throw "undefined"; }

    %%

    int main(int argc, char **argv)
    {
        std::istringstream input{ argv[1] };
        return Lexer(input).lex();
    }

Compiled as `reflex -I -o sample.cpp sample.l && c++ -o sample sample.cpp -I/usr/local/include -L/usr/local/lib -lreflex`.

* RE-flex 3.1.0

      $ ./sample 1234
      int: 1234

* RE-flex 3.2.0

      $ ./sample 1234
      atom: 1234

The generated cpp file is identical except for the version in `REFLEX_VERSION` and the initial `...generated by...` comment.

genivia-inc commented 2 years ago

The internals of the regex library's compilation to a DFA are a lot faster in 3.2.0. Because I was afraid of breaking something, I did a lot of testing on top of the usual barrage of tests. But alas, there seems to be a problem with marking the accepting states that determine which of the patterns matched, which gives `atom` precedence over `integer`. I will fix this ASAP.
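
For readers following along, the expected precedence can be sketched outside of RE-flex. The snippet below is only an illustration, assuming the usual lex convention that the longest match wins and that ties go to the rule listed first. It uses `std::regex` rather than RE-flex, the names `Rule`, `winning_rule` and `sample_rules` are made up for this sketch, and the `atom` character class is simplified (it omits `\x00`).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One lexer rule: a name and its pattern. Rules are stored in priority order.
struct Rule {
    std::string name;
    std::regex pattern;
};

// Returns the name of the rule that wins at the start of `input`:
// the longest match wins, and on equal length the earliest rule wins.
// Returns an empty string when no rule matches.
std::string winning_rule(const std::vector<Rule>& rules, const std::string& input) {
    std::string winner;
    std::size_t best_len = 0;
    for (const Rule& rule : rules) {
        std::smatch m;
        // match_continuous only accepts matches that begin at the first character.
        if (std::regex_search(input, m, rule.pattern,
                              std::regex_constants::match_continuous)) {
            assert(m.position(0) == 0);
            const std::size_t len = static_cast<std::size_t>(m.length(0));
            // Strict '>' keeps the earlier rule when two rules match the same length.
            if (len > best_len) {
                best_len = len;
                winner = rule.name;
            }
        }
    }
    return winner;
}

// The two competing rules from the report, in the order they appear in the spec.
// Assumption: the atom class here is a simplified stand-in for the original one.
std::vector<Rule> sample_rules() {
    return {
        {"integer", std::regex("-?[0-9]+")},
        {"atom", std::regex("[^\\]\\[()<>{}/% \\t\\n\\f\\r]+")},
    };
}
```

Under these assumptions, `winning_rule(sample_rules(), "1234")` picks `integer`, because both rules match all four characters and `integer` is listed first, while an input such as `12ab` goes to `atom` because its match is longer.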

genivia-inc commented 2 years ago

I found the problem for this regression issue. I will release an update soon after some more testing, just to be sure.

genivia-inc commented 2 years ago

Fixed. The problem was a subtle optimization in the DFA construction that failed in very rare cases. Thanks for reporting this issue.
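
As background on what "marking the accepting states" involves, here is a textbook-style sketch, not RE-flex's actual code. It assumes a subset construction in which each DFA state is a set of NFA states, and the DFA state is labeled with the lowest-numbered (highest-priority) rule among the accepting NFA states it contains. `NfaState` and `accept_label` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical NFA state: `rule` is the index of the rule it accepts for,
// or -1 when the state is not accepting. Lower index means higher priority.
struct NfaState {
    int rule;
};

// Computes the accept label of a DFA state, given as a set of NFA state ids.
// Returns the lowest rule index among its accepting members, or -1 if none accept.
int accept_label(const std::vector<NfaState>& nfa,
                 const std::set<std::size_t>& dfa_state) {
    int best = -1;
    for (std::size_t id : dfa_state) {
        assert(id < nfa.size());
        const int rule = nfa[id].rule;
        if (rule >= 0 && (best < 0 || rule < best)) {
            best = rule;
        }
    }
    return best;
}
```

With rule 0 standing for `integer` and rule 1 for `atom`, a DFA state that contains accepting NFA states for both should be labeled 0. If that label is lost or overwritten, the lexer reports `atom` for input such as `1234`, which matches the symptom described in this issue.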

gahr commented 2 years ago

Thanks, I confirm it's fixed in 112e2de2.

gahr commented 2 years ago

Please don't forget to tag a 3.2.1 release, thanks!