Genivia / RE-flex

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers.
https://www.genivia.com/doc/reflex/html
BSD 3-Clause "New" or "Revised" License

Error while using Unicode characters adjacent to braces. #166

Closed. SouravKB closed this issue 1 year ago.

SouravKB commented 1 year ago

When I write a Unicode character directly before an opening brace in a lex file, compilation fails with a "mismatched { }" error.

Minimal reproducible example:

%o main unicode
x .
%%
§{x} echo();
%%

Compilation output:

test.ll:4: error: malformed regular expression or unsupported syntax
error at position 4
§{x}
   \___mismatched { }

But the regex pattern itself isn't malformed: §+{x}, §""{x} and [§]{x} all compile fine. I have tried various non-ASCII Unicode characters, and they all give the same error.

genivia-inc commented 1 year ago

This works for me with the freespace option, which permits spacing in regex patterns (but actions must then be placed in curly braces):

%o main unicode freespace
x x
%%
§ {x} { echo(); }
%%

I think the problem may have something to do with the macro expansion logic in Unicode mode. I will look into it further.
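To illustrate why a preceding character should not matter, here is a minimal C++ sketch of lex-style {name} expansion. It is offered as an illustration only and is not RE/flex's actual code. The function name expand_macros is hypothetical. The sketch assumes parenthesized substitution, which is the usual Flex convention. It treats the pattern as raw bytes, and whether a '{' starts a macro reference depends only on what follows it. As a result, a multibyte UTF-8 character such as § (bytes 0xC2 0xA7) placed before the brace cannot change the outcome. Quoted strings and bracket lists are not special-cased here.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not RE/flex's implementation): expands lex-style
// {name} macro references in a regex pattern. Every other byte is copied
// through unchanged, so UTF-8 sequences before a '{' are never inspected
// when deciding what the brace means. Quoted strings "..." and bracket
// lists [...] are NOT special-cased in this sketch.
std::string expand_macros(const std::string& pattern,
                          const std::map<std::string, std::string>& defs) {
  std::string out;
  std::size_t i = 0;
  while (i < pattern.size()) {
    char c = pattern[i];
    if (c == '\\' && i + 1 < pattern.size()) {
      // Escaped character: copy the backslash and the next byte verbatim.
      out += pattern.substr(i, 2);
      i += 2;
      continue;
    }
    if (c == '{') {
      // Candidate name: a letter or '_', then letters, digits, '_' or '-',
      // terminated by '}' (the Flex naming rule).
      std::size_t j = i + 1;
      if (j < pattern.size() &&
          (std::isalpha(static_cast<unsigned char>(pattern[j])) || pattern[j] == '_')) {
        while (j < pattern.size() &&
               (std::isalnum(static_cast<unsigned char>(pattern[j])) ||
                pattern[j] == '_' || pattern[j] == '-'))
          ++j;
        if (j < pattern.size() && pattern[j] == '}') {
          auto it = defs.find(pattern.substr(i + 1, j - i - 1));
          if (it != defs.end()) {
            // Parenthesize so a following quantifier applies to the whole macro.
            out += "(" + it->second + ")";
            i = j + 1;
            continue;
          }
        }
      }
    }
    // Everything else, including the '{' of a {n,m} repetition and each
    // byte of a multibyte UTF-8 character, is copied unchanged.
    out += c;
    ++i;
  }
  return out;
}
```

For example, with x defined as `.`, the pattern §{x} expands to §(.). A repetition such as a{2,3}, or a reference to an undefined name like {y}, is copied through unchanged.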