Genivia / RE-flex

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers.
https://www.genivia.com/doc/reflex/html
BSD 3-Clause "New" or "Revised" License
529 stars, 86 forks

question: using RE-flex for unicode manipulation #191

Closed: romange closed this issue 1 year ago

romange commented 1 year ago

We already use RE-flex as a Unicode lexer. Now we need another feature: we would like to perform Unicode lowercase/uppercase conversion of a string. Is there a way to hack it with RE-flex?

genivia-inc commented 1 year ago

Sure, that is possible.

Assuming the matched text is obtained with str() or wstr() in your lexer rules, you can then, for example, apply std::towlower (or std::towupper) to each character in a loop over the string to construct a new string with lowercase (or uppercase) text.
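A minimal sketch of the wstr() approach, assuming the match is available as a std::wstring. The helper names to_lower and to_upper are hypothetical and not part of RE-flex. std::towlower and std::towupper map one character at a time according to the LC_CTYPE category of the current C locale, so a program would typically call std::setlocale(LC_ALL, "") (or select a specific UTF-8 locale) first. On platforms where wchar_t is 16 bits, characters outside the BMP arrive as surrogate pairs and are left unchanged.

```cpp
#include <cwchar>
#include <cwctype>
#include <string>

// Hypothetical helper: lowercase a wide string, e.g. the result of wstr()
// in a RE-flex rule action. Each character is mapped individually by
// std::towlower, which consults the current C locale.
std::wstring to_lower(const std::wstring& in)
{
  std::wstring out;
  out.reserve(in.size());
  for (wchar_t wc : in)
    out.push_back(static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(wc))));
  return out;
}

// Hypothetical helper: the same loop with std::towupper for uppercase.
std::wstring to_upper(const std::wstring& in)
{
  std::wstring out;
  out.reserve(in.size());
  for (wchar_t wc : in)
    out.push_back(static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(wc))));
  return out;
}
```

Inside a rule action this could be used as `std::wstring lower = to_lower(wstr());`.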

If you use str() to obtain the match as UTF-8, the loop should instead iterate over UTF-8 multibyte sequences, decoding each one to a character before converting it.
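For the str() route, a sketch under the same locale assumptions: the UTF-8 text is decoded into code points by a small hand-written decoder (not a RE-flex utility), each code point is passed to std::towlower or std::towupper, and the result is re-encoded as UTF-8. All helper names are hypothetical. The decoder only substitutes U+FFFD for malformed lead or continuation bytes; it does not reject overlong or surrogate encodings.

```cpp
#include <cstddef>
#include <cwchar>
#include <cwctype>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] and
// advance i past it. Malformed input yields U+FFFD and skips one byte.
static char32_t decode_utf8(const std::string& s, std::size_t& i)
{
  unsigned char c = static_cast<unsigned char>(s[i]);
  int len;
  char32_t cp;
  if (c < 0x80) { ++i; return c; }
  else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
  else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
  else { ++i; return 0xFFFD; }
  if (i + len > s.size()) { ++i; return 0xFFFD; }
  for (int k = 1; k < len; ++k)
  {
    unsigned char cc = static_cast<unsigned char>(s[i + k]);
    if ((cc & 0xC0) != 0x80) { ++i; return 0xFFFD; }
    cp = (cp << 6) | (cc & 0x3F);
  }
  i += len;
  return cp;
}

// Hypothetical helper: append the UTF-8 encoding of cp to out.
static void encode_utf8(char32_t cp, std::string& out)
{
  if (cp < 0x80)
    out += static_cast<char>(cp);
  else if (cp < 0x800)
  {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else if (cp < 0x10000)
  {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else
  {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: convert UTF-8 text (e.g. from str()) to lowercase
// (upper == false) or uppercase (upper == true), one code point at a time.
std::string utf8_convert_case(const std::string& in, bool upper)
{
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); )
  {
    char32_t cp = decode_utf8(in, i);  // advances i past the sequence
    // towlower/towupper take a wint_t; code points that do not fit in
    // wchar_t (non-BMP where wchar_t is 16 bits) are copied unchanged.
    if (cp <= static_cast<char32_t>(WCHAR_MAX))
    {
      std::wint_t wc = static_cast<std::wint_t>(cp);
      cp = static_cast<char32_t>(upper ? std::towupper(wc) : std::towlower(wc));
    }
    encode_utf8(cp, out);
  }
  return out;
}
```

Note that towlower and towupper perform simple one-to-one mappings, so conversions that change the string length (for example German ß uppercasing to "SS") are not handled; a full Unicode case-mapping library such as ICU covers those.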

There are C++ examples on the web of lowercase/uppercase string conversion.