Genivia / RE-flex

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers.
https://www.genivia.com/doc/reflex/html
BSD 3-Clause "New" or "Revised" License

Few issues while using Reflex to integrate to our project #152

Closed harshakm-50063 closed 1 year ago

harshakm-50063 commented 2 years ago

Hi,

We have run into a few issues, and need some clarifications, while using Reflex in our projects.

  1. Our codebase uses UTF-32 as its default runtime encoding, and when lexing we do not want the overhead of converting to UTF-8 in order to use Reflex. From the documentation, it seems that UTF-8 is the default internal runtime encoding of Reflex, and that even if we pass UTF-32 input, it is converted to UTF-8 internally (with copy overhead?). Please confirm this.
  2. For input passed as a buffer, you support wchar_t and std::wstring, which are implementation-specific and may be UTF-16 or UTF-32. We use the implementation-independent char32_t. Please let us know how to use that.
  3. When we moved our current Flex-based lexer file to Reflex, we also needed a generated header file, as we want to write our own class that inherits from the generated lexer class and implements various 'reduction code'. However, with the --header-file option, Reflex generates the class definition in both the header and the cpp file. This causes a 'redefinition of class' error. Please help us with this.
  4. The files generated by Reflex (cpp/hpp) define certain preprocessor macros (version and options). Since we do not use them in our code, we get an 'unused macros' warning. There is no way to disable that warning for just these generated lexer files, and we do not want to disable it globally at the project level. Even the %top{}% code that we specify is generated after these macros, so disabling the warning there has no effect.
  5. When we enable Flex compatibility mode, a typedef for yyscanner is generated as 'this'. However, the same generated code later uses the identifier 'yyscanner' as a function parameter, which leads to a compilation error. How do we work around this?

Regards, Harsha Kodnad Tally Solutions Pvt. Ltd.

genivia-inc commented 1 year ago

I was out of town, so thank you for your patience.

  1. Yes, input is normalized to UTF-8, which makes it simpler to emulate Flex on Unicode streams, e.g. yytext is always an 8-bit string.
  2. Feed the char32_t* data into a std::stringstream, perhaps? There are ways to define a stream that can be used with reflex::Input; see e.g. reflex/include/reflex/input.h for examples.
  3. ???
  4. Unused macros are not errors, why would they be?
  5. This is not really possible, because there is no yyscanner generated. There is a #define yyscanner this in flexlexer.h to enable reflex to be used with legacy Flex and lex code. But if the compiled legacy code has yyscanner variables or arguments, then those need to be changed.
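
For point 2, one way to act on the stream suggestion is to encode the char32_t data as UTF-8 before handing it to the lexer, since point 1 confirms the input is normalized to UTF-8 anyway. The following is only a minimal sketch: utf32_to_utf8 is a hypothetical helper, not part of RE/flex, and it assumes that replacing invalid code points with U+FFFD is acceptable.

```cpp
#include <string>

// Hypothetical helper (not part of RE/flex): encode UTF-32 code points as UTF-8.
// Assumption: invalid code points (surrogates, values above U+10FFFF) are
// replaced with U+FFFD rather than reported as errors.
std::string utf32_to_utf8(const std::u32string& in) {
  std::string out;
  out.reserve(in.size());  // lower bound: one byte per code point
  for (char32_t c : in) {
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
      c = 0xFFFD;
    if (c < 0x80) {                                      // 1-byte sequence
      out += static_cast<char>(c);
    } else if (c < 0x800) {                              // 2-byte sequence
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {                            // 3-byte sequence
      out += static_cast<char>(0xE0 | (c >> 12));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else {                                             // 4-byte sequence
      out += static_cast<char>(0xF0 | (c >> 18));
      out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}
```

The resulting std::string could then be written into a std::stringstream or passed as lexer input. This does make a copy, which is the overhead asked about in point 1.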

harshakodnad commented 1 year ago

Thank you for your response.

Since I posted this issue, we have moved away from the Flex compatibility options to using reflex directly. A few issues were solved by that.

Regarding point (3), reflex still generates the complete class declaration in the header as well as in the cpp file; the cpp file also contains the implementation. We have been able to work around this with header ordering: we ensure that the generated .cpp never includes the generated header, while the generated header is used by our other code.

Regarding point (4), unused macros are NOT errors; they are reported as warnings. But we compile with -Wall, where all warnings are produced, and we have a stringent standard of clean compilation with zero warnings. For now, we have resolved this by adding a dummy usage of these macros in the .l file (which ends up in the generated code).
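
A minimal sketch of what such a dummy usage might look like, for readers with the same zero-warning policy. The macro names and values below are placeholders standing in for whatever the generated file actually defines, and the sketch assumes the compiler treats an existence test as a use, which is how GCC documents -Wunused-macros. The function generated_macros_seen is hypothetical.

```cpp
// Placeholders standing in for macros defined by the generated lexer file.
// In a real build these come from the generated code, not from this snippet.
#define REFLEX_VERSION "0.0.0"
#define REFLEX_OPTION_lexer Lexer

// Hypothetical dummy usage: testing each macro for existence is enough to
// mark it as used, without depending on what the macro expands to.
inline bool generated_macros_seen() {
#if defined(REFLEX_VERSION) && defined(REFLEX_OPTION_lexer)
  return true;
#else
  return false;
#endif
}
```

Testing with defined() rather than expanding the macros keeps the dummy usage valid whether a macro expands to a string literal or to a bare identifier.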

All other points have been resolved by our move away from Flex compatibility mode.

Thanks again for your response.

Regards, Harsha Kodnad