Genivia / RE-flex

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers.
https://www.genivia.com/doc/reflex/html
BSD 3-Clause "New" or "Revised" License

Fixed handling of negatives in UCS4 to UTF8 conversion. #169

Closed. SouravKB closed this 1 year ago

SouravKB commented 1 year ago

Accidentally passing a negative integer to int utf8(int, char*), defined in utf8.h, caused it to silently fill in a value between 0 and 255. As a result, wunput(EOF) and skip(EOF) can return successfully while doing something different from what was intended, so the call fails silently.

I have changed it to fill with REFLEX_NONCHAR instead, since that is what the code already does for out-of-range code points.

You may even decide to throw instead.
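To illustrate the difference, here is a minimal standalone sketch, not the actual utf8.h code. The names encode_raw, encode_checked, and kReplacement are hypothetical, and U+FFFD is used only as a stand-in for whatever REFLEX_NONCHAR encodes to. It assumes the unchecked encoder handles negative values in its single-byte branch, which matches the symptom described above.

```cpp
#include <cstddef>

// Hypothetical stand-in for the library's out-of-range marker (assumption:
// U+FFFD is used here purely for illustration).
constexpr int kReplacement = 0xFFFD;

// Unchecked encoder: any c < 0x80, including negative values such as EOF (-1),
// takes the one-byte branch and is truncated to a single char.
std::size_t encode_raw(int c, char *s)
{
  if (c < 0x80)
  {
    s[0] = static_cast<char>(c);
    return 1;
  }
  if (c < 0x800)
  {
    s[0] = static_cast<char>(0xC0 | (c >> 6));
    s[1] = static_cast<char>(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000)
  {
    s[0] = static_cast<char>(0xE0 | (c >> 12));
    s[1] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    s[2] = static_cast<char>(0x80 | (c & 0x3F));
    return 3;
  }
  s[0] = static_cast<char>(0xF0 | (c >> 18));
  s[1] = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
  s[2] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
  s[3] = static_cast<char>(0x80 | (c & 0x3F));
  return 4;
}

// Checked encoder: negative and too-large inputs are mapped to the
// replacement code point before encoding, so they can no longer
// masquerade as a valid byte.
std::size_t encode_checked(int c, char *s)
{
  if (c < 0 || c > 0x10FFFF)
    c = kReplacement;
  return encode_raw(c, s);
}
```

With this sketch, encode_raw(-1, buf) writes one byte and reports success, while encode_checked(-1, buf) writes the three-byte replacement sequence, which makes the bad input visible downstream.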

genivia-inc commented 1 year ago

wunput(EOF) and skip(EOF) are not valid, since EOF is not a character! Rather, it is a marker returned by an IO operation as a special case.
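A caller-side guard along these lines keeps EOF out of character-level operations. This is only a sketch: push_back_char is a hypothetical helper operating on a plain std::string, not part of the RE-flex API.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: append a character to a pushback buffer.
// EOF is an I/O status marker, not a character, so it is rejected
// instead of being converted to a byte.
bool push_back_char(std::string& pushback, int c)
{
  if (c == EOF)
    return false;
  pushback.push_back(static_cast<char>(c));
  return true;
}
```

The point is that the value returned by a read operation is tested against EOF first, and only actual characters are passed on to functions that expect a character.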