Genivia / RE-flex

A high-performance C++ regex library and lexical analyzer generator with Unicode support. Extends Flex++ with Unicode support, indent/dedent anchors, lazy quantifiers, functions for lex and syntax error reporting and more. Seamlessly integrates with Bison and other parsers.
https://www.genivia.com/doc/reflex/html
BSD 3-Clause "New" or "Revised" License

Desirement: adding a parameter to the 'new_matcher ()' method to set the starting column #124

Closed: teshields closed this issue 2 years ago

teshields commented 2 years ago

I'm evaluating converting my application's scanner from flex to reflex.

My application currently includes an embedded macro processing mechanism, across multiple input files. Macro expansion is done by creating a new struct yy_buffer_state value (pushing the current buffer state value) with the macro body as the buffer when the macro name is recognized; macro formal parameter replacement is done by creating a new buffer state value (pushing the macro body buffer state value) with the argument text as the buffer when the corresponding parameter name is recognized in the macro body.

Both a macro body and a macro argument are stored with the source file name, starting line & column; the starting column is not normally column 1.

During a macro expansion, an error message currently includes the file name and line relative to the macro body's original definition, or the same for the argument's location. I would like to expand the error message location to include the (start & end) column.

In flex, since I implement my own column counting mechanism, I can implement the error message column location enhancement by simply adding a column member to my existing customized struct yy_buffer_state, saving my column counter, and then setting the starting column associated with the new buffer state.
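A minimal sketch of the buffer-stack idea described above, written as standalone C++ rather than against flex's `struct yy_buffer_state`. The names `Origin`, `Frame`, and `InputStack` are hypothetical; the point is only that each pushed buffer carries the file, line, and starting column of the text it holds, so an error message can report where the macro body or argument was originally written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not part of flex or RE/flex.
struct Origin {
  std::string file;    // file in which the text was originally written
  std::size_t line;    // line of the first character of the text
  std::size_t column;  // column of the first character (often not 1)
};

struct Frame {
  std::string text;  // macro body, macro argument, or file contents
  Origin origin;     // where that text came from
  std::size_t pos;   // next character to read from text
};

class InputStack {
 public:
  // Start reading from text; the current frame resumes once text is exhausted.
  void push(std::string text, Origin origin) {
    frames_.push_back(Frame{std::move(text), std::move(origin), 0});
  }

  // Return the next character, popping exhausted frames; -1 at end of input.
  int get() {
    while (!frames_.empty()) {
      Frame& f = frames_.back();
      if (f.pos < f.text.size())
        return static_cast<unsigned char>(f.text[f.pos++]);
      frames_.pop_back();
    }
    return -1;
  }

  // Origin of the frame at the top of the stack.
  const Origin& origin() const {
    assert(!frames_.empty());
    return frames_.back().origin;
  }

 private:
  std::vector<Frame> frames_;
};
```

A scanner built this way would call `push` when it recognizes a macro name or a formal parameter, and consult `origin()` when formatting an error message.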

While reflex does the column counting, and apparently handles less () back up across an EOL character (which I accomplish by replacing the YY_LESS_LINENO () macro), there does not appear to be a mechanism for setting the starting column for a new matcher, unless I've missed something.

I have not as yet investigated the details of extending the current new_matcher () method to add a columno parameter to set the starting column. Simply adding a column offset member to AbstractMatcher isn't sufficient, since it will only be applicable up to the first EOL character of the new input source. I'd have to at least also track whether that first EOL character has been matched, and whether a less () call backed up over that first EOL character.

Do the reflex author(s) have any suggestions?

genivia-inc commented 2 years ago

Interesting use case. Resetting the column counter is easy, because the RE/flex abstract matcher computes columns on demand, i.e. the column offset is computed when columno() is invoked. A caching mechanism speeds this up so that there is no need to recompute the column number from scratch: only the columns after the last columno() invocation are counted. Therefore, adding a method to reset the cached column number is almost trivial. The cpb_ member points to the last position in the buffer at which columns were counted, and cno_ is the current column count. Setting cpb_ = bol_ and cno_ = 0 resets the column counting.
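To make the on-demand scheme concrete, here is a simplified, hypothetical model of a cached column counter. It is not the RE/flex source: the class `LazyColumnCounter` and its methods are invented for illustration, the member names only echo `cpb_` and `cno_` from the comment above, and it counts bytes (no tab or UTF-8 handling). Its reset also differs slightly from the one described above, since it restarts counting at the current position with a caller-supplied value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of lazy column counting with a cache.
class LazyColumnCounter {
 public:
  explicit LazyColumnCounter(const std::string& text)
      : buf_(text), cur_(0), cpb_(0), cno_(0) {}

  // Move the scan position forward; no column counting happens here.
  void advance(std::size_t pos) {
    assert(pos >= cur_ && pos <= buf_.size());
    cur_ = pos;
  }

  // Count only the characters between the cached position and the current
  // position, then remember how far we got.
  std::size_t columno() {
    for (; cpb_ < cur_; ++cpb_) {
      if (buf_[cpb_] == '\n')
        cno_ = 0;  // a newline restarts the column count
      else
        ++cno_;
    }
    return cno_;
  }

  // Restart counting at the current position with column n.
  void reset_column(std::size_t n) {
    cpb_ = cur_;
    cno_ = n;
  }

 private:
  std::string buf_;
  std::size_t cur_;  // current scan position
  std::size_t cpb_;  // position up to which columns have been counted
  std::size_t cno_;  // column count at cpb_
};
```

Because the work is deferred to `columno()`, a scanner that never asks for columns pays nothing for them, and a reset only has to overwrite two cached values.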

teshields commented 2 years ago

Thanks for the pointer.

Without having yet delved into the implementation details, it seems that there is still a bit of complexity in the case of a multi-line macro (or argument).

The starting column offset is not applicable to text in the macro body (or argument) buffer after the first EOL character, but needs to be maintained in case the less () method backs up over that first EOL character. In my application, backing up over multiple EOL characters will not occur, but however unlikely, it is at least possible that some future application might do so.

So, it seems to me that the matcher will need to maintain the location of the first EOL character in the underlying buffer, if one exists.
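One way to picture this requirement is the following sketch, which assumes columns are derived from a buffer position on demand rather than tracked incrementally. The struct `ExpansionBuffer` and its members are hypothetical and are not part of RE/flex. The starting column applies only up to and including the first EOL; after that, the column is the distance from the preceding EOL, counted from 0. Because the result depends only on the position, backing up over one or more EOL characters needs no extra bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical illustration; not the RE/flex matcher.
struct ExpansionBuffer {
  std::string text;       // macro body or argument text
  std::size_t start_col;  // column at which text began in its source file
  std::size_t first_eol;  // index of the first '\n', or npos if none

  ExpansionBuffer(std::string t, std::size_t col)
      : text(std::move(t)), start_col(col), first_eol(text.find('\n')) {}

  // Column of the character at index pos.
  std::size_t column_at(std::size_t pos) const {
    assert(pos <= text.size());
    // Still on the first line: the starting column offset applies.
    // If there is no EOL, first_eol is npos and this branch is always taken.
    if (pos <= first_eol) return start_col + pos;
    // Past the first EOL: count from the start of the current line.
    std::size_t nl = text.rfind('\n', pos - 1);
    return pos - (nl + 1);
  }
};
```

Storing `first_eol` keeps the common first-line case to a single comparison; only positions on later lines pay for the backward search.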

I will try some modifications to the abstract matcher source in the next week or so and report back, hopefully with a proposed patch.

genivia-inc commented 2 years ago

Are you looking for something like this?

  /// Set or change the starting column number of the last match.
  inline void columno(size_t n) ///< new column number

This will reset the column number to n so that subsequent matching will increase the column count from n.

teshields commented 2 years ago

That would work, provided the context of the column ‘change’ is maintained after a subsequent EOL occurrence and recovered if ‘less(n)’ backs up across that EOL occurrence.

genivia-inc commented 2 years ago

Yes, that should work, because columno(n) sets the column number at the position where the matched text starts, not after the text. If no text has been matched yet, that position is the start of the next input to scan.

genivia-inc commented 2 years ago

The latest release has the new columno(n) method to set the column number of the last match.