LanguageDev / Yoakke

A collection of libraries for implementing compilers in .NET.
Apache License 2.0

Valid regex causes lexer to fail with YKLEXERGEN004 #82

Open skneko opened 3 years ago

skneko commented 3 years ago

Describe the bug

When using the built-in regex Regexes.IeeeFloatLiteral in a lexer like this:

public enum TokenKind {
  [Regex(Regexes.IeeeFloatLiteral)] MyNumberLiteral,
}

the following error is emitted by the source generator:

error YKLEXERGEN004: Failed to parse regular expression: unexpected end of string

The regex contained in the aforementioned constant is as follows:

((([0-9]*.)|([+-]?[0-9]+[0-9_]*.))?[0-9]+[0-9_]*((e|E)[+-]?[0-9]+)?)|([+-]?infinity)|(NaN)

I've tested this regex on this site: http://regexr.com/63qtu, and it looks valid. It also seems to be valid according to the minimalist regex flavor supported by Yoakke and described in the docs.
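The same sanity check can be repeated locally with .NET's own System.Text.RegularExpressions engine. This is only a sketch: the class name FloatRegexCheck and the helper MatchesWhole are made up for illustration, and passing here says nothing about Yoakke's own regex parser, which supports a different, smaller flavor.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FloatRegexCheck
{
    // The pattern quoted in the report, copied verbatim.
    public const string Pattern =
        @"((([0-9]*.)|([+-]?[0-9]+[0-9_]*.))?[0-9]+[0-9_]*((e|E)[+-]?[0-9]+)?)|([+-]?infinity)|(NaN)";

    // Returns true when the whole input matches the pattern.
    // The Regex constructor throws ArgumentException if .NET
    // considers the pattern malformed, so a normal return also
    // means the pattern parsed.
    public static bool MatchesWhole(string input) =>
        new Regex("^(?:" + Pattern + ")$").IsMatch(input);
}
```

Calling FloatRegexCheck.MatchesWhole on inputs such as "1.5e10", "-infinity", or "NaN" exercises each of the three alternatives.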

Which libraries does it affect?

Yoakke.Parser.Generator 2021.8.16-2.23.12-nightly

To Reproduce

Steps to reproduce the behavior:

  1. Create a lexer that uses [Regex(Regexes.IeeeFloatLiteral)] as shown above.
  2. Build the project.

Expected behavior

The project should build normally, and the regex should match floating-point number literals as the pattern describes.

Environment (please complete the following information):

LPeter1997 commented 3 years ago

Turns out the bug is caused by [+-]. Right now, the parser interprets this as a range from + to ], so the character class is never closed and parsing runs off the end of the string. We'll need to rework the regex parser a bit to handle these cases.
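One common way to handle this case is to treat '-' as a range operator only when something other than ']' follows it. The sketch below illustrates that rule in isolation. It is not Yoakke's actual parser: the names CharClassSketch, ParseClass, and ReadChar are hypothetical, negation with '^' is left out, and escapes are simplified so that a backslash just makes the next character literal (no translation of \r or \n).

```csharp
using System;
using System.Collections.Generic;

public static class CharClassSketch
{
    // Parses the body of a character class. `pos` must point just after '['.
    // Returns inclusive (From, To) ranges and leaves `pos` just after ']'.
    public static List<(char From, char To)> ParseClass(string pattern, ref int pos)
    {
        var ranges = new List<(char From, char To)>();
        while (pos < pattern.Length && pattern[pos] != ']')
        {
            char from = ReadChar(pattern, ref pos);
            // '-' forms a range only if it is followed by something other than ']'.
            if (pos + 1 < pattern.Length && pattern[pos] == '-' && pattern[pos + 1] != ']')
            {
                pos++; // skip '-'
                char to = ReadChar(pattern, ref pos);
                if (to < from) throw new FormatException("range bounds out of order");
                ranges.Add((from, to));
            }
            else
            {
                // Single character, including a trailing literal '-'.
                ranges.Add((from, from));
            }
        }
        if (pos >= pattern.Length) throw new FormatException("unexpected end of string");
        pos++; // skip ']'
        return ranges;
    }

    // Reads one character; a backslash makes the next character literal.
    private static char ReadChar(string pattern, ref int pos)
    {
        if (pattern[pos] == '\\')
        {
            pos++;
            if (pos >= pattern.Length) throw new FormatException("unexpected end of string");
        }
        return pattern[pos++];
    }
}
```

With this rule, "[+-]" yields the two single-character entries '+' and '-', while "[0-9_]" still yields the range '0' to '9' plus '_'.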

LPeter1997 commented 3 years ago

While implementing a Lua parser, the following regexes caused a parse error as well (Lua comment and block comment):

[Regex(@"--([^\r\n\[][^\r\n]*)?")]
[Regex(@"--\[([^\]]|(\][^\-])|(\]-[^\-]))*\]--")]