Description
I recently upgraded from FsLexYacc 10.0 to the latest version, 11.3.0. Since the upgrade, parsing the comment line "// ä" fails with "unrecognized input". I have changed neither the lexer or parser options nor the lexer and parser definitions.
Repro steps
I have managed to create a small-ish reproducer:
Parser.fsy:
```
%token EOF
%token <string*FSharp.Text.Lexing.Position> IDENTIFIER
%start top
%type <string> top
%%
top: EOF { "hello" }
```
Lexer.fsl:
```
{
module Lexer
open FSharp.Text.Lexing
open Parser
let lexeme lexbuf = LexBuffer<char>.LexemeString lexbuf
}

let alpha = ['a' - 'z' 'A' - 'Z']
let swe = ['ä' 'Ä' 'ö' 'Ö' 'å' 'Å' ]
let letter = alpha | swe
let ident = letter+
let newline = ('\n' | "\r\n" )

rule token = parse
| "//" { commentline lexbuf.StartPos lexbuf }
| ident { IDENTIFIER(lexeme lexbuf, lexbuf.StartPos) }
| newline { token lexbuf }
| eof { EOF }
| _ { failwith "unknown token" }

and commentline p = parse
| newline { token lexbuf }
| eof { EOF }
| _ { commentline p lexbuf }
```
Program.fs:
```fsharp
open Parser
open Lexer
let input = "// ä"
let lexbuf = FSharp.Text.Lexing.LexBuffer<_>.FromString input
let result = Parser.top Lexer.token lexbuf
printfn "%s" result
```
Expected behavior
Running the program above with dotnet run should print "hello".
Actual behavior
Instead, the program throws an exception with the following stack trace:
```
Unhandled exception. System.Exception: unrecognized input
   at FSharp.Text.Lexing.LexBuffer`1.EndOfScan() in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 128
   at FSharp.Text.Lexing.UnicodeTables.scanUntilSentinel(LexBuffer`1 lexBuffer, Int32 state) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 448
   at Lexer.commentline(Position p, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 81
   at Lexer.token(LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 18
   at Program.result@6.Invoke(LexBuffer`1 lexbuf)
   at FSharp.Text.Parsing.Implementation.interpret[tok,a](Tables`1 tables, FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 initialState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 346
   at FSharp.Text.Parsing.Tables`1.Interpret[char](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 498
   at Parser.engine[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 111
   at Parser.top[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 113
   at <StartupCode$FsLexYaccRepro>.$Program.main@() in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Program.fs:line 6
```
Note that parsing the input "// a" works fine. Parsing also succeeds if I remove ä from swe in Lexer.fsl.
Bisection indicates that the regression was introduced in commit 48ec571 ("break out core domain logic and generation into core libraries (#144)", 2021-01-27).
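To check the observations above in isolation, one could drive the generated lexer directly, without the parser. The following is a minimal sketch of a hypothetical replacement for Program.fs in the same project. It assumes the Lexer and Parser modules generated from Lexer.fsl and Parser.fsy above (EOF is the token case that fsyacc generates in Parser), and tokenize is a helper name introduced only for illustration. For each input it prints either the token list or the exception message.

```fsharp
// Hypothetical lexer-only driver for the repro project (replaces Program.fs).
// Assumes the Lexer and Parser modules generated from Lexer.fsl and Parser.fsy.
open FSharp.Text.Lexing

/// Calls Lexer.token repeatedly until it returns EOF and collects every token produced.
let tokenize (input: string) =
    let lexbuf = LexBuffer<char>.FromString input
    let rec loop acc =
        match Lexer.token lexbuf with
        | Parser.EOF -> List.rev (Parser.EOF :: acc)
        | tok -> loop (tok :: acc)
    loop []

// Compare inputs with and without the non-ASCII letter, inside and outside a comment.
for input in [ "// a"; "// ä"; "ä" ] do
    try
        printfn "%A -> %A" input (tokenize input)
    with ex ->
        printfn "%A -> %s" input ex.Message
```

If only the "// ä" input reports an error, the failure is confined to the generated lexer, independent of the parser tables.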