goccmack / gocc

Parser / Scanner Generator

backtracking for scanner? #7

Open awalterschulze opened 9 years ago

awalterschulze commented 9 years ago

Given three tokens

a : 'a'
asb : 'a' { 'a' } 'b'
c : 'c'

and the input string

aac

The generated lexer will return an illegal token instead of the expected three tokens [a, a, c].

Reading input char 0 'a', the lexer goes into an accept state for token a. Next it receives char 1 'a', so the lexer moves into a non-accepting state in the hope of receiving a 'b' at some point. Instead, char 2 'c' sends the automaton into an unrecoverable reject state, and the lexer returns an illegal token rather than falling back to the a it had already accepted.

I realise that fixing this would require gocc to implement a backtracking scanner, but the current behaviour can make debugging very unintuitive for the user. Maybe something like this could be added to the documentation if it is not there already?