blynn / nex

Lexer for Go
http://cs.stanford.edu/~blynn/nex/
GNU General Public License v3.0

strange lexing behavior #33

Open databus23 opened 8 years ago

databus23 commented 8 years ago

I'm having a hard time understanding the behavior of the lexer in the following case:

/\(/   { fmt.Printf("-> %q\n", yylex.Text()) }
/\)/   { fmt.Printf("-> %q\n", yylex.Text()) }
/[^( ][^ ]*[^ )]/ { fmt.Printf("-> %q\n", yylex.Text()) }
//

package main
import ("fmt")
func main() {
  fmt.Printf("lexing %q:\n", "(rule)")
  NN_FUN(NewLexer(strings.NewReader("(rule)")))
  fmt.Printf("lexing %q:\n", "( rule  )")
  NN_FUN(NewLexer(strings.NewReader("( rule )")))
}

Output of nex -r -s huh.nex:

lexing "(rule)":
-> "("
-> "rule"
lexing "( rule  )":
-> "("
-> "rule"
-> ")"

Why does the lexer swallow the trailing bracket when there is no space between the content and the surrounding brackets? This looks like a bug to me. It seems to be related to the second character class in the content regex ([^ ]*): when I change it to also exclude ), it works.

drewwells commented 8 years ago

https://github.com/blynn/nex/issues/16 mentions that the input must end with a newline. That looks like it applies here too: adding even a single trailing space, as in "(rule) ", made the last bracket match correctly.

databus23 commented 8 years ago

@drewwells not for me:

lexing "(rule) ":
-> "("
-> "rule"
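One hypothesis consistent with both reports (not confirmed against nex's source): the generated scanner reads past the last accepting position and does not push those bytes back. The hand-built DFA below for [^( ][^ ]*[^ )] is only an illustration, not nex's generated code; the state names and the step/scan helpers are hypothetical. After "rule", a ')' is still a legal continuation of [^ ]*, so the DFA reads it in a non-accepting state before giving up, whereas a space kills it immediately.

```go
package main

import "fmt"

// States of a hand-built DFA for [^( ][^ ]*[^ )] (illustrative only).
const (
	start  = iota // nothing read yet
	inner         // valid prefix that cannot end a match here
	accept        // a complete match has been read
	dead          // no continuation can match
)

// step returns the next state after reading byte c in state s.
func step(s int, c byte) int {
	switch s {
	case start:
		if c == '(' || c == ' ' {
			return dead
		}
		return inner
	case inner, accept:
		switch c {
		case ' ':
			return dead
		case ')':
			return inner // ')' may only appear inside [^ ]*, never last
		default:
			return accept
		}
	}
	return dead
}

// scan runs the DFA from the start of input and returns the length of
// the longest match and the number of bytes consumed before the DFA
// died or the input ran out.
func scan(input string) (matchLen, consumed int) {
	s := start
	for consumed < len(input) {
		next := step(s, input[consumed])
		if next == dead {
			break
		}
		s = next
		consumed++
		if s == accept {
			matchLen = consumed
		}
	}
	return matchLen, consumed
}

func main() {
	for _, in := range []string{"rule)", "rule) ", "rule )"} {
		m, c := scan(in)
		fmt.Printf("%q: match %q, read %d byte(s) past it\n", in, in[:m], c-m)
	}
}
```

It reports one byte read past the match for "rule)" and "rule) " and none for "rule )", which lines up with the ")" disappearing only when it directly follows the content, with or without a trailing space. It would also explain the workaround from the original report: with [^ )]* in the middle, ')' leads straight to the dead state, so nothing is read past the match.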