blynn / nex

Lexer for Go
http://cs.stanford.edu/~blynn/nex/
GNU General Public License v3.0

Inconsistent behaviour of nested regular expressions #9

Closed: md2 closed this issue 11 years ago

md2 commented 11 years ago

The following code

/[a-z]+: [a-z]+/ <
  { fmt.Println("BEGIN"); }
  /[a-z]+:/ {
    fmt.Println(1, yylex.Text())
  }
  /[a-z]+/ {
    fmt.Println(2, yylex.Text())
  }
> { fmt.Println("END"); }
//
package main
import ("fmt";"os")
func main() {
  NN_FUN(NewLexer(os.Stdin))
}

(when processed with nex and executed as echo name: value | ./testcase.nn) prints

BEGIN
1 name:
2 value
END

...as you would expect. However, if you change the second nested expression to /.+/:

/[a-z]+: [a-z]+/ <
  { fmt.Println("BEGIN"); }
  /[a-z]+:/ {
    fmt.Println(1, yylex.Text())
  }
  /.+/ {
    fmt.Println(2, yylex.Text())
  }
> { fmt.Println("END"); }
//
package main
import ("fmt";"os")
func main() {
  NN_FUN(NewLexer(os.Stdin))
}

it will print only

BEGIN
2 name: value
END

That is, in the last case the first nested expression is never matched.

blynn commented 11 years ago

Actually, this is working as intended. Quoting from http://dinosaur.compilertools.net/lex/:

1) The longest match is preferred.
2) Among rules which matched the same number of characters, the rule given first is preferred.

In your second example, the second pattern (/.+/) matches the whole of "name: value", which is longer than the "name:" matched by the first pattern, so it is preferred.