arr-ai / wbnf

ωBNF implementation
Apache License 2.0

Generated Go code removes spaces from regexes #59

Closed camscale closed 4 years ago

camscale commented 4 years ago

The wbnf gen command, which generates Go code, removes spaces from regexes even when they were added explicitly with \_. wbnf strips all whitespace from regexes so that a regex can be split into parts and is easier to read. Most whitespace characters can still be written as backslash sequences; space is the exception, so wbnf provides \_ as a special extension for it.

However, after whitespace has been stripped and \_ has then been replaced with a space, the code generator strips all spaces a second time in cmd/codegen/grammar.go:safeString.
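To make the order of operations concrete, here is a minimal sketch of the three steps described above. The function names normalizeRegex and stripSpaces are hypothetical stand-ins, not the actual wbnf code, and the sample regex is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// whitespace matches any run of layout whitespace in a regex source.
var whitespace = regexp.MustCompile(`\s+`)

// normalizeRegex is a HYPOTHETICAL stand-in for wbnf's regex preprocessing:
// first strip all layout whitespace, then turn the \_ extension into a space.
func normalizeRegex(src string) string {
	stripped := whitespace.ReplaceAllString(src, "")
	return strings.ReplaceAll(stripped, `\_`, " ")
}

// stripSpaces is a HYPOTHETICAL stand-in for the space removal that this
// issue attributes to safeString. It undoes the \_ replacement.
func stripSpaces(s string) string {
	return strings.ReplaceAll(s, " ", "")
}

func main() {
	// Illustrative regex source, split up with spaces for readability.
	src := `[\_]*  (?: ( [^\r\n]+ ) )  [\_]*`

	normalized := normalizeRegex(src)
	fmt.Printf("after normalization: %q\n", normalized)

	// The explicit spaces requested via \_ are lost again here.
	fmt.Printf("after stripSpaces:   %q\n", stripSpaces(normalized))
}
```

The second step is the one that breaks the regex: once \_ has become a real space, a blanket space removal cannot tell it apart from layout whitespace.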

This results in a panic in the sysl grammar:

panic: regexp: Compile(`(?m)\A[]*(?:([^(\r?\n)]+))[]*`): error parsing regexp: unexpected ): `(?m)\A[]*(?:([^(\r?\n)]+))[]*`

The [] should be [ ], a character class containing a single space. Because the ] immediately follows the [, it does not close the character class; it is instead taken as a literal member of the class. The class therefore stays open until a later ], swallowing the opening parens in between, which leaves the closing parens unmatched and produces the error.
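The failure can be reproduced outside wbnf with Go's regexp package. The sketch below compiles the pattern from the panic message and the same pattern with the spaces restored; the helper name compiles and the constant names are only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	// brokenPattern is the pattern from the panic message above.
	brokenPattern = `(?m)\A[]*(?:([^(\r?\n)]+))[]*`
	// intendedPattern restores the space inside each [ ] character class.
	intendedPattern = `(?m)\A[ ]*(?:([^(\r?\n)]+))[ ]*`
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println("broken compiles:  ", compiles(brokenPattern))
	fmt.Println("intended compiles:", compiles(intendedPattern))

	// Show the parser's own explanation for the broken pattern.
	if _, err := regexp.Compile(brokenPattern); err != nil {
		fmt.Println(err)
	}
}
```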

The main job of safeString is to turn the string into a syntactically valid Go string literal. This can be done with fmt.Sprintf("%#v", s), which is likely safer against mistakes (and clearer to Go programmers). If the AST has whitespace in it, we should leave it there; I can see no reason why the Go code generator should remove it.