canonical / sqlair

Friendly type mapping for SQL databases
Apache License 2.0

Allow Unicode struct (field) names and column names #84

Closed Aflynn50 closed 9 months ago

Aflynn50 commented 1 year ago

Change the parser to iterate over the input by rune rather than by byte. Go allows Unicode letters and digits in identifiers such as struct and field names, and SQLite allows Unicode characters in column names, so we should allow them in these places too.

Also update the column and struct tag parser to accept column names in quotes, since this is also allowed in SQLite.

The logic of the parser has been kept unchanged when possible.

benhoyt commented 1 year ago

This looks very reasonable to me. At first I wondered why the lexer/parser itself needed to understand Unicode, but yeah, I don't see any way around it given that we're skipping/parsing Go struct names and the like.

Interestingly, ages ago I switched GoAWK's lexer the other way for performance reasons, from rune to []byte, as the AWK grammar doesn't support Unicode identifiers. You can still use Unicode/UTF-8 fine in string literals and comments, but there the lexer just skips over the bytes until it reaches " or end of line. See here and here.

Nit: "Compatablilty" is spelt wrong in the PR title. Might be better to have a more explicit title in any case, maybe "Allow Unicode struct (field) names and column names".