Open HertzDevil opened 3 years ago
https://github.com/crystal-lang/crystal/pull/11508#discussion_r766923604 suggests to remove the Number Letter category from <Start>
. I think that's fine.
We can always reconsider and loosen this restriction if there are some legitimate use cases for identifiers starting with number letters.
Do we really want "most" emojis as identifiers? Ones that contain ZWJs won't be allowed anyway, unless the lexer operates on grapheme clusters and the compiler / standard library supports those sequences. I wouldn't miss being able to name my variables like 😂 and 🫱🏿🫲🏾.
We could simply bring #11508 back but use the new parser warning facilities instead.
Any chracter whose Unicode codepoint is 0xA0 or above is allowed in identifier names (see https://github.com/crystal-lang/crystal/pull/10978#discussion_r674068256). This means the following is valid code:
The above, in turn, is produced by the following Crystal code:
That is, the variable's name is just the zero-width space character. To exclude those edge cases I suggest that we implement the Default Identifiers from UAX #31 (revision 35) with the following profile:
The above sets, when restricted to ASCII, are consistent with Crystal's existing lexer rules. Type and constant names must still begin with an ASCII uppercase letter.
For reference, this is also how C++23 does it (with the additional requirement that all identifiers must be in NFC).