Open MichaelChirico opened 4 years ago
The root cause of this is that the "\u65\u301" variant is actually two characters, an "e" and a diacritic "´" (nchar() returns 2). The diacritic is not an alphanumeric character, hence the test in isValidName() (in gram.y) fails.
isValidName() uses iswalnum() & friends, but the decomposed é is not a single wide character in that sense, so it first checks the "e" ("\u65") and then the "\u301" (the combining acute accent).
I don't think this is fixable unless we insert code to explicitly change the normalization, and I am not sure we'd want to do that at the parser level.
A workaround is to normalize in user space (package utf8 has code for that).
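A minimal sketch of that workaround, assuming the utf8 package is installed and the session runs in a UTF-8 locale. The variable names (nfd, nfc, fixed, src, expr) are illustrative only and are not taken from the original report.

```r
# Requires the 'utf8' package: install.packages("utf8")

nfd <- "e\u0301"  # decomposed form: "e" followed by U+0301 (combining acute accent)
nfc <- "\u00e9"   # precomposed form: U+00E9

nchar(nfd)        # 2, as noted above: two code points
utf8ToInt(nfd)    # the individual code points that the parser sees one by one

# utf8_normalize() composes to NFC unless compatibility maps are requested
fixed <- utf8::utf8_normalize(nfd)
identical(fixed, nfc)

# Normalize the source text before handing it to parse()
src  <- paste0(nfd, " <- 1")
expr <- parse(text = utf8::utf8_normalize(src))
```

Normalizing the whole source string also rewrites any NFD sequences inside string literals, which may or may not be what you want.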
Check the above code snippet.
"\u65\u301" and "\ue9" are the same character, é, in different normalization forms. However, parse() only honors the NFC form. The NFD form is fine inside a quoted string, though.