Open CeylonMigrationBot opened 9 years ago
[@lucaswerkmeister] Node, Firefox, and Chromium all reject both the identifiers 𐐨 (the character) and \uD801\uDC28 (its UTF-16 representation). I’m not sure if this is in compliance with the ECMAScript specification. 7.6 “Identifier Names and Identifiers” of ECMA-262 references the “Identifiers” section of chapter 5 of the Unicode standard (5.15 Identifiers, page 227), which itself points to Annex #31, “Unicode Identifier and Pattern Syntax”. That document does not mention planes or encodings, but does speak of “code points”, which in the Glossary are defined to mean “Any value […] from 0 to 10FFFF₁₆”. However, ECMA-262 7.6 speaks of “characters”, not “Unicode characters”, which per section 6 means “a single 16-bit unit of text”. Furthermore, the section explicitly mentions Unicode 3.0 as the reference version of the standard, and it appears that while the concept of multiple Unicode planes was introduced in Unicode 2.0, blocks outside the Basic Multilingual Plane were only added starting with version 3.1 of the standard; therefore, even if an implementation could read Unicode characters that span multiple code units, it would not be required to know whether any of these characters have a Letter category, and could still reject them in an identifier.
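To make the code point versus code unit distinction concrete, here is a minimal sketch. It assumes an ES2015-capable engine (for the `\u{...}` escape and `codePointAt`), which may not match the engines tested above; the variable names are illustrative only.

```javascript
// U+10428 DESERET SMALL LETTER LONG I lies outside the Basic Multilingual Plane.
const ch = "\u{10428}";

// One code point, but two 16-bit UTF-16 code units (a surrogate pair).
const codeUnits = ch.length;
const codePoint = ch.codePointAt(0);
const high = ch.charCodeAt(0);
const low = ch.charCodeAt(1);

// Building the string from the two surrogates gives the same character.
const fromSurrogates = String.fromCharCode(0xD801, 0xDC28);

console.log(codeUnits, codePoint.toString(16), high.toString(16), low.toString(16), fromSurrogates === ch);
// prints: 2 10428 d801 dc28 true
```

Under the “single 16-bit unit of text” reading, an engine sees two characters here, neither of which is a letter on its own.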
The upshot of all this is that it looks like you need to encode these names with something like $u$10428.
[@chochos] Hm, that $u$10428 doesn't look all that bad...
[@lucaswerkmeister] Well, that’s only a single-character example. It’ll probably have to be uglier for multiple characters in order to be unambiguous between “U+1234 U+56789” and “U+12345 U+6789”. Perhaps $u$1234$56789.
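The ambiguity can be sketched as follows. Both encoder functions are hypothetical and exist only to illustrate the point; they are not the compiler's implementation.

```javascript
// Hypothetical encoder: hex values simply concatenated after a "$u" prefix.
function encodeConcatenated(codePoints) {
  return "$u" + codePoints.map(cp => cp.toString(16)).join("");
}

// Hypothetical encoder: each hex value preceded by its own "$" delimiter.
function encodeDelimited(codePoints) {
  return "$u" + codePoints.map(cp => "$" + cp.toString(16)).join("");
}

const a = [0x1234, 0x56789];
const b = [0x12345, 0x6789];

// Without a delimiter, the two sequences collapse to the same name.
console.log(encodeConcatenated(a), encodeConcatenated(b));
// With a delimiter, they stay distinct.
console.log(encodeDelimited(a), encodeDelimited(b));
```

The delimited form reproduces the $u$1234$56789 shape suggested above.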
[@chochos] oh of course it will be ugly. Actually maybe $u123456$u56789 would be better. Or the hex value to make it shorter.
[@lucaswerkmeister] Those were supposed to be hex values. (Probably shouldn’t have used values above 10FFFF.)
And I’m not familiar with the other escapings the JS runtime has… if $u is unambiguous, sure, that actually looks pretty nice.
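A rough sketch of the per-code-point $u form, assuming an ES2015-capable engine for the `u` regex flag. The function name `escapeAstral` is hypothetical and this is not the compiler's actual implementation. Note that the sketch does nothing about a hex digit that directly follows an escape in the original name, which is exactly the kind of case where $u would need to be checked for ambiguity.

```javascript
// Hypothetical helper: replace every code point outside the Basic
// Multilingual Plane with "$u" followed by its hex value.
function escapeAstral(name) {
  return name.replace(
    /[\u{10000}-\u{10FFFF}]/gu,
    ch => "$u" + ch.codePointAt(0).toString(16)
  );
}

console.log(escapeAstral("\u{10428}"));   // $u10428
console.log(escapeAstral("x\u{10428}y")); // x$u10428y
```

Names made up only of BMP characters pass through unchanged.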
[@chochos] So there's an initial implementation, but I'm sure it requires some more thorough testing: both toplevel and nested values, types, functions, parameters, aliases, etc.
[@lucaswerkmeister]
(that’s U+10428 DESERET SMALL LETTER LONG I)
The compiler passes it through without problems, but node.js can’t handle it:
Found after a suggestion by @tombentley in #2067, though the issue is much simpler here. (When this bug is fixed, the JS model loader should be tested against #2067 as well.)
[Migrated from ceylon/ceylon-js#510]