Open stevengj opened 5 months ago
I think this is a bug in the parser. What would the printing be changed to in order to make it parse? Just using \u doesn't work because then the literal is too large:
julia> '\u11000'
ERROR: ParseError:
# Error @ REPL[27]:1:2
'\u11000'
#└─────┘ ── character literal contains multiple characters
Stacktrace:
[1] top-level scope
@ REPL:1
The printing could be changed to '\xf4\x90\x80\x80', by calling Base.show_invalid, for example. ('\U110000' is a lot more understandable, but is meaningless from the perspective of Unicode.)
It could also print as Char(0x110000), but that's a pretty radical change from how other characters are printed.
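To make the byte-escape option concrete, here is a minimal sketch of where '\xf4\x90\x80\x80' comes from: it is the UTF-8-style encoding that Char stores for 0x110000. The helper `escaped_bytes` is hypothetical and written only for illustration; it is not Base.show_invalid and not a proposed implementation.

```julia
c = Char(0x110000)

# A Char holds its UTF-8-style encoding left-aligned in 32 bits.
raw = reinterpret(UInt32, c)
println(string(raw, base = 16))

# Hypothetical helper (not part of Base): spell a Char as one \x escape
# per encoded byte, wrapped in single quotes.
function escaped_bytes(ch::Char)
    parts = ("\\x" * string(b, base = 16, pad = 2) for b in codeunits(string(ch)))
    return "'" * join(parts) * "'"
end

println(escaped_bytes(c))
```

The escaped form is longer and less readable than '\U110000', which is the trade-off mentioned above.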
If we extend the parser to allow this, I guess we would parse up to '\U1fffff', since Char(0x200000) throws an error. That seems reasonable to me, since there is still a clear upper bound on what we should parse.
The manual has that exact value as an example, and documents that up to 8 hexadecimal digits are allowed after \U, so I'd be in favor of fixing the parser.
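The upper bound under discussion can be checked directly against the Char constructor. This is only a sketch of the bound a parser change might adopt; the predicate `representable` is a hypothetical name introduced here, not an existing function.

```julia
# Hypothetical predicate: code points the Char constructor accepts (21 bits).
representable(u::Integer) = 0 <= u < 0x200000

# The largest accepted value is not valid Unicode, but it can be constructed
# and converted back to the same integer.
top = Char(0x1fffff)
println(isvalid(top))
println(UInt32(top) == 0x1fffff)

# One past the bound is rejected by the constructor itself.
threw = try
    Char(0x200000)
    false
catch
    true
end
println(threw)
```

Under this reading, a parser accepting \U escapes up to 0x1fffff would accept the same range of values that Char(::UInt32) can represent.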
Meta.parse(repr(Char(0x110000))) fails because Char(0x110000) prints as '\U110000', but '\U110000' is not parseable. isvalid(Char(0x110000)) is false, but other invalid characters are parsed okay, so this seems kind of inconsistent.
Options are either (a) change the printing of Char(0x110000) or (b) change the parsing to allow this. I lean towards (a). Thoughts?
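A small sketch for checking the round trip on whatever Julia version is at hand. The helper `roundtrips` is hypothetical; the result for Char(0x110000) depends on whether printing or parsing has been changed since this report, so no expected output is shown.

```julia
# Hypothetical helper: does the printed form of `c` parse back to the same Char?
function roundtrips(c::Char)
    try
        return Meta.parse(repr(c)) === c
    catch
        # Any failure while parsing (for example a ParseError) counts as "no".
        return false
    end
end

# Compare a valid character, a malformed one, and the too-high code point.
for c in ('a', '\xff', Char(0x110000))
    println(repr(c), "  isvalid=", isvalid(c), "  roundtrips=", roundtrips(c))
end
```

Either option (a) or option (b) would make the last line report a successful round trip.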