Open xmh0511 opened 3 years ago
I think [basic.fundamental] p7 and p8 try to establish the relationship between the type and code unit, but this could certainly be clearer.
Although p7 states:
The values of type char can represent distinct codes for all members of the implementation's basic character set.
it is unclear whether the wording "implementation's basic character set" refers to the "basic source character set" or the "basic execution character set". Presumably, it refers to the latter. But, as stated in [lex.charset#3], the execution character set is a superset of the basic execution character set.
Take the execution character set as set S and the basic execution character set as set A, where A ⊆ S.
As lex.ccon#tab:lex.ccon.literal indicates, we do not know whether an element of the relative complement S ∖ A (i.e. a member of the execution character set outside the basic execution character set) can be encoded in a char object. After all, the standard does not specify how the execution character set is encoded, except that it specifies the value 0 for the null character.
This is being addressed by P2314 "Character sets and encodings" (cplusplus/papers#998).
This seems covered by CWG2779.
As for the special rules specified in [lex.ccon]#1: the Unicode standard specifies how large a code unit is for UTF-8, UTF-16, and UTF-32, respectively (similar to what is stated in the Wikipedia article Character_encoding). However, the C++ standard does not state how large the code unit is for the encoding of the execution (wide-)character set. So, in this case, how do we determine whether the code point value of a character in an ordinary or wide character literal can be encoded as a single code unit for the corresponding kind of character literal? Would it be a good idea to change the wording "cannot be encoded as a single code unit" to "cannot be represented by an object with the type of the corresponding kind of character-literal"?