KhronosGroup / SPIRV-Headers


[Specification] Obscure string literal word layout #328

Open alexcher-im opened 1 year ago

alexcher-im commented 1 year ago

The current specification (Version 1.6, Revision 2; see the online specification) states:

2.2.1 Instructions:

Literal: ... A string is interpreted as a nul-terminated stream of characters. All string comparisons are case sensitive. The character set is Unicode in the UTF-8 encoding scheme. The UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word). The final word contains the string’s nul-termination character (0), and all contents past the end of the string in the final word are padded with 0.

This can be misunderstood as requiring an extra nul word after the words that contain the string. For example, the string "Khronos" would then consume 3 words: "Khro", "nos\0", and "\0". According to the output generated by SPIRV-Tools, however, that last word must not be present.

Could you please clarify this in the specification, perhaps with a simple example such as "a 6-character string consumes 2 words, and the high 16 bits of the last word are zero", or with something more formal?
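
To make the intended layout concrete, here is a minimal Python sketch of the packing as I understand it from the quoted paragraph and from what SPIRV-Tools emits. The function name `pack_spirv_string` is hypothetical and is not part of SPIRV-Headers or SPIRV-Tools; it only illustrates the reading in which the nul terminator is an ordinary byte inside the final word.

```python
import struct


def pack_spirv_string(text: str) -> list[int]:
    """Pack a string into 32-bit words (hypothetical helper, for illustration)."""
    # UTF-8 octets followed by exactly one nul-termination byte.
    data = text.encode("utf-8") + b"\x00"
    # Zero-pad only as far as the next 4-byte boundary; no extra word is added.
    data += b"\x00" * (-len(data) % 4)
    # "<I" is a little-endian 32-bit word: the first octet lands in the lowest 8 bits.
    return list(struct.unpack(f"<{len(data) // 4}I", data))


print([hex(w) for w in pack_spirv_string("Khronos")])  # ['0x6f72684b', '0x736f6e']
print([hex(w) for w in pack_spirv_string("Khrono")])   # ['0x6f72684b', '0x6f6e']
```

Under this reading, "Khronos" (7 octets plus the nul) takes exactly 2 words, and a 6-character string also takes 2 words with the high 16 bits of the last word zero. An all-zero final word only appears when the octet count is a multiple of 4, for example "abcd", because the nul byte then has to start a new word.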

gnl21 commented 1 year ago

How about something like:

The final word is the word containing the string's nul-termination byte. Any remaining bytes in this final word are padded with 0.

Hopefully that makes it clearer that the nul-terminator is a byte in the string rather than requiring an additional nul word.
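
The same reading can be checked from the consumer side. The sketch below is only an illustration under that interpretation: `unpack_spirv_string` is a hypothetical name, not an existing API. It scans for the first nul byte and treats the word containing it as the final word of the literal.

```python
import struct


def unpack_spirv_string(words):
    """Decode a string literal from a word sequence (hypothetical helper).

    Returns the decoded text and the number of words the literal occupies.
    """
    # Re-serialize the words as little-endian bytes, first octet in the low 8 bits.
    data = struct.pack(f"<{len(words)}I", *words)
    # The nul terminator is a byte; the word that holds it is the final word.
    end = data.index(b"\x00")
    return data[:end].decode("utf-8"), end // 4 + 1


print(unpack_spirv_string([0x6F72684B, 0x00736F6E]))  # ('Khronos', 2)
```

Because the word count is derived from the position of the nul byte, any operands that follow the string start immediately after the final word, with no separate nul word in between.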