CWG2870 [lex.string] `"" ""` (adjacent ordinary string literals) are ill-formed

Eisenwave commented 7 months ago

Reference (section label): [lex.string]

Issue description

Subclause 5.13.5 [lex.string] paragraph 7 states that:

> If two string-literals have the same encoding-prefix, the common encoding-prefix is that encoding-prefix. If one string-literal has no encoding-prefix, the common encoding-prefix is that of the other string-literal. Any other combinations are ill-formed.
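To make the three sentences concrete, here is a minimal sketch of the combinations they cover. The variable names are illustrative only and do not come from the standard. The last declaration is the case this issue is about, which implementations accept in practice.

```cpp
// Adjacent string-literals are concatenated in translation phase 6.
constexpr char     plain[] = "ab" "cd";    // neither has an encoding-prefix
constexpr char16_t same[]  = u"ab" u"cd";  // both have the same encoding-prefix
constexpr wchar_t  mixed[] = L"ab" "cd";   // only one has an encoding-prefix
constexpr char     empty[] = "" "";        // the case at issue
// u8"ab" L"cd" has two different encoding-prefixes and is ill-formed.
```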

In the case "" "", i.e. when neither string-literal has an encoding-prefix:

- The first sentence in the quote cannot apply because neither string-literal has an encoding-prefix, and encoding-prefix cannot be empty.
- The second sentence in the quote cannot apply because the other string-literal has no encoding-prefix.

Therefore, this construct is ill-formed. Arguably, it is not possible to "fill in the blanks" and interpret the latter sentence as:

> If at least one string-literal has no encoding-prefix the common encoding-prefix is that of the other string-literal, or none if neither has an encoding-prefix.

On another note, it is unusual that we talk about a "common encoding-prefix", even in the case where there is no encoding-prefix at all. The common prefix in this paragraph should not be formatted as a grammar rule.

Suggested resolution

Itemize subclause 5.13.5 [lex.string] paragraph 7, and update the result as follows:

The common ~~encoding-prefix~~ encoding prefix for a sequence of adjacent string-literals is determined pairwise as follows:

- If two string-literals have the same encoding-prefix, the common ~~encoding-prefix~~ encoding prefix is that encoding-prefix.
- ~~If~~ Otherwise, if one string-literal has no encoding-prefix, the common ~~encoding-prefix~~ encoding prefix is that of the other string-literal.
- Otherwise, if neither string-literal has an encoding-prefix, there is no common encoding prefix.
- ~~Any other combinations are~~ Otherwise, the program is ill-formed.
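The itemized rule can be read as a small decision procedure. The following is a sketch under stated assumptions, not standard wording: `Prefix`, `Common`, and `common_prefix` are hypothetical names, and `std::nullopt` stands in for a string-literal without an encoding-prefix. Because the first item handles equal prefixes, the second item here is only reached when exactly one literal lacks a prefix or both do.

```cpp
#include <optional>

// Hypothetical model of the four encoding-prefixes in the grammar.
enum class Prefix { u8, u, U, L };

// Result of combining two adjacent string-literals.
struct Common {
    bool well_formed;              // false: the program is ill-formed
    std::optional<Prefix> prefix;  // empty: there is no common encoding prefix
};

// std::nullopt models a string-literal without an encoding-prefix.
constexpr Common common_prefix(std::optional<Prefix> a, std::optional<Prefix> b) {
    if (a && b && *a == *b) return {true, a};   // same encoding-prefix
    if (a && !b) return {true, a};              // only the first has a prefix
    if (!a && b) return {true, b};              // only the second has a prefix
    if (!a && !b) return {true, std::nullopt};  // neither has a prefix
    return {false, std::nullopt};               // two different prefixes
}
```

Under this model, `"" ""` corresponds to `common_prefix(std::nullopt, std::nullopt)`, which is well-formed with no common encoding prefix.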

Alternative resolution (not proposed, but worth considering)

In subclause 5.13.5 [lex.string] paragraph 7, replace all occurrences of encoding-prefix with "encoding prefix". This legitimizes applying paragraph 7, sentence 1 or 2 to the case "" "".

Eisenwave commented 7 months ago

It's been alleged that "" "" is valid because these string literals have a "none" prefix ([tab:lex.string.literal]), so the aforementioned sentence in paragraph 7 would apply here.

However, the wording specifically mentions encoding-prefix, not "encoding prefix", and the grammar rule never produces the empty word.

frederick-vs-ja commented 6 months ago

Perhaps what we want to say is

> ~~Any other combinations are ill-formed.~~ Otherwise, there is no common encoding-prefix and both string-literals shall have no encoding-prefix.

This looks somehow editorial to me...

Eisenwave commented 6 months ago

> This looks somehow editorial to me...

You're suggesting to turn "the program is ill-formed" into "the program is well-formed with this behavior ..."; how is that editorial?

frederick-vs-ja commented 6 months ago

> > This looks somehow editorial to me...
>
> You're suggesting to turn "the program is ill-formed" into "the program is well-formed with this behavior ..."; how is that editorial?

The major issue seems to be that the second sentence may be treated as

> If one string-literal has no encoding-prefix and the other has one, the common encoding-prefix is that of the other string-literal.

But my reading is that such treatment isn't or at least shouldn't be valid. The whole precondition should be "one string-literal has no encoding-prefix", so concatenation of adjacent ordinary string literals falls into this case and thus is well-formed.

The issue I see is that it's unclear whether "the common encoding-prefix is that of the other string-literal" can imply that "the common encoding-prefix does not exist if the other string-literal has no encoding-prefix". I believe such implication is intended, but I'm not sure whether it's valid.
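The two readings of the second sentence can be contrasted in a small model. This is only an illustrative sketch: `Prefix`, `MaybePrefix`, `Outcome`, and both function names are hypothetical, and `std::nullopt` again stands in for a string-literal without an encoding-prefix.

```cpp
#include <optional>

enum class Prefix { u8, u, U, L };          // hypothetical model of encoding-prefix
using MaybePrefix = std::optional<Prefix>;  // nullopt: no encoding-prefix

// Outcome of applying the second sentence to a pair of string-literals.
enum class Outcome { NotApplicable, CommonIsPrefix, NoCommonPrefix };

// Strict reading: exactly one string-literal lacks an encoding-prefix.
constexpr Outcome sentence2_strict(MaybePrefix a, MaybePrefix b) {
    return (a.has_value() != b.has_value()) ? Outcome::CommonIsPrefix
                                            : Outcome::NotApplicable;
}

// Permissive reading: at least one string-literal lacks an encoding-prefix,
// and "that of the other" may turn out to be nothing at all.
constexpr Outcome sentence2_permissive(MaybePrefix a, MaybePrefix b) {
    if (a && b) return Outcome::NotApplicable;
    return (a || b) ? Outcome::CommonIsPrefix : Outcome::NoCommonPrefix;
}
```

The readings differ only for the pair with no prefixes: the strict one leaves `"" ""` to the final "ill-formed" sentence, while the permissive one yields no common encoding-prefix.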

Eisenwave commented 6 months ago

> I believe such implication is intended, but I'm not sure whether it's valid.

Well, yeah, that's the crux of the issue. I don't believe that such a reading is correct because saying "the encoding-prefix of the other literal" cannot be applied when the other has no encoding-prefix at all.

jensmaurer commented 6 months ago

CWG2870
