CWG2901 [basic.lval] Accessing an object through a different type through which it is type-accessible has undefined semantics

Eisenwave commented 3 months ago

Reference (section label): [basic.lval]

Issue description

int x = -1;
auto y = *reinterpret_cast<unsigned*>(&x);   // read value through different type
*reinterpret_cast<unsigned*>(&x) = UINT_MAX; // modify value through different type

An object of type int is type-accessible through a glvalue of type unsigned.

However, [conv.lval] p3.4 states:

Otherwise, the object indicated by the glvalue is read ([defns.access]), and the value contained in the object is the prvalue result.

We obviously require a prvalue of unsigned type for the initialization of y, and the -1 value contained in the int object has the wrong type and and cannot be represented as unsigned.

Similarly, [expr.ass] p2 states:

In simple assignment (=), the object referred to by the left operand is modified ([defns.access]) by replacing its value with the result of the right operand.

This cannot be done because it would require storing a UINT_MAX value of unsigned type in an object of type int.

Suggested resolution

In [basic.lval] p11, add the following sentence:

When an object of dynamic type T_obj is accessed through a glvalue of type T_ref, T_obj is type-accessible through T_ref, and T_ref is not the same type as T_obj, an unspecified conversion from the stored value to T_ref takes place when the object's value is read, and an unspecified conversion from the to-be-stored value to T_obj takes place when the object is modified.

Eisenwave commented 3 months ago

FWIW this feels like yet another big fish and could possibly be blocked by https://github.com/cplusplus/papers/issues/592

It's also worth noting that the authors of P1839 disagree on whether the following (related) example is well-defined:

void increment_first_byte(int* n) {
    auto* a = reinterpret_cast<char*>(n);
    ++(*a);
}

If we had clearly defined semantics for when an object is modified through a different type, I'm inclined to agree with Krystian that this is okay.

However, as things stand, I don't believe we have any semantics for this case, and that's a defect that extends beyond the scope of P1835 because it's not about object representations in the case of signed/unsigned.

t3nsor commented 3 months ago

Surely you meant P1839 (there doesn't seem to be a published P1835).

The language lawyer interpretation of the read case is that you get the stored value of the object, which, in this case, is 0. If the value is negative, then http://eel.is/c++draft/expr.pre#4 applies, and it's UB. I think this is wrong, and we ought to say that you get the value with the same value representation. I don't understand the rationale for "unspecified conversion". To me, that's the same as saying you get an unspecified value in all cases.

In the write case, current wording similarly appears to imply that, if the value being written is also a valid value of the actual type of the existing object, then that value is stored. If the value being written is not a valid value of that type, then it's UB by omission. Again, I think we ought to say that you just get the value with the same value representation.

Your interpretation "0u is not a valid value for int" contradicts current wording such as http://eel.is/c++draft/basic.fundamental#6 "Further, each value has the same representation in both types" which implies that two objects of different type can actually have a common value.

Eisenwave commented 3 months ago

Again, I think we ought to say that you just get the value with the same value representation.

I'm tiptoeing around such a solution because while it makes sense for signed/unsigned integers, it needs to work for accessing int through char glvalues as well, and those don't necessarily have the same object representation.

I'm just getting us from UB by omission to unspecified. Not sure with what wording this could really be fixed up, if any. Perhaps this needs a paper.

Your interpretation "0u is not a valid value for int" contradicts current wording such as http://eel.is/c++draft/basic.fundamental#6 "Further, each value has the same representation in both types" which implies that two objects of different type can actually have a common value.

Hmm yeah, fair point. I think the current wording doesn't strictly attach a type to a value for fundamental types, but it's more mathematical. 0 is literally the same value no matter which integral type or floating-point type stores zero. Negative numbers are a good example of where it breaks though; I'll update the issue.

Eisenwave commented 3 months ago

There's also a pretty good chance that this is NAD, everything is working as intended, and you are not permitted to type pun int as unsigned except for where the value is the same.

However, if so, we should have some examples that demonstrate that, such as:

int p = 0;
auto up = *reinterpret_cast<unsigned*>(&p);   // OK, up has value 0

int n = -1;
auto un = *reinterpret_cast<unsigned*>(&n);   // undefined behavior

jensmaurer commented 3 months ago

The "char, unsigned char, std::byte" case in [basic.lval] p11 is the object representation case. Let's not try to fix this here, but defer to P1839.

For the signed/unsigned case, I think that's clearly an (isolated) hole in the specification: We say access is allowed, but we don't say what the result is. For sure after we moved to a mandatory two's complement representation, it seems abundantly clear that you get the congruent value per [basic.fundamental] p3 and p5. Nothing unspecified here.

Wording suggestions welcome.

frederick-vs-ja commented 3 months ago

I guess the following resolution may be wanted. Given that the static and dynamic types of a scalar object is always the same, perhaps we shouldn't say "dynamic type" in [conv.lval] p3.4 and [expr.ass] p2.

Let's not try to fix this here, but defer to P1839.

I guess this is OK, but it's somehow weird to me to just skip the object representation parts.

Suggested resolution

Modify [conv.lval] as indicated

(3.4) - Otherwise, let U and V be the cv-unqualified version of T and the type of the object, repectively. ~t~The object indicated by the glvalue is read ([defns.access]), and the prvalue result is

the value contained in the object ~is the prvalue result~ if U is same as V,

otherwise, the value whose value representation ([basic.types.general]) is same as the value contained in the object if U is a signed or unsigned version of V,

otherwise, the value of the first element of the object representation of the object.

If the result is an erroneous value ([basic.indet]) and the bits in the value representation are not valid for the object's type, the behavior is undefined.

Modify [expr.ass] as indicated

In simple assignment (=), let U and V be the cv-unqualified version of the type of the left operand and the object referred to by it, repectively, and let x be the result of the right operand. ~t~The object referred to by the left operand is modified ([defns.access]). ~by replacing its value with the result of the right operand.~ If U is same as V, the value of the object is replaced with x. Otherwise, if U is a signed or unsigned version of V, the value of the object is replaced with the V value whose value representation ([basic.types.general]) is same as x. Otherwise, the first element of the object representation of the object is replaced with x.

Extra-Creativity commented 3 months ago

The "char, unsigned char, std::byte" case in [basic.lval] p11 is the object representation case. Let's not try to fix this here, but defer to P1839.

It seems that P1839 has completely stopped though previously I think it's already accepted as DR in C++23. See https://github.com/cplusplus/papers/issues/592. So what happened?

t3nsor commented 3 months ago

The "char, unsigned char, std::byte" case in [basic.lval] p11 is the object representation case. Let's not try to fix this here, but defer to P1839.

It seems that P1839 has completely stopped though previously I think it's already accepted as DR in C++23. See cplusplus/papers#592. So what happened?

This is not really the right forum to discuss this, but basically the paper was reviewed by CWG, and certain issues were found that have no known resolution, which are described in the last section of the paper.

jensmaurer commented 3 months ago

CWG2901

cplusplus / CWG