Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Invalid codegen when comparing pointer to one past the end and then dereferencing that pointer #51537

Open Quuxplusone opened 2 years ago

Quuxplusone commented 2 years ago
Bugzilla Link PR52570
Status NEW
Importance P enhancement
Reported by Gabriel Ravier (gabravier@gmail.com)
Reported on 2021-11-20 19:02:17 -0800
Last modified on 2021-11-22 13:22:49 -0800
Version trunk
Hardware PC Linux
CC blitzrakete@gmail.com, dblaikie@gmail.com, dgregor@apple.com, erik.pilkington@gmail.com, jason.e.cobb@gmail.com, johelegp@gmail.com, llvm-bugs@lists.llvm.org, momchil.velikov@arm.com, richard-llvm@metafoo.co.uk
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

extern int x[1], y;

int f(int p, int q) { q = y; if (p == (x + 1)) { p = 2; return y; } return 0; }

LLVM trunk currently outputs the following code with -O3 -mtune=znver3 (without it it outputs correct code but I'm pretty sure it still makes the same wrong assumption about the value of p):

f: mov eax, dword ptr [rip + y] mov dword ptr [rsi], eax xor eax, eax cmp rdi, offset x+4 je .LBB0_1 ret .LBB0_1: mov eax, dword ptr [rip + y] mov dword ptr [rip + x+4], 2 ret

Which is incorrect because p could point to y, for example if f was called as such:

int whatever; f(&y, &whatever);

and y could happen to be located in memory right after x.

Also, although the comparison's result is unspecified, this still means only two results are possible according to the standard:

LLVM's optimization makes it so the result can also be the previous value of y, which could be something else than 0 or 2.

It seems that LLVM assumes that because p == (x + 1) it can replace all occurences of p with x + 1 without any regard to provenance, and doing that change manually would indeed mean the return y; could be optimized to use the previous store (and the store to x + 1 would be UB, too...), but this isn't the case here: p could simultaneously validly point to y and be equal to x + 1.

Godbolt link: https://godbolt.org/z/v73ME48qd

Quuxplusone commented 2 years ago
I believe this is actually /undefined/ behavior, rather than just unspecified.
For instance https://en.cppreference.com/w/c/language/operator_comparison lists
a series of comparisons that are allowed, and the rest are undefined. The
allowed list includes:
"if one pointer points to the element of an array and the other pointer points
one past the end of the same array, the one-past-the-end pointer compares
greater"
and
"if two pointers point to the same object, or both point one past the end of
the same array, they compare equal"

But there's no other provision for comparison of one-past-the-end pointers.
Quuxplusone commented 2 years ago
There is an explicit provision if you read the standard itself:

> If one pointer represents the address of a complete object, and another
pointer represents the address one past the last element of a different
complete object, the result of the comparison is unspecified.
- [expr.eq] (https://eel.is/c++draft/expr.eq#3.1)

Also, I believe the article you are citing is incorrect about relational
comparisons of pointers to different objects, the standard only indicates that
"neither pointer is required to compare greater than the other" ([expr.rel]
https://eel.is/c++draft/expr.rel#4.3) if they don't point to the same object,
which implies an unspecified result, not undefined behavior.
Quuxplusone commented 2 years ago

The webpage cited in Comment 1 is about C, not C++.

Quuxplusone commented 2 years ago
Ah, well, here's a quote from the C standard, then:

"Two pointers compare equal if and only if [...], or one is a pointer to one
past the end of one array object and the other is a pointer to the start of a
different array object that happens to immediately follow the first array
object in the address space."
- C Standard, 6.5.9. Equality operators

If you're wondering about its usage of "array" when referring to the non-past-
end-of-array pointer, here's another quote that should clarify that:

"For the purposes of these operators, a pointer to an object that is not an
element of an array behaves the same as a pointer to the first element of an
array of length one with the type of the object as its element type."
- C Standard, 6.5.9. Equality operators

It does look like for all the other relational operators, it is indeed true
that "the behavior is undefined" (6.5.8. Relational operators) when used on
pointers to different objects, although this should not be relevant to the test
case from the description as it uses == and not any of the affected operators
(<, >, <= and >=)

Also note that Clang's behavior on this does not change if the testcase is
compiled as C++.
Quuxplusone commented 2 years ago
(In reply to Gabriel Ravier from comment #2)
> There is an explicit provision if you read the standard itself:
>
> > If one pointer represents the address of a complete object, and another
pointer represents the address one past the last element of a different
complete object, the result of the comparison is unspecified.
> - [expr.eq] (https://eel.is/c++draft/expr.eq#3.1)
>
> Also, I believe the article you are citing is incorrect about relational
> comparisons of pointers to different objects, the standard only indicates
> that "neither pointer is required to compare greater than the other"
> ([expr.rel] https://eel.is/c++draft/expr.rel#4.3) if they don't point to the
> same object, which implies an unspecified result, not undefined behavior.

This wording seems to only apply to "The result of comparing unequal pointers
to objects" - pointers to one-past-the-end are not pointers to objects, so I
think that's where the "undefined behavior" arises from. The spec doesn't say
what should happen with such a comparison, so it's undefined.

A note in 6.8.2 [basic.compound] also mentions "A pointer past the end of an
object (7.6.6) is not considered to point to an unrelated object of the
object’s type that might be located at that address."(In reply to Jason Cobb
from comment #3)

> The webpage cited in Comment 1 is about C, not C++.

Fair enough - I Could totally believe this is defined in C. We might already
have a bug to dup against for that case.

Some other related, possibly duplicate bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
https://bugs.llvm.org/show_bug.cgi?id=34548#c77
https://bugs.llvm.org/show_bug.cgi?id=46380
Quuxplusone commented 2 years ago

In the LLVM IR the store of 2 is done using a pointer value, whose underlying object is x.

IMHO, that is incorrect.

p == x + 1 evaluating to true could mean p is a non-dereferencable pointer into x or p points to an unrelated object, that happens to be allocated adjacent to x, but it does not follow that p does not point into y.

define dso_local i32 @Z1fPiS(i32 readnone %0, i32 nocapture %1) local_unnamed_addr #0 { %3 = load i32, i32 @y, align 4, !tbaa !3 store i32 %3, i32 %1, align 4, !tbaa !3 %4 = icmp eq i32 %0, getelementptr inbounds ([1 x i32], [1 x i32] @x, i64 1, i64 0) br i1 %4, label %5, label %7

5: ; preds = %2 store i32 2, i32 getelementptr inbounds ([1 x i32], [1 x i32] @x, i64 1, i64 0), align 4, !tbaa !3 %6 = load i32, i32* @y, align 4, !tbaa !3 br label %7

7: ; preds = %2, %5 %8 = phi i32 [ %6, %5 ], [ 0, %2 ] ret i32 %8 }

Quuxplusone commented 2 years ago

(In reply to David Blaikie from comment #5)

(In reply to Gabriel Ravier from comment #2)

There is an explicit provision if you read the standard itself:

If one pointer represents the address of a complete object, and another pointer represents the address one past the last element of a different complete object, the result of the comparison is unspecified.

Also, I believe the article you are citing is incorrect about relational comparisons of pointers to different objects, the standard only indicates that "neither pointer is required to compare greater than the other" ([expr.rel] https://eel.is/c++draft/expr.rel#4.3) if they don't point to the same object, which implies an unspecified result, not undefined behavior.

This wording seems to only apply to "The result of comparing unequal pointers to objects" - pointers to one-past-the-end are not pointers to objects, so I think that's where the "undefined behavior" arises from. The spec doesn't say what should happen with such a comparison, so it's undefined. No, [expr.eq] which I just cited explicitly says that comparison's result is unspecified, and as I've detailed in the description, this does not give LLVM license to do the transformation it does, as it has no way of ensuring &y == (x + 1) will always return false.

With regards to other relational operators, the standard also has a footnote that is linked to in the same sentence in which "The result of comparing unequal pointers to objects" is found: "A pointer past the last element of an array of n elements is considered to be equivalent to a pointer to a hypothetical array element n for this purpose."

A note in 6.8.2 [basic.compound] also mentions "A pointer past the end of an object (7.6.6) is not considered to point to an unrelated object of the object’s type that might be located at that address."(In reply to Jason Cobb from comment #3) I only access the other object using p, which is a pointer that actually was derived from that object (where f is called with &y). If the comparison, by itself, made it so the compiler can assume p was derived from x, then that would imply that evaluating &y == (x + 1) would make *&y invoke undefined behavior, which I would hope you agree would be stupid.

The webpage cited in Comment 1 is about C, not C++.

Fair enough - I Could totally believe this is defined in C. We might already have a bug to dup against for that case. I'm pretty sure this code does not invoke undefined behavior in both C and C++ (although it is unspecified whether it returns 0 or 2, but that does not give license for the transformation as it can give other results).