Open Quuxplusone opened 6 years ago
Bugzilla Link | PR38108 |
Status | NEW |
Importance | P normal |
Reported by | Pascal Cuoq (cuoq@trust-in-soft.com) |
Reported on | 2018-07-09 09:39:10 -0700 |
Last modified on | 2018-07-12 02:07:07 -0700 |
Version | 6.0 |
Hardware | PC Linux |
CC | llvm-bugs@lists.llvm.org, nunoplopes@sapo.pt, richard-llvm@metafoo.co.uk |
> 1/ WRONG OPTIMIZATION
>
> In the function[...] std_fun_twist [...], r3 is assigned without reloading
> the member a.
I believe LLVM's behavior on that function is correct. memset from offset 4
onwards cannot affect the bytes at offset 0-3, so no reload of s.a is necessary.
> In the function[...] signed_ptr_offset_twist, r3 is assigned without
> reloading the member a.
This looks like a bug. I believe the problem is that Clang emits the array
indexing as a "getelementptr inbounds" (indicating that the unsigned arithmetic
does not wrap around the address space), which is wrong in the case where the
integer operand is signed and the other operand is not the result of immediate
array-to-pointer decay.
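To make the pattern concrete, here is a minimal sketch of the kind of function under discussion. The original source from comment #0 is not quoted in this thread, so the struct layout, the names `struct pair`, `r1`, `r3` and `store_then_reload` are assumptions for illustration only.

```c
/* Hypothetical layout: the struct from the original report is not quoted here. */
struct pair { int a; int b; };

struct pair s = { 1, 2 };
int r1, r3;

/* Hypothetical stand-in for signed_ptr_offset_twist: p is an ordinary
   pointer (not the result of immediate array-to-pointer decay) and the
   index x is signed. */
int store_then_reload(int *p, int x)
{
    r1 = s.a;   /* first load of s.a */
    p[x] = 42;  /* with p == &s.b and x == -1 this would address s.a */
    r3 = s.a;   /* the question in this thread: must s.a be reloaded here? */
    return r3;
}
```

Calling `store_then_reload(&s.b, 0)` is uncontroversial: it writes `s.b` and leaves `s.a` alone. The contested call is the one with `x == -1`, where a compiler that reuses the first load for `r3` would return a stale value.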
> 2/ MISSED OPTIMIZATION
>
> In the function named array, Clang reloads the member b
Yes, this is a missed opportunity. Clang already performs this optimization in
some cases, but does not take array accesses into account when forming the
struct-path-based TBAA set for the access to p[x].
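As a rough illustration of the missed optimization, the sketch below stores into an array member between two reads of a neighbouring scalar member. The actual `array` function from comment #0 is not shown in this thread; the struct `with_array`, the member names, and the exact form of the store are assumptions.

```c
/* Hypothetical layout, loosely following the 'member t in struct s'
   wording used later in this thread. */
struct with_array { int a; int t[4]; int b; };

struct with_array g = { 1, { 0, 0, 0, 0 }, 2 };

/* Hypothetical stand-in for the function named array. */
int array_store(int x)
{
    int before = g.b;   /* first load of g.b */
    g.t[x] = 9;         /* defined only for 0 <= x < 4, so it cannot touch g.b */
    return before + g.b; /* a compiler could reuse 'before' instead of reloading */
}
```

Under these assumptions, a compiler that tracked array accesses in its aliasing information could fold the return value to `2 * before`; the comment above says Clang currently does not.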
> 3/ BORDERLINE CASES
>
> While it is difficult to infer intention from absence of optimization,
> I think that GCC avoids optimizing the other functions on purpose
I wouldn't assume that; it seems more likely to me to be due to implementation
difficulty. That said, Clang's unlikely to optimize 'signed_ptr_offset' any
time soon, because our determination of the aliasing set for an expression is
based on a local syntactic analysis performed by the frontend; we would need to
reflect significant chunks of the frontend semantics into the optimizer in
order to be able to infer that 'p' is guaranteed to point to a 'member t in
struct s' object. GCC may well not be optimizing it for largely the same reason.
(In reply to Richard Smith from comment #1)
> > 1/ WRONG OPTIMIZATION
> >
> > In the function[...] std_fun_twist [...], r3 is assigned without reloading
> > the member a.
>
> I believe LLVM's behavior on that function is correct. memset from offset 4
> onwards cannot affect the bytes at offset 0-3, so no reload of s.a is
> necessary.
Sorry, I missed the '+x' here. The problem appears to be the same as in the
other function (incorrect use of 'getelementptr inbounds').
Sanjoy says that 'inbounds' is not supposed to imply that the pointer value cannot decrease, so this looks like a middle-end optimization bug.
Actually... looks like this has already been fixed; with "Clang (trunk)" on godbolt, all 6 functions reload s.a and s.b.
I agree that the “wrong optimization” part of the report is already fixed in the version that Compiler Explorer calls “trunk” today (in full, “clang version 7.0.0 (trunk 336621)”).
I would also like to thank you for your insights on the borderline cases.
Wait, what about `custom_fun`?
The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong,
since we don't know anything about `f`.
(In reply to Nuno Lopes from comment #6)
> Wait, what about `custom_fun`?
> The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong,
> since we don't know anything about `f`.
That's probably because they're storing undef. I see the stores if I add
initializers for a and b.
(In reply to Sanjoy Das from comment #7)
> (In reply to Nuno Lopes from comment #6)
> > Wait, what about `custom_fun`?
> > The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong,
> > since we don't know anything about `f`.
>
> That's probably because they're storing undef. I see the stores if I add
> initializers for a and b.
The confusing part here is that there is both a global and a local (in
custom_fun) named 's'.
Yes, I am sorry, I accidentally put irrelevant undefined behavior in the example custom_fun. I meant for the automatic variable s to be initialized. I made it an automatic variable in that example because otherwise the compiler would have to assume that the variable might be known to, and modified directly by, the called function f, so the example would not test what I intended to test.
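For readers following along, a version of the example with the initialization in place might look like the sketch below. This is a reconstruction, not the code from comment #0: the struct layout, the initial values, the function-pointer parameter, and the sample callee `set_to_seven` are all assumptions made so that the snippet is self-contained.

```c
/* Hypothetical layout: the struct from the original report is not quoted here. */
struct pair { int a; int b; };

int r1, r2;

/* Sample callee (assumption): writes only through the pointer it is given. */
static void set_to_seven(int *p)
{
    *p = 7;
}

/* Hypothetical stand-in for custom_fun. The callee is passed as a
   function pointer here only to keep the sketch self-contained. */
void custom_fun(void (*f)(int *))
{
    struct pair s = { 1, 2 };   /* initialized, so r1 and r2 no longer store undef */
    f(&s.b);                    /* f only receives a pointer to the member b */
    r1 = s.a;
    r2 = s.b;
}
```

With `custom_fun(set_to_seven)`, the callee stays inside `s.b`, so `r1` is 1 and `r2` is 7. Because `s` is automatic, the only way `f` could reach `s.a` is by stepping back from the pointer to `s.b`, which is the case the report set out to probe.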
OK, I believe there are a couple of remaining pieces here, then:
1) Make sure that the fix for the miscompiles has been backported to the Clang 6 branch; this seems like an important miscompile to fix.
2) Extend Clang's field-sensitive TBAA metadata to also encode information about arrays, so that we can optimize 'array'. (As discussed in "borderline cases", this would not affect any of the other functions in comment #0.)
For reference, Michele Alberti has just pointed me to this discussion in GCC's bugzilla, which contains the same kind of discussion as here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86259
To attempt to summarize that discussion:
- there is consensus that the standard says one cannot use a pointer to a subobject to get back to the containing object;
- fully exploiting this rule is on the table, but it makes at least some GCC developers uncomfortable;
- GCC already contains exceptions where this rule is deliberately not exploited, in order to keep legacy code working (memcpy is mentioned);
- it is already possible to trigger optimizations that would surprise a programmer who believes that one can go back from a subobject to its containing object: one example is the subject of the bug report.