Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Wrong optimizations and missed optimizations involving pointers to subobjects #37081

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR38108
Status NEW
Importance P normal
Reported by Pascal Cuoq (cuoq@trust-in-soft.com)
Reported on 2018-07-09 09:39:10 -0700
Last modified on 2018-07-12 02:07:07 -0700
Version 6.0
Hardware PC Linux
CC llvm-bugs@lists.llvm.org, nunoplopes@sapo.pt, richard-llvm@metafoo.co.uk
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Please consider the functions below and the code generated by Clang 6.0.0 and
trunk according to Matt Godbolt's Compiler Explorer
(https://godbolt.org/g/Z4auAE ):

____
// C code
#include <string.h>
#include <stddef.h>

struct s {
    int a;
    char t[8];
    int b;
} s;

int r1, r2, r3, r4;
void f(char *);

void std_fun(int x, int z) {
  char *p = s.t;
  r1 = s.a;
  r2 = s.b;
  memset(p+x, 0, z);
  r3 = s.a;
  r4 = s.b;
}

void std_fun_twist(int x, int z) {
  char *p = (char*)&s + offsetof(struct s, t);
  r1 = s.a;
  r2 = s.b;
  memset(p+x, 0, z);
  r3 = s.a;
  r4 = s.b;
}

void custom_fun(void) {
  struct s s;
  char *p = s.t;
  r1 = s.a;
  r2 = s.b;
  f(p);
  r3 = s.a;
  r4 = s.b;
}

void signed_ptr_offset(int x) {
  char *p = s.t;
  r1 = s.a;
  r2 = s.b;
  p[x]=1;
  r3 = s.a;
  r4 = s.b;
}

void signed_ptr_offset_twist(int x) {
  char *p = (char*)&s + offsetof(struct s, t);
  r1 = s.a;
  r2 = s.b;
  p[x]=1;
  r3 = s.a;
  r4 = s.b;
}

void array(int x) {
  r1 = s.a;
  r2 = s.b;
  s.t[x]=1;
  r3 = s.a;
  r4 = s.b;
}
_______
Generated Assembly (Clang 6.0.0):

std_fun:                                # @std_fun
        pushq   %rbx
        movl    s(%rip), %ebx
        movl    %ebx, r1(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r2(%rip)
        movslq  %edi, %rax
        leaq    s+4(%rax), %rdi
        movslq  %esi, %rdx
        xorl    %esi, %esi
        callq   memset
        movl    %ebx, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        popq    %rbx
        retq
std_fun_twist:                          # @std_fun_twist
        pushq   %rbx
        movl    s(%rip), %ebx
        movl    %ebx, r1(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r2(%rip)
        movslq  %edi, %rax
        leaq    s+4(%rax), %rdi
        movslq  %esi, %rdx
        xorl    %esi, %esi
        callq   memset
        movl    %ebx, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        popq    %rbx
        retq
custom_fun:                             # @custom_fun
        subq    $24, %rsp
        leaq    12(%rsp), %rdi
        callq   f
        movl    8(%rsp), %eax
        movl    20(%rsp), %ecx
        movl    %eax, r3(%rip)
        movl    %ecx, r4(%rip)
        addq    $24, %rsp
        retq
signed_ptr_offset:                      # @signed_ptr_offset
        movl    s(%rip), %eax
        movl    %eax, r1(%rip)
        movl    s+12(%rip), %ecx
        movslq  %edi, %rdx
        movb    $1, s+4(%rdx)
        movl    %ecx, r2(%rip)
        movl    %eax, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        retq
signed_ptr_offset_twist:                # @signed_ptr_offset_twist
        movl    s(%rip), %eax
        movl    %eax, r1(%rip)
        movl    s+12(%rip), %ecx
        movslq  %edi, %rdx
        movb    $1, s+4(%rdx)
        movl    %ecx, r2(%rip)
        movl    %eax, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        retq
array:                                  # @array
        movl    s(%rip), %eax
        movl    %eax, r1(%rip)
        movl    s+12(%rip), %ecx
        movslq  %edi, %rdx
        movb    $1, s+4(%rdx)
        movl    %ecx, r2(%rip)
        movl    %eax, r3(%rip)
        movl    s+12(%rip), %eax
        movl    %eax, r4(%rip)
        retq
___________

The point of interest for each function is whether Clang generates code to
reload the value of the struct members a and b after the struct has been
accessed, the access being made from a pointer initially pointing to member t.
The values are to be stored in variables r3 and r4. A sequence such as the
following indicates that Clang considers that the value of the member a must be
the same after the call to memset as it was before:

        callq   memset
        movl    %ebx, r3(%rip)

Note that the variable p is always a pointer to char, and s is always a
variable, which means that the functions cannot be said to be incorrect because
of “strict aliasing” violations. Incidentally, adding "-fno-strict-aliasing" to
the commandline does not change the generated code.

1/ WRONG OPTIMIZATION

In the functions std_fun_twist and signed_ptr_offset_twist, r3 is assigned
without reloading the member a. I do not think that there exist any possible
justification for this optimization in a real compiler used for, for instance,
OS code. Some programmers will use char pointers to access the representation
of structs. The struct's representation is an array of bytes
(https://port70.net/~nsz/c/c11/n1570.html#6.2.6.1p2 ) and these functions are
only doing valid pointer arithmetics within this array.

2/ MISSED OPTIMIZATION

In the function named array, Clang reloads the member b to assign to variable
r4. I think it would be valid to treat any argument x to that function not
between 0 and 7 as causing undefined behavior, and therefore to reuse the value
loaded in %ecx. This is an optimization that GCC already does.

3/ BORDERLINE CASES

While it is difficult to infer intention from absence of optimization, I think
that GCC avoids optimizing the other functions on purpose, making a distinction
between direct array subscripting and use of an intermediate pointer that can
be interpreter as a pointer inside the entire struct. While the C standards do
not distinguish between the two, I think this is a valuable heuristic and I
hope that Clang will align to GCC's behavior in these cases. Regardless of the
decision, each example is either a wrong optimization or a missed optimization,
since Clang always reload the member b and assumes that the member a was not
modified.
Quuxplusone commented 6 years ago
> 1/ WRONG OPTIMIZATION
>
> In the function[...] std_fun_twist [...], r3 is assigned without reloading
> the member a.

I believe LLVM's behavior on that function is correct. memset from offset 4
onwards cannot affect the bytes at offset 0-3, so no reload of s.a is necessary.

> In the function[...] signed_ptr_offset_twist, r3 is assigned without
> reloading the member a.

This looks like a bug. I believe the problem is that Clang emits the array
indexing as a "getelementptr inbounds" (indicating that the unsigned arithmetic
does not wrap around the address space), which is wrong in the case where the
integer operand is signed and the other operand is not the result of immediate
array-to-pointer decay.

> 2/ MISSED OPTIMIZATION
>
> In the function named array, Clang reloads the member b

Yes, this is a missed opportunity. Clang already performs this optimization in
some cases, but does not take array accesses into account when forming the
struct-path-based TBAA set for the access to p[x].

> 3/ BORDERLINE CASES
>
> While it is difficult to infer intention from absence of optimization,
> I think that GCC avoids optimizing the other functions on purpose

I wouldn't assume that; it seems more likely to me to be due to implementation
difficulty. That said, Clang's unlikely to optimize 'signed_ptr_offset' any
time soon, because our determination of the aliasing set for an expression is
based on a local syntactic analysis performed by the frontend; we would need to
reflect significant chunks of the frontend semantics into the optimizer in
order to be able to infer that 'p' is guaranteed to point to a 'member t in
struct s' object. GCC may well not be optimizing it for largely the same reason.
Quuxplusone commented 6 years ago
(In reply to Richard Smith from comment #1)
> > 1/ WRONG OPTIMIZATION
> >
> > In the function[...] std_fun_twist [...], r3 is assigned without reloading
> > the member a.
>
> I believe LLVM's behavior on that function is correct. memset from offset 4
> onwards cannot affect the bytes at offset 0-3, so no reload of s.a is
> necessary.

Sorry, I missed the '+x' here. The problem appears to be the same as in the
other function (incorrect use of 'getelementptr inbounds').
Quuxplusone commented 6 years ago

Sanjoy says that 'inbounds' is not supposed to imply that the pointer value cannot decrease, so this looks like a middle-end optimization bug.

Quuxplusone commented 6 years ago

Actually... looks like this has already been fixed; with "Clang (trunk)" on godbolt, all 6 functions reload s.a and s.b.

Quuxplusone commented 6 years ago

I agree that the “wrong optimization” part of the report is already fixed in the version that Compiler Explorer calls “trunk” today (“clang version 7.0.0 (trunk 336621)” for long).

I would also like to thank you for your insights on the borderline cases.

Quuxplusone commented 6 years ago
Wait, what about `custom_fun`?
The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong,
since we don't know anything about `f`.
Quuxplusone commented 6 years ago
(In reply to Nuno Lopes from comment #6)
> Wait, what about `custom_fun`?
> The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong,
> since we don't know anything about `f`.

That's probably because they're storing undef.  I see the stores if I add
initializers for a and b.
Quuxplusone commented 6 years ago
(In reply to Sanjoy Das from comment #7)
> (In reply to Nuno Lopes from comment #6)
> > Wait, what about `custom_fun`?
> > The stores to r1, r2 get removed with -O2, but not with -O1. This is wrong,
> > since we don't know anything about `f`.
>
> That's probably because they're storing undef.  I see the stores if I add
> initializers for a and b.

The confusing part here is that there is both a global and a local (in
custom_fun) named 's'.
Quuxplusone commented 6 years ago

Yes, I am sorry, I accidentally put irrelevant undefined behavior in the example custom_fun. I mean for the automatic variable s to be initialized. I made it an automatic variable in that example because otherwise the compiler had to assume that this variable might be known to, and modified directly by, the called function f, so the example would not test what I intended to test.

Quuxplusone commented 6 years ago

OK, I believe there are a couple of remaining pieces here, then:

1) Make sure that the fix for the miscompiles has been backported to the Clang 6 branch; this seems like an important miscompile to fix.

2) Extend Clang's field-sensitive TBAA metadata to also encode information about arrays, so that we can optimize 'array'. (As discussed in "borderline cases", this would not affect any of the other functions in comment#0.)

Quuxplusone commented 6 years ago

For reference, Michele Alberti has just pointed me to this discussion in GCC's bugzilla, which contains the same kind of discussion as here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86259

To attempt to summarize that discussion: