Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Incorrect transformation with -fstrict-aliasing #44577

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR45607
Status NEW
Importance P normal
Reported by Pascal Cuoq (cuoq@trust-in-soft.com)
Reported on 2020-04-19 07:07:52 -0700
Last modified on 2020-04-21 10:55:32 -0700
Version 10.0
Hardware PC Linux
CC alexey.salmin@gmail.com, blitzrakete@gmail.com, dgregor@apple.com, efriedma@quicinc.com, erik.pilkington@gmail.com, florian_hahn@apple.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk, sbansal@iitd.ac.in, yuanfang.chen@sony.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
C17 6.5:6 says:

The effective type of an object for an access to its stored value is the
declared type of the object, if any.88) If a value is stored into an object
having no declared type through an lvalue having a type that is not a character
type, then the type of the lvalue becomes the effective type of the object for
that access and for subsequent accesses that do not modify the stored value. If
a value is copied into an object having no declared type using memcpy or
memmove, or is copied as an array of character type, then the effective type of
the modified object for that access and for subsequent accesses that do not
modify the value is the effective type of the object from which the value is
copied, if it has one. For all other accesses to an object having no declared
type, the effective type of the object is simply the type of the lvalue used
for the access.

I interpret the words “If a value is stored into an object having no declared
type through an lvalue having a type that is not a character type, then the
type of the lvalue becomes the effective type of the object for that access and
for subsequent accesses that do not modify the stored value” as saying that the
repurposing of dynamically allocated memory is allowed in C. A write of a new
effective type to dynamically allocated memory is not a “subsequent access that
does not modify the stored value”, so it simply changes the type of the memory
to the type of the lvalue being used for the write.

Consider the following program, which was provided by Sorav Bansal:

(Compiler Explorer link: https://gcc.godbolt.org/z/Vg25xq )

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

int a;

__attribute__((__noinline__)) int foo(float *x, int *y, int *z)
{
  int sum = 1;
  for (int i = 0; i < a; i++) {
    *z = 0x123;
    *x = 3.0f;
    sum *= *y;
    *z = 0x40400300;
  }
  return sum;
}

int main()
{
  char *f;
  f = malloc(32);
  if (!f) abort();
  int i = 3;
  a = 1;
  foo((float*)f, &i, (int*)f);
  float fl;
  memcpy(&fl, f, 4);
  printf("f = %f\n", fl);
}

According to my interpretation above (“repurposing of dynamically allocated
memory is allowed”), this program does nothing wrong. The memory location that
both x and z point to is repurposed many times, but it is never read inside
function foo, and main reads it carefully using memcpy.

The behavior I expect for this program is to print:

f = 3.000183

(3.000183 is the float represented by 0x40400300.)

However, when -fstrict-aliasing is enabled, Clang 10 moves the assignment “*x =
3.0f;” after the loop and makes the compiled program print:

f = 3.000000

I think the transformation is wrong here. The function foo does not invoke UB
when passed z and x aliasing pointers to dynamically allocated memory, and the
“naïve memory model” behavior of the function should have been preserved for
such a calling context.

Yes, this would mean that GCC has independently implemented the same bug, but I
am still confident enough that the C standard intended to allow the repurposing
of dynamically allocated memory to report this. Please accept my apologies if I
am wrong.

(Bug reported in GCC at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94658 )
Quuxplusone commented 4 years ago
I'm pretty sure this is UB, and is a classical example of strict aliasing rules
violation,
i.e. not a miscompile. Have you read
https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 ?
Quuxplusone commented 4 years ago

@Roman

Apparently I must have read it, since I provided feedback on it when it was in draft form. I don't see where it specifically addresses the question at play here.

I have also put in writing the claim that memory reuse is allowed in C, in the article cited as [25] in Shafik's post, so if it turns out that I was wrong all this time, I shall be disappointed that no-one bothered to tell me so.

Important: Note that this bug report is against the C compiler Clang.

Quuxplusone commented 4 years ago

Previously discussed in bug 26603 and bug 21725.

Quuxplusone commented 4 years ago

See http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm

Note the somewhat disappointing response from WG14: the committee response says this is undefined behavior, but the rationale does not correspond to any wording in the C standard. :(

Quuxplusone commented 4 years ago

What do the C++ rules look like these days? I was trying to decipher [intro.object], but the "implicitly creating objects" rules aren't well-organized.

Quuxplusone commented 4 years ago

In C++, pointers point to specific objects, not just to storage, and objects are only created by explicit source syntax (new-expressions, declarations, temporaries that create objects, assignments that are syntactically performed through union members).

Under the C++ rules, it's valid to reorder even a store past a store if their TBAA information is different and there are no intervening lifetime events on that storage.

Quuxplusone commented 4 years ago
(In reply to Richard Smith from comment #4)
> See http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm
>
> Note the somewhat disappointing response from WG14: the committee response
> says this is undefined behavior, but the rationale does not correspond to
> any wording in the C standard. :(

Thanks Richard for sharing this response from WG14. Even though they provide a
clear response, they do say that "The current situation requires more
consideration".  I am wondering if any follow-up discussion took place on this,
or was it not considered?  The conclusion of the committee does seem to run
contrary to the textual and formal specifications (e.g., Michael Norrish's
formalization) of C available in the literature.