TrustInSoft / tis-interpreter

An interpreter for finding subtle bugs in programs written in standard C

when a union has an initializer, do parts of the union that only exist in variants other than the specified one have defined contents? #108

Closed comex closed 8 years ago

comex commented 8 years ago

tis-interpreter doesn't think so:

% cat /tmp/onion.c
int main() {
  union { int a; struct { int b; int c; }; } u = {1};
  return u.c == 0;
}
% ./tis-interpreter.sh /tmp/onion.c
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
/tmp/onion.c:3:[kernel] warning: accessing uninitialized left-value:
                  assert \initialized(&u.__anonCompField1.c);
                  stack: main
[value] Stopping at nth alarm

I suppose this paragraph suggests not: (C11 6.2.6.1.7)

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

But then: (C11 6.7.9.19)

The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

Doesn't c in my example count as a subobject that should be initialized implicitly?

(I ran into this warning with some code of mine that was buggy and wasn't supposed to be doing anything of the sort - the union members were supposed to be the same size. tis-interpreter found the bug, but I'm not sure it actually should have...)
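For illustration only, a size mismatch of the kind described above can be caught at compile time. This is a minimal sketch assuming a C11 compiler; the names `union sized_onion`, `struct pair`, and `variants_cover_same_bytes` are hypothetical and do not come from the code that triggered the warning.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical union whose two variants are meant to occupy the same bytes. */
struct pair { int b; int c; };
union sized_onion {
    int a[2];          /* widened so that it covers both b and c */
    struct pair s;
};

/* Compilation fails if one variant is larger than the other.
   sizeof does not evaluate its operand, so the null pointer is never used. */
static_assert(sizeof(((union sized_onion *)0)->a) ==
              sizeof(((union sized_onion *)0)->s),
              "union variants must have the same size");

/* Returns 1 when every byte of the union belongs to both variants. */
int variants_cover_same_bytes(void) {
    return sizeof(union sized_onion) == sizeof(int[2])
        && sizeof(union sized_onion) == sizeof(struct pair);
}
```

With such a check in place, an initializer for the first member covers the whole object, so the question of what the remaining bytes contain does not arise.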

pascal-cuoq commented 8 years ago

Sorry, hasty conclusion in the deleted comment. Why can't compilers optimize when one wants them to?

On architectures where NULL or 0.0 are not represented with all bits zero, it can be impossible to initialize both the second and the third member of the union, even in the area where they both go beyond the width of the first member. Consider, on such a weird architecture:

{
  union { short a; float f[100]; int i[100]; void *p[100]; } u = {0};
}

Which member should win the privilege of being initialized? The standard says a. It cannot mean that f, i, p are initialized because they cannot all be initialized.

On the examples I tried at gcc.godbolt.org, it seems that both GCC and Clang set the entire union to zero (and that happens to initialize all members of the union since the representations of NULL and 0.0 coincide on the architectures they target). I'm not sure this is something you want to rely on.

There is a good argument to be made that the bits after a should be set to 0 as padding in C11. However the words that can be interpreted this way were one of a few discrete changes from C99 to C11 (C99 does not mention setting padding bits to 0), so you'd still want to treat this padding as uninitialized in case the code ever gets compiled with a C99 compiler.

In fact the sentence “if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;” in 6.7.9:10 strongly suggests that the bits of f, i and p that go beyond the width of a are padding. What padding would we be talking about otherwise?
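To make the size argument concrete, the sketch below counts the bytes of such a union that lie beyond its first member, which is the region the padding reading would cover. The tag `union wide` and the function name are hypothetical, chosen only for this illustration. The result depends on the target's type sizes, so no specific value is claimed.

```c
#include <stddef.h>

/* Same shape as the union discussed above, given a tag for illustration. */
union wide { short a; float f[100]; int i[100]; void *p[100]; };

/* Number of bytes in the union that are not part of the first member `a`.
   Under the padding reading of 6.7.9:10, these are the bytes that
   `= {0}` would set to zero bits in C11. */
size_t bytes_beyond_first_member(void) {
    return sizeof(union wide) - sizeof(short);
}
```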

pascal-cuoq commented 8 years ago

Also, examples by Alexander Cherepanov show that padding is not very stable. It's one of the many little inconsistencies in the C standard(s): the committee added explicit words in C11 to say what happens to padding at initialization, with the rationale that memcmp will work better on structs with padding, but other words in the standard or in DRs can already be interpreted as saying that padding bits change freely without any action of the program. So what use is the new guarantee in C11 supposed to be?

Ref: https://github.com/TrustInSoft/tis-interpreter/issues/101#issuecomment-223332754 https://twitter.com/ch3root/status/742358182891257856

ch3root commented 8 years ago

I played with initializers for unions several days ago (in relation to the question "What is the value of a union?") and it seems that this area is underspecified in C11. If c is considered a subobject then it should be possible to initialize it (by using a designator) at the same time as initializing a, right? The example below shows that gcc and clang don't permit it. Not sure what the right approach is. In tis-interpreter it seems safer to assume that anything not explicitly initialized is indeterminate.

Somewhat related:

Source code:

#include <stdio.h>

int main() {
  union { int a; struct { int b; int c; }; } u = {.c = 2, .a = 1};
  printf("%d\n", u.c);
}

tis-interpreter (31be1ffdb350ea940095be4757d0d5779c38f10b) output:

test.c:4:[kernel] failure: Cannot find designated field c
[kernel] user error: stopping on file "test.c" that has errors. Add '-kernel-msg-key pp'
                     for preprocessing command.
[kernel] Frama-C aborted: invalid user input.

gcc (GCC) 7.0.0 20160616 (experimental):

$ gcc -std=c11 -pedantic -Wall -Wextra -O3 -fsanitize=undefined test.c && ./a.out
test.c: In function ‘main’:
test.c:4:64: warning: initialized field overwritten [-Woverride-init]
   union { int a; struct { int b; int c; }; } u = {.c = 2, .a = 1};
                                                                ^
test.c:4:64: note: (near initialization for ‘u.a’)
0

clang version 3.9.0 (trunk 271312):

$ clang -std=c11 -Weverything -O3 -fsanitize=undefined test.c && ./a.out
test.c:4:60: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
  union { int a; struct { int b; int c; }; } u = {.c = 2, .a = 1};
                                                          ~^
test.c:4:51: note: previous initialization is here
  union { int a; struct { int b; int c; }; } u = {.c = 2, .a = 1};
                                                  ^~~~~~
1 warning generated.
0
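If anything not explicitly initialized is treated as indeterminate, one way to stay on safe ground is to initialize the variant that will actually be read. This is a minimal sketch, assuming the inner struct is given a name (`s`, hypothetical) so that it can be designated as a whole; the tag `union named_onion` and the function name are likewise only illustrative.

```c
/* Hypothetical variant of the union from this issue, with the inner
   struct named `s` instead of being anonymous. */
union named_onion {
    int a;
    struct { int b; int c; } s;
};

/* Initializes the wider variant explicitly, so both b and c have
   specified values no matter how the bytes after `a` are treated. */
int read_c_after_explicit_init(void) {
    union named_onion u = { .s = { .b = 1, .c = 0 } };
    return u.s.c == 0;   /* c was initialized explicitly to 0 */
}
```

Only one union member is designated here, so the initializer-override warnings shown above should not apply to this form.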