AbsInt / CompCert

The CompCert formally-verified C compiler
https://compcert.org

Bit-field initialization for automatic-storage variables #22

Open ghost opened 9 years ago

ghost commented 9 years ago

Bit-field initialization via the -fbitfields emulation leads to undefined behavior in most cases. It would be nice to be able to initialize bit-fields without having to resort to a static-storage variable.

For instance, in this program:

typedef struct { int f: 8; } t;
int main() {
    t p = {0};
    return p.f;
}

The compiled code executes as expected, but the emulated bit-fields generate reads from uninitialized variables. Here is an excerpt of the .compcert.c code obtained with -dc:

int main(void)
{
  struct _108 p;
  p.__bf1 = p.__bf1 & ~255U | (unsigned int) 1 << 0U & 255U;
  return (int) (p.__bf1 << 24) >> 24;
}

The only way I was able to initialize it was by introducing a static-storage struct and copying its value:

typedef struct { int f: 8; } t;
t t_zero;
int main() {
    t p = t_zero;
    return p.f;
}

On a side note, the CompCert website (http://compcert.inria.fr/compcert-C.html) mentions that designated initializers are not supported, but they seem to work, at least in declarations such as t p = {.g = 1, .f = 2} and int a[10] = {[4] = 1, [2] = 3}.
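For what it's worth, the following small program (with a hypothetical two-field struct, since the t above only has f) is the kind of thing that seems to compile and behave as expected:

typedef struct { int f: 8; int g: 8; } s;

int main(void) {
    s p = {.g = 1, .f = 2};          /* member designators */
    int a[10] = {[4] = 1, [2] = 3};  /* array-index designators */
    return p.f + p.g + a[2] + a[4];  /* expected result: 2 + 1 + 3 + 1 = 7 */
}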

xavierleroy commented 9 years ago

Concerning designated initializers, you're right that they were added in the latest release and that the Web site needs updating.

Concerning bitfields, the issue you describe is well known and regrettable indeed. The CompCert semantics have no notion of "partially defined" integers, where some bits are known and other bits are undefined. Such a notion would avoid the undefined behavior reported in your example, as only well-defined bits would contribute to the return value. But supporting partially defined integers would be a huge change throughout CompCert.

An alternative solution is for the bit-field emulation to insert explicit initialization of "auto" variables that contain bit-fields. For "auto" variables of "struct" type there is a small performance hit, although later back-end passes can eliminate the initialization in many cases, and only the "carrier" fields that pack bit-fields need initialization, not the regular non-bit-field members. Where it gets ugly is an array of structs containing bit-fields: there we need to generate either a loop or a call to a "bzero"-like function, neither of which could be optimized away later.
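To make this concrete, here is a rough sketch of the initialization the emulation could insert, reusing the struct _108 / __bf1 carrier names from the -dc excerpt above (the exact lowering is of course not decided):

    struct _108 { unsigned int __bf1; };     /* carrier layout as in the -dc excerpt */

    void sketch(void)
    {
        struct _108 p;
        p.__bf1 = 0;                         /* only the carrier field needs zeroing */

        struct _108 a[16];                   /* the ugly case: an array of such structs */
        for (int i = 0; i < 16; i++)         /* needs a loop or a bzero-like call,      */
            a[i].__bf1 = 0;                  /* neither easily optimized away later     */
    }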

ghost commented 9 years ago

I see; the impact would indeed be much larger than I expected.

One thing I wondered, when reading the output of the interpreter, was whether expressions such as <undef> & 0 or <undef> | -1 could actually evaluate to 0 and -1, respectively. This would, for instance, allow constant-like conditions, and in this case it seems like it might help deal with "unimportant" undefined values. But I cannot foresee its long-term impact; I imagine it would lead to a semantics that is too lax, or to too many cases to consider.
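To illustrate what I mean by constant-like conditions, consider a program along these lines (undefined under the current semantics, since x is read uninitialized):

int main(void) {
    int x;          /* never initialized: the interpreter reports its value as <undef> */
    return x & 0;   /* could this be defined to evaluate to 0 regardless of x?         */
}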

xavierleroy commented 9 years ago

Commits [b04bb78] and [916bfc0] implement a partial fix for this problem: initializers for structs containing bitfields are now translated in a way that avoids undefined behavior, by not referring to the initial values of the carrier fields. This covers both block-local initialization

    { t p = {.f = 0}; return p.f; }

and compound initializers

    (t) {.f = 0}

The problem remains for declaration followed by initialization of individual fields, as in

    { t p; p.f = 0; return p.f; }
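For reference, the emulation of that last fragment presumably still looks something like the -dc excerpt earlier in this thread, i.e. a read-modify-write of the uninitialized carrier field:

    struct _108 p;
    p.__bf1 = p.__bf1 & ~255U | (unsigned int) 0 << 0U & 255U;   /* reads the uninitialized __bf1 */
    return (int) (p.__bf1 << 24) >> 24;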

Since the problem is less acute now, I am reclassifying this report as an enhancement rather than a bug.