Closed ghost closed 8 years ago
I am not sure I understand. bitArrWrite
is bit-indexed and we are representing bit-arrays in packed form, that is scramble state is really a bit array of 7 bits, represented as a single byte (with the high-bit unused)
I think I misunderstood. I haven't looked exactly at the bit-shifts yet, which I plan to do soon. I have a simpler example which demonstrates this. Attaching required files to reproduce this bug. If this bug does occur, then packIdx/unpackIdx mismatch
happens (only under clang with optimizations on).
I can force the correct behaviour by forcing clang to trace the bits for printing, twice. At an initial glance, I suspect pointer aliasing to be causing some undefined behaviour. But it might just be a clang bug.
Source code:
#include <cstdio>
#include <cstdlib>
#include <stdint.h>
typedef unsigned char Bit ;
typedef unsigned char * BitArrPtr ;
/* Effectively implements: tgt[vstart...(vstart+vlen-1)] := src
* Precondition: tgt has allocated (vlen+7)/8 bytes already
**************************************************************************************/
void bitArrWrite(BitArrPtr src, unsigned int vstart, unsigned int vlen, BitArrPtr tgt)
{
// copy from bit.c
}
void print_bits(uint16_t x)
{
for(int i=8*sizeof(x)-1; i>=0; i--) {
(x & (1 << i)) ? putchar('1') : putchar('0');
}
printf("\n");
}
void print(unsigned char * state) {
printf("Bits; ");
print_bits(state[0]);
// printing each bit twice apparently makes it magically work
#ifdef DOUBLE_PRINT
printf("Double print; ");
print_bits(state[0]);
#endif
}
int main() {
for (unsigned long _lidx = 0; _lidx < 3; _lidx++) {
unsigned char scrmbl_st[1] = {0};
uint8_t orig_id = (uint8_t) _lidx;
bitArrWrite((BitArrPtr) &orig_id, 0, 7, (BitArrPtr) scrmbl_st);
orig_id = orig_id >> 7;
print(scrmbl_st);
}
}
Example output:
$ clang++ bitarrwrite.cpp -w -O3 -o bitarrwrite && ./bitarrwrite
Bits; 0000000000000000
Bits; 0000000000000000
Bits; 0000000000000000
$ clang++ bitarrwrite.cpp -w -D DOUBLE_PRINT -O3 -o bitarrwrite && ./bitarrwrite
Bits; 0000000000000000
Double print; 0000000000000000
Bits; 0000000000000001
Double print; 0000000000000001
Bits; 0000000000000010
Double print; 0000000000000010
$ clang++ bitarrwrite.cpp -w -O0 -o bitarrwrite && ./bitarrwrite
Bits; 0000000000000000
Bits; 0000000000000001
Bits; 0000000000000010
$ clang++ bitarrwrite.cpp -w -D DOUBLE_PRINT -O0 -o bitarrwrite && ./bitarrwrite
Bits; 0000000000000000
Double print; 0000000000000000
Bits; 0000000000000001
Double print; 0000000000000001
Bits; 0000000000000010
Double print; 0000000000000010
$ # I'm using nix-shell -p clang
$ clang -v
clang version 3.6.2 (tags/RELEASE_362/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Found candidate GCC installation: /nix/store/0pw39ww2z0hmr1y0idw4hqn0jflg908n-gcc-4.9.3/lib/gcc/x86_64-unknown-linux-gnu/4.9.3
Found candidate GCC installation: /nix/store/0pw39ww2z0hmr1y0idw4hqn0jflg908n-gcc-4.9.3/lib64/gcc/x86_64-unknown-linux-gnu/4.9.3
Selected GCC installation: /nix/store/0pw39ww2z0hmr1y0idw4hqn0jflg908n-gcc-4.9.3/lib64/gcc/x86_64-unknown-linux-gnu/4.9.3
Candidate multilib: .;@m64
Selected multilib: .;@m64
@dimitriv I think the precondition tgt has allocated (vlen+7)/8 bytes already
might not be covering all cases. Maybe allowing an extra byte, by modifying it to tgt has allocated ((vlen+7)/8 + 1) bytes already
would work?
Sid and I just looked at this, and bitArrWrite
seems to be the culprit. The problem is that it assumes it can cast a BitArrPtr
to a short pointer even when a bit array has only 1 byte of storage. A quick fix is to always allocate an even number of bytes for bit arrays.
Small note...we should be using types with explicit widths, e.g., uint16
instead of unsigned short
.
Results of adding runtime checks with the clang sanitizer:
bit.c:128:16: runtime error: load of misaligned address 0x7ffcef5c8b6d for type 'unsigned short', which requires 2 byte alignment
0x7ffcef5c8b6d: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0 0c fb bd a4 7f 00 00 d7
^
bit.c:128:4: runtime error: store to misaligned address 0x7ffcef5c8b6d for type 'unsigned short', which requires 2 byte alignment
0x7ffcef5c8b6d: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0 0c fb bd a4 7f 00 00 d7
^
bit.c:129:8: runtime error: load of misaligned address 0x7ffcef5c8b4f for type 'unsigned short', which requires 2 byte alignment
0x7ffcef5c8b4f: note: pointer points here
00 a4 7f 00 00 01 00 00 00 00 00 00 00 50 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
bit.c:89:15: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
Yep does look serious ..
I've pushed my clang changes to master, which should help catching these types of minor quirks easier. There's a lot of stuff going on, such as renames and what not, but basically it just adds two compilers, clang
and safe
.
safe
is theoretically (supposed to be) a compiler for an abstract machine based off of the C/C++ abstract machine; the only difference being that it includes runtime checks and crashes on noticing undefined behavior. Of course this is not perfect. Right now, it just uses clang with additional runtime checks. Hopefully there will be a better alternative in future. Also, given how hard it is to accurately detect undefined behavior, safe
doesn't need to pass all the tests. It should however be a useful tool for debugging.
For reproducing this issue, you should be able to use:
$ # For actual performance tests
$ COMPILER=clang make testRX6Mbps.perf
$
$ # For debugging purposes
$ COMPILER=safe make testRX6Mbps.perf
I wouldn't recommend throwing print statements around (or using a step-through debugger) to debug clang since the clang frontend is doing some sort of lazy evaluation (probably by changing the scope of automatic variables) in the optimizer. I've noticed debugging clang directly does cause behavior to change.
The last time I checked, clang passed all tests, and does surprisingly good on the RXCCA performance test, but faces LUT bugs in the other performance tests.
Running
make testRX6Mbps.perf
in the performance tests with--autolut
yields the following C code:This test does pass, but there's undefined behaviour here. Picking out the relevant details:
bitArrWrite
is trying to access memory from 0-7 whilescrmbl_st583
can only be accessed 0-1. If the compiler chooses to optimize memory blocks 2-7, the behaviour would be undefined, most likely causingpackIdx/unpackIdx mismatch
.@mainland This is the clang bug.