coderforlife / ms-compress

Open source implementations of Microsoft compression algorithms

GCC tree-vectorization problem #4

Closed coderforlife closed 9 years ago

coderforlife commented 9 years ago

See if newer versions of GCC still cause "access violation reading 0x00000000" in Xpress-Huffman when the tree-vectorize optimization is enabled.

nemequ commented 9 years ago

-ftree-vectorize works fine here (gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) on F21 x86_64).

coderforlife commented 9 years ago

I am not getting this with MinGW-W64's gcc v4.8.2 rev0 on W7 32-bit anymore either. I never figured out why it happened in the first place, but apparently it is no longer an issue.

coderforlife commented 9 years ago

Actually, still seems to be an issue with g++ (i686-posix-dwarf-rev1, Built by MinGW-W64 project) 4.9.2 on Windows 7 x64 (being called from a 32-bit Python script).

With the latest code, the error is with address 0xFFFFFFFF instead of 0x00000000.

It seems that files smaller than 64 KB (the size of an Xpress-Huffman chunk) work fine, while larger ones fail.

coderforlife commented 9 years ago

I have narrowed down the problem to the function xpress_huff_compress (and not its children). I have disabled tree vectorization for just that function instead of globally like before.
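For readers who want to try the same kind of isolation, here is a minimal sketch of turning off tree vectorization for a single function with GCC's `optimize` function attribute. This is one possible way to do it and not necessarily what was done in ms-compress; the macro name `NO_TREE_VECTORIZE` and the function `add_offset` are hypothetical stand-ins. GCC's documentation describes this attribute as a debugging aid, which matches its use here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// GCC-only: the optimize attribute applies -fno-tree-vectorize to one function.
// Other compilers (including Clang) get an empty macro, so the code still builds.
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define NO_TREE_VECTORIZE
#endif

// Hypothetical stand-in for the affected function: a simple byte loop that
// GCC would otherwise be free to auto-vectorize when -ftree-vectorize is on.
NO_TREE_VECTORIZE
void add_offset(std::uint8_t* dst, const std::uint8_t* src, std::size_t n,
                std::uint8_t offset) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<std::uint8_t>(src[i] + offset);  // wraps modulo 256
    }
}
```

Moving the attribute from function to function while the rest of the file keeps its normal flags is a cheap way to bisect which function the vectorizer is miscompiling or mis-assuming about.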

There are only two vectorizable loops in that function (lines 255 and 283), and the loops are identical. One of them runs only on the last chunk (<64 KB) and the other runs on all the full chunks. This might hint that the loop is being optimized differently in those two cases.

For reference, the loop is:

for (uint_fast16_t i = 0, i2 = 0; i < HALF_SYMBOLS; ++i, i2+=2) { out[i] = (lens[i2+1] << 4) | lens[i2]; }

It packs all the length values (<=0xF) into nibbles.
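To make the packing concrete, here is a small self-contained sketch of the same operation, together with its inverse. The function names `pack_lengths` and `unpack_lengths` and the explicit count parameter are illustrative assumptions; the original loop runs over a fixed `HALF_SYMBOLS` count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs 2 * n_out code lengths (each <= 0xF) into n_out bytes.
// The even-indexed length lands in the low nibble, the odd-indexed one in the high nibble.
void pack_lengths(const std::uint8_t* lens, std::uint8_t* out, std::size_t n_out) {
    for (std::size_t i = 0, i2 = 0; i < n_out; ++i, i2 += 2) {
        assert(lens[i2] <= 0xF && lens[i2 + 1] <= 0xF);
        out[i] = static_cast<std::uint8_t>((lens[i2 + 1] << 4) | lens[i2]);
    }
}

// Inverse operation: expands n_in packed bytes back into 2 * n_in lengths.
void unpack_lengths(const std::uint8_t* in, std::uint8_t* lens, std::size_t n_in) {
    for (std::size_t i = 0, i2 = 0; i < n_in; ++i, i2 += 2) {
        lens[i2]     = in[i] & 0xF;                           // low nibble
        lens[i2 + 1] = static_cast<std::uint8_t>(in[i] >> 4); // high nibble
    }
}
```

For example, the lengths `{3, 10, 0, 15}` pack into the two bytes `0xA3` and `0xF0`.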

nemequ commented 9 years ago

FWIW, I think -fsanitize=undefined would have caught that. I've seen warnings from other projects about unaligned access from ubsan. Sorry I didn't think of it until it was too late, but I still thought you might want to know for future reference. Tracking that down must have been painful :(

coderforlife commented 9 years ago

I am unsure about that. The behavior is defined: when an unaligned access is done with SSE instructions, the CPU raises a trap.

In any case, most of the sanitizers don't work in MinGW. I don't know about that one specifically, but I tried a bunch of them and the compiler just said no.
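Since unaligned access comes up above and the sanitizers may not be available on every toolchain, here is a minimal sketch of a manual check that can be dropped into a debug build. It illustrates the general technique only and is not taken from ms-compress; the helper names `is_aligned` and `load_u32_unaligned` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if p is aligned to `align` bytes. `align` must be a power of two.
bool is_aligned(const void* p, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Reads a 32-bit value from an address that may not be 4-byte aligned.
// memcpy makes no alignment assumption, so the compiler cannot assume one either.
std::uint32_t load_u32_unaligned(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

A call such as `assert(is_aligned(ptr, 16));` placed just before a loop the compiler may vectorize turns a hard-to-trace access violation into an assertion failure at a known location.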