Closed: coderforlife closed this 9 years ago
-ftree-vectorize works fine here (gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) on F21 x86_64).
I am not getting this with MinGW-W64's gcc v4.8.2 rev0 on W7 32-bit anymore either. I never figured out why it happened in the first place, but apparently it is no longer an issue.
Actually, still seems to be an issue with g++ (i686-posix-dwarf-rev1, Built by MinGW-W64 project) 4.9.2 on Windows 7 x64 (being called from a 32-bit Python script).
With the latest code, the error is with address 0xFFFFFFFF instead of 0x00000000.
It seems that files <64kb (the size of an xpress-huffman chunk) work fine while larger ones fail.
I have narrowed down the problem to the function xpress_huff_compress (and not its children). I have disabled tree vectorization for just that function instead of globally like before.
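For anyone wanting to do the same, here is a minimal sketch of one way to switch off the vectorizer for a single function in GCC. The thread does not say which mechanism was actually used, so treat this as an assumption; NO_TREE_VECTORIZE and sum_lengths are hypothetical names made up for the example. GCC's documentation also cautions that the optimize attribute is meant for debugging rather than production code.

```cpp
#include <cstddef>
#include <cstdint>

// GCC-specific: disable the tree vectorizer for one function only.
// On other compilers the macro expands to nothing and the function
// is compiled with the normal global flags.
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define NO_TREE_VECTORIZE
#endif

// Hypothetical stand-in for the affected function (not the real
// xpress_huff_compress): a simple loop the vectorizer would otherwise target.
NO_TREE_VECTORIZE
std::uint32_t sum_lengths(const std::uint8_t* lens, std::size_t n) {
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += lens[i];
    }
    return total;
}
```

The rest of the translation unit can still be built with -ftree-vectorize; only the marked function opts out.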
There are only two vectorizable loops in that function (lines 255 and 283), and the loops are identical. One of them only runs on the last chunk (<64 kb) and the other one runs on the other, full chunks. This might hint that the loop is being optimized differently in those two cases.
For reference, the loop is:
for (uint_fast16_t i = 0, i2 = 0; i < HALF_SYMBOLS; ++i, i2+=2) { out[i] = (lens[i2+1] << 4) | lens[i2]; }
It packs all the length values (<=0xF) into nibbles.
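A self-contained sketch of that packing step, for anyone who wants to try it in isolation. The SYMBOLS value of 512 and the function name pack_lengths are assumptions for illustration; the real constants and surrounding code live in the library.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed sizes for illustration only.
constexpr std::size_t SYMBOLS = 512;
constexpr std::size_t HALF_SYMBOLS = SYMBOLS / 2;

// Pack two 4-bit code lengths into each output byte:
// the even-indexed length goes in the low nibble,
// the odd-indexed length goes in the high nibble.
// Every entry of lens is expected to be <= 0xF.
void pack_lengths(const std::uint8_t* lens, std::uint8_t* out) {
    for (std::size_t i = 0, i2 = 0; i < HALF_SYMBOLS; ++i, i2 += 2) {
        out[i] = static_cast<std::uint8_t>((lens[i2 + 1] << 4) | lens[i2]);
    }
}
```

For example, with lens[0] = 0x3 and lens[1] = 0xA, the first output byte is 0xA3.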
FWIW, I think -fsanitize=undefined would have caught that. I've seen warnings from other projects about unaligned access from ubsan. Sorry I didn't think of it until it was too late, but I still thought you might want to know for future reference. Tracking that down must have been painful :(
I'm unsure about that: it is defined behavior. When doing an unaligned access with SSE instructions, a trap is raised in the CPU.
In any case, most of the sanitizers don't work in MinGW. I don't know about that one specifically, but I tried a bunch of them and the compiler just said no.
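Where a sanitizer is not available, a hand-rolled check is one possible fallback. The sketch below is only an illustration of the alignment point discussed above, not code from the library; is_aligned and load_u32_unaligned are hypothetical helper names. The first reports whether a pointer meets a given alignment, the second reads a value through memcpy, which is valid for any address.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if p is a multiple of `alignment` bytes (e.g. 16 for aligned SSE loads).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Read a 32-bit value from a possibly unaligned address.
// memcpy avoids dereferencing a misaligned pointer directly.
inline std::uint32_t load_u32_unaligned(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

A debug build could assert is_aligned(ptr, 16) on the buffers handed to a vectorized loop to narrow down where a misaligned pointer comes from.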
See if newer versions of GCC still cause "access violation reading 0x00000000" in Xpress-Huffman when the tree-vectorize optimization is enabled.