facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net

Recursive inline code failing to build with WindRiver 6.9 gnu compiler 4.3.3 #3194

Closed (godfreymark closed 1 year ago)

godfreymark commented 2 years ago

When I compile the ZSTD code in a VxWorks 6.9 environment with the GNU compiler version 4.3.3, the following error is reported while compiling zstd_lazy.c:

ZSTD/compress/zstd_lazy.c: In function 'ZSTD_RowFindBestMatch_dedicatedDictSearch_6_6':
ZSTD/compress/zstd_lazy.c:988: sorry, unimplemented: inlining failed in call to 'ZSTD_row_getSSEMask': recursive inlining
ZSTD/compress/zstd_lazy.c:1018: sorry, unimplemented: called from here
C:\WindRiver\WindRiver_6_9\utilities-1.0\x86-win32\bin\make.exe: *** [build/zstd_lazy.o] Error 1
Exiting.

I was previously compiling the exact same zstd source code in a VxWorks 6.8 environment with the GNU compiler version 4.1.2, without any issues. So on the face of it, it looks like the compiler's handling of inline code changed between 4.1.2 and 4.3.3.

As a quick and easy workaround, I have modified the code in zstd_lazy.c to prevent this function from being inlined, as follows:

/* FORCE_INLINE_TEMPLATE */ ZSTD_VecMask     //      <-----   Comment out the inline qualifier
ZSTD_row_getSSEMask(int nbChunks, const BYTE* const src, const BYTE tag, const U32 head)
{
    const __m128i comparisonMask = _mm_set1_epi8((char)tag);
    int matches[4] = {0};
    int i;
    assert(nbChunks == 1 || nbChunks == 2 || nbChunks == 4);
    for (i=0; i<nbChunks; i++) {
        const __m128i chunk = _mm_loadu_si128((const __m128i*)(const void*)(src + 16*i));
        const __m128i equalMask = _mm_cmpeq_epi8(chunk, comparisonMask);
        matches[i] = _mm_movemask_epi8(equalMask);
    }
    if (nbChunks == 1) return ZSTD_rotateRight_U16((U16)matches[0], head);
    if (nbChunks == 2) return ZSTD_rotateRight_U32((U32)matches[1] << 16 | (U32)matches[0], head);
    assert(nbChunks == 4);
    return ZSTD_rotateRight_U64((U64)matches[3] << 48 | (U64)matches[2] << 32 | (U64)matches[1] << 16 | (U64)matches[0], head);
}
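A less invasive variant of this workaround might keep forced inlining on newer compilers and drop it only on old GCC releases, instead of commenting the qualifier out for everyone. The following is a minimal sketch, not zstd's actual code: the DEMO_* macro names, the demo_combine_masks helper, and the 4.4 cutoff are all assumptions made for illustration. The reports above only establish that 4.3.3 fails and 4.1.2 works, so the cutoff would need to be checked against the compilers actually in use.

```c
#include <assert.h>

/* Hypothetical helper macro (not part of zstd): encode the GCC version as one
 * number so it can be compared in the preprocessor. 0 means "not GCC". */
#if defined(__GNUC__) && !defined(__clang__)
#  define DEMO_GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#  define DEMO_GCC_VERSION 0
#endif

/* Assumed cutoff: treat GCC older than 4.4.0 as unable to force-inline this
 * kind of function. Only 4.3.3 is reported as failing; adjust as needed. */
#if DEMO_GCC_VERSION != 0 && DEMO_GCC_VERSION < 40400
   /* Old GCC: plain static function, the compiler decides whether to inline. */
#  define DEMO_FORCE_INLINE static
#  define DEMO_FORCE_INLINE_ACTIVE 0
#elif defined(__GNUC__)
   /* Newer GCC or clang: request forced inlining. */
#  define DEMO_FORCE_INLINE static __inline__ __attribute__((always_inline))
#  define DEMO_FORCE_INLINE_ACTIVE 1
#else
   /* Unknown compiler: fall back to a plain static function. */
#  define DEMO_FORCE_INLINE static
#  define DEMO_FORCE_INLINE_ACTIVE 0
#endif

/* Hypothetical stand-in for ZSTD_row_getSSEMask without the SSE intrinsics:
 * packs nbChunks 16-bit match masks into one 64-bit value, lowest chunk in
 * the lowest bits. It has the same shape as the function above (a loop whose
 * bound is expected to be a compile-time constant at each call site). */
DEMO_FORCE_INLINE unsigned long long
demo_combine_masks(int nbChunks, const unsigned short* matches)
{
    unsigned long long result = 0;
    int i;
    assert(nbChunks == 1 || nbChunks == 2 || nbChunks == 4);
    for (i = 0; i < nbChunks; i++) {
        result |= (unsigned long long)matches[i] << (16 * i);
    }
    return result;
}
```

With this arrangement the function body stays untouched and only the qualifier changes per compiler, so the choice is made in one place. Whether losing forced inlining on the old toolchain costs measurable performance would have to be benchmarked.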

Is there a bug in the ZSTD code itself, related to how it defines the inline qualifier? Or does it need a particular optimisation flag (or set of flags) passed to gcc on the command line? I am currently using -O2.

terrelln commented 1 year ago

This seems like a compiler issue. gcc 4.3.3 is really old, so it isn't terribly surprising that the compiler is buggy.

It looks like you have a workaround, so I am closing the issue for now. If we see other reports of this issue, then we can re-evaluate working around it upstream.