Cyan4973 / xxHash

Extremely fast non-cryptographic hash algorithm
http://www.xxhash.com/
Other
9.18k stars 777 forks source link

Strange `-Wcast-align` warning from `clang-3.9` #559

Closed t-mat closed 3 years ago

t-mat commented 3 years ago

I saw the following strange warning (as an error) from clang-3.9. ( Here's actual GH-Actions log )

clang-3.9 -Werror -Wall -Wextra -Wconversion -Wcast-qual -Wcast-align -Wshadow \
          -Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement \
          -Wstrict-prototypes -Wundef -Wpointer-arith -Wformat-security -Wvla \
          -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings -Wredundant-decls \
          -Wstrict-overflow=2 -Wno-sign-conversion -DXXHSUM_DISPATCH=1 -c -o xxhash.o xxhash.c

In file included from xxhash.c:43:
./xxhash.h:3868:56: error:
    cast from 'const xxh_u8 *' (aka 'const unsigned char *') to 'const float *'
    increases required alignment from 1 to 4 [-Werror,-Wcast-align]

    XXH_ALIGN(64)        const float* const src  = (float const*) XXH3_kSecret;
                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~

I'd like to know there's good solution for this issue.

t-mat commented 3 years ago

I made the following minimum reproduction code

#include <stdio.h>
#include <stdalign.h>
#include <stdint.h>

#define XXH_ALIGN(n) __attribute__ ((aligned(n)))
typedef uint8_t xxh_u8;
XXH_ALIGN(64) static const xxh_u8 XXH3_kSecret[192] = { 0 };

int main(int argc, char** argv) {
    XXH_ALIGN(64) const float* const src = (const float* const) XXH3_kSecret;
    printf("alignof(XXH3_kSecret)=%zd\n", alignof(XXH3_kSecret));
    printf("alignof(src)=%zd\n", alignof(src));
}

clang-3.9.1 generates the following warning which I saw in GH-Actions.

clang prog.c -Wall -Wextra -pedantic "-Wcast-align" "-Wno-gnu-alignof-expression" "-Wno-unused-parameter"

prog.c:10:41: warning: cast from 'const xxh_u8 *' (aka 'const unsigned char *') to 'const float *const' increases required alignment from 1 to 4 [-Wcast-align]
        XXH_ALIGN(64) const float* const src = (const float* const) XXH3_kSecret;
                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

alignof(XXH3_kSecret)=64
alignof(src)=64

Note that alignof() shows expected and correct value 64.


clang-4.0.0 and later versions compile it without warning though.

clang prog.c -Wall -Wextra -std=c89 "-Wcast-align" "-Wno-gnu-alignof-expression" "-Wno-unused-parameter"

alignof(XXH3_kSecret)=64
alignof(src)=64
t-mat commented 3 years ago

My questions are

(1) Is there correct way to handle this warning?

(2) How can/should we deal with this warning?

Cyan4973 commented 3 years ago

So, the presumption here is a limitation of clang < 4.0, which is unable to take into consideration the extra alignment restriction of XXH3_kSecret. Consequently, it only remembers that it used to be a const xxh_u8 * type, and therefore casting it to const float* does not satisfy the new type alignment restriction.

This will not be easy to satisfy, since the tool we are supposed to employ (manually control alignment) is precisely the one that is not effective for this scenario.

A question comes to mind : Why is the src of type const float* ? Could it be done differently ?

One possibility could be to rewrite XXH3_initCustomSecret_sse2() to not need this cast.

t-mat commented 3 years ago

Why is the src of type const float* ?

Actual prototype of _mm_load_ps() is __m128 _mm_load_ps(const float *p). It requires const float*.

Could it be done differently ?

Perhaps, we can employ union trick which is implemented in XXH3_initCustomSecret_avx512().

https://github.com/Cyan4973/xxHash/blob/4a20afceb625f62278e11f630156475aee40b055/xxhash.h#L3668-L3673

// clang-3.9 prog.c -Wall -Wextra -std=gnu11 "-Wcast-align" "-Wno-gnu-alignof-expression" "-Wno-unused-parameter"
#include <stdio.h>
#include <stdalign.h>
#include <stdint.h>

#define XXH_ALIGN(n) __attribute__ ((aligned(n)))
typedef uint8_t xxh_u8;
XXH_ALIGN(64) static const xxh_u8 XXH3_kSecret[192] = { 0 };

int main(int argc, char** argv) {
    union {
        XXH_ALIGN(64) const float* cfp;
        XXH_ALIGN(64) const uint8_t* cu8p;
    } temp;
    temp.cu8p = XXH3_kSecret;

    XXH_ALIGN(64) const float* src  = temp.cfp;                    // no warning
    XXH_ALIGN(64) const float* src2 = (const float*) XXH3_kSecret; // warning [-Wcast-align]

    printf("alignof(XXH3_kSecret)=%zd\n", alignof(XXH3_kSecret));
    printf("alignof(src)         =%zd\n", alignof(src));
    printf("alignof(src2)        =%zd\n", alignof(src2));
}

result: https://wandbox.org/permlink/LBqz1yP7o600nbjJ

prog.c:17:39: warning: cast from 'const xxh_u8 *' (aka 'const unsigned char *') to 'const float *' increases required alignment from 1 to 4 [-Wcast-align]
    XXH_ALIGN(64) const float* src2 = (const float*) XXH3_kSecret; // warning [-Wcast-align]
                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
alignof(XXH3_kSecret)=64
alignof(src)         =64
alignof(src2)        =64

We need to benchmark/investigate there's no penalty in this change though.

t-mat commented 3 years ago

I've tested casting via union for SSE2/AVX2/AVX512 with gcc (4.8 and 11.2), clang (3.9 and 12.0.1) and cl.exe (19.29).

All compilers generates identical code for both of direct cast and cast via union. Also clang-3.9 seems happy.

Since this is a compiler's bug, I hope we'll remove this workaround someday 🤞


All unions are trivial. But I put them for just in case.

// SSE2
union {
    XXH_ALIGN(64) const xxh_u8* cu8p;
    XXH_ALIGN(64) const float* cfp;
} tmp_union_for_cast;

// AVX2
union {
    XXH_ALIGN(64) const xxh_u8* cu8p;
    XXH_ALIGN(64) const __m256i* cm256ip;
} tmp_union_for_cast;

// AVX512
union {
    XXH_ALIGN(64) const xxh_u8* cu8p;
    XXH_ALIGN(64) const __m512i* cm512ip;
} tmp_union_for_cast;

(edit) I'm confusing.

Does this union really need XXH_ALIGN(64) for each member? It declares and requires the address (offset) of pointer itself. It guarantees

(uintptr_t) ( & tmp_union_for_cast.cu8p) % 64 == 0  // OK

But it doesn't mean

(uintptr_t) (tmp_union_for_cast.cu8p) % 64 == 0  // ?

Also is XXH_ALIGN(64) const float* const src working as intended? I think it expects src % 64 == 0 is always true. It doesn't mean to (&src) % 64 == 0.

t-mat commented 3 years ago

I think we should remove unnecessary XXH_ALIGN() from pointers. I mean

XXH_ALIGN(64) const float* const src = ...;

should be

const float* const src = ...;

We can test the effect of alignas() with the following code.

#include <stdio.h>
#include <stdint.h>

#if defined(_MSC_VER)
#  define XXH_NO_INLINE static __declspec(noinline)
#elif defined(__GNUC__)
#  include <stdalign.h>
#  define XXH_NO_INLINE static __attribute__((noinline))
#else
#  include <stdalign.h>
#  define XXH_NO_INLINE static
#endif

XXH_NO_INLINE void f() {
    alignas(64) char* p0 = NULL;
    alignas(64) char* p1 = NULL;
    alignas(64) char* p2 = NULL;
    printf("&p0 %% 64 = %2zd, (&p0=%p)\n", ((uintptr_t) &p0) % 64, &p0);
    printf("&p1 %% 64 = %2zd, (&p1=%p)\n", ((uintptr_t) &p1) % 64, &p1);
    printf("&p2 %% 64 = %2zd, (&p2=%p)\n", ((uintptr_t) &p2) % 64, &p2);
}

XXH_NO_INLINE void g() {
    char* p0 = NULL;
    char* p1 = NULL;
    char* p2 = NULL;
    printf("&p0 %% 64 = %2zd, (&p0=%p)\n", ((uintptr_t) &p0) % 64, &p0);
    printf("&p1 %% 64 = %2zd, (&p1=%p)\n", ((uintptr_t) &p1) % 64, &p1);
    printf("&p2 %% 64 = %2zd, (&p2=%p)\n", ((uintptr_t) &p2) % 64, &p2);
}

int main(int argc, char** argv) {
    printf("f() (alignas(64))\n");
    f();
    printf("\n");

    printf("g() (no alignas())\n");
    g();
    printf("\n");

    return 0;
}

Typical output is something like this:

f() (alignas(64))
&p0 % 64 =  0, (&p0=0x7fff16765b00)
&p1 % 64 =  0, (&p1=0x7fff16765b40)
&p2 % 64 =  0, (&p2=0x7fff16765b80)

g() (no alignas())
&p0 % 64 = 56, (&p0=0x7fff16765bb8)
&p1 % 64 =  0, (&p1=0x7fff16765bc0)
&p2 % 64 =  8, (&p2=0x7fff16765bc8)

In f(), since we specify alignas() for each pointer, address of pointers (&p) are 64-bytes aligned. It also means it consumes more stack space for alignment. Since we're inlining almost everything, it'd be good to remove unnecessary XXH_ALIGN()s to reduce stack allocation size.

Cyan4973 commented 3 years ago

The way I see it, this code, using const float*, is an ill-attempt at circumventing a strange Intel API limitation.

Both _mm_load_si128() and _mm_loadu_si128() require a const __m128i* pointer. Such a pointer is supposed aligned on 16-bytes boundaries. But, in the case of _mm_loadu_si128(), it simply doesn't make sense, since this function is specifically designed for unaligned source pointer. So it should have been const void*. Because of this API design error, compilers will complain about type error when feeding the parameter with something else than a const __m128* pointer. And when manually specifying a cast, -Wcast-qual will complain about an enlarged alignment requirement.

I think the current circumvention attempt is pretty bad, because it obscures the intention of the code. Plus it merely replaces an alignment issue with another alignment issue.

Finally, in this case, the source pointer is aligned, since it's XXH3_kSecret, which is statically defined as aligned on 64-bytes boundaries. So we should safely use the aligned variant _mm_loadu_si128(). However, the compiler might not realize that the alignment restriction is correctly fulfilled.

I will have to check some alternatives.

t-mat commented 3 years ago

https://github.com/Cyan4973/xxHash/pull/569#discussion_r693357719

(1) I confirmed that all compilers (gcc, clang and cl.exe) generates same code for direct const __m128i * cast, const float * and union.

(2) Other obvious possible solution is introducing #pragma for clang-3.9 and earlier.

#if defined(__clang__) && (__clang_major__ < 4)
#  define XXH_CLANG_3_OR_EARLIER 1
#else
#  define XXH_CLANG_3_OR_EARLIER 0
#endif

// ...

#if XXH_CLANG_3_OR_EARLIER /* Disable -Wcast-align for clang-3.x and earlier versions */
#  pragma clang diagnostic push
#  pragma clang diagnostic ignored "-Wcast-align"
#endif

const __m128i * const src16  = (const __m128i *)XXH3_kSecret;

#if XXH_CLANG_3_OR_EARLIER
#  pragma clang diagnostic pop
#endif

Adding #pragmas for each cast is ugly and bulky. But it may ease to remove this workaround in future.

Cyan4973 commented 3 years ago

I selected to use void* as an intermediate type to circumvent alignment restrictions. It has worked well in my tests (testing all versions of clang from v3.5 to v7).

t-mat commented 3 years ago

I'm aware of __builtin_assume_aligned() and its friends. I don't know it works as intended for us or not. But it may tell our intention to the compiler.

gcc : __builtin_assume_aligned() This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned.

clang : Builtin Functions Clang supports a number of builtin library functions with the same syntax as GCC, including things like ... __builtin_assume_aligned, ..., etc.

gcc : __attribute__((assume_aligned(N))) The assume_aligned attribute is used to tell the compiler that the function return value points to memory, where the returned pointer minimum alignment is given by the first argument.

C++20 : std::assume_aligned() Informs the implementation that the object ptr points to is aligned to at least N.

#define MY_ALIGNMENT 64

void* f(void* p)
    __attribute__((assume_aligned(MY_ALIGNMENT)))  // gcc, clang : attribute for return value
{
    void* p_gcc   = __builtin_assume_aligned(p, MY_ALIGNMENT);  // gcc, clang
    void* p_cpp20 = std::assume_aligned<MY_ALIGNMENT>(p);   // C++20
    ...
    return p_gcc;
}