abseil / abseil-cpp

Abseil Common Libraries (C++)
https://abseil.io
Apache License 2.0
14.9k stars 2.6k forks source link

abseil doesn't build with clang on windows due to missing AVX types #1759

Closed achernya closed 3 weeks ago

achernya commented 1 month ago

Describe the issue

On clang on Windows, absl/crc/internal/non_temporal_memcpy.h fails to build with errors

In file included from third-party\abseil-cpp\absl\crc\internal\crc_non_temporal_memcpy.cc:20:                                                                                      
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:147:5: error: unknown type name '__m256i'                                                          
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);                                                                                                                       
    ^                                                                                                                                                                              
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:147:47: error: unknown type name '__m256i'                                                         
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);                                                                                                                       
                                              ^                                                                                                                                    
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:148:11: error: unknown type name '__m256i'                                                         
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);                                                                                                           
          ^                                                                                                                                                                        
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:148:59: error: unknown type name '__m256i'                                                         
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);                                                                                                           
                                                          ^                                                                                                                        
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:149:62: error: use of undeclared identifier '__m256i'                                              
    constexpr int kOpsPerCacheLine = kCacheLineSize / sizeof(__m256i);                                                                                                             
                                                             ^                                                                                                                     
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:153:7: error: unknown type name '__m256i'                                                          
      __m256i temp1, temp2;                                                                                                                                                        
      ^                                                                                                                                                                            
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:155:7: error: use of undeclared identifier 'temp2'; did you mean 'temp1'?                          
      temp2 = _mm256_lddqu_si256(src_cacheline + 1);                                                                                                                               
      ^                                                                                                                                                                            
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:153:15: note: 'temp1' declared here                                                                
      __m256i temp1, temp2;                                                                                                                                                        
              ^                                                                                                                                                                    
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:157:46: error: use of undeclared identifier 'temp2'; did you mean 'exp2'?                          
      _mm256_stream_si256(dst_cacheline + 1, temp2);                                                                                                                               
                                             ^                                                                                                                                     
[redacted]\third-party\toolchains\visual_studio\14.29.30133\VC\Tools\MSVC\14.29.30133\include\cmath:651:16: note: 'exp2' declared here                                  
_GENERIC_MATH1(exp2)

It looks like clang picked up some headers from MSVC (although we are not running clang-cl.exe, but clang++.exe) https://github.com/abseil/abseil-cpp/blob/master/absl/crc/internal/non_temporal_memcpy.h#L124-L127 does and have __m256i defined.

If I add a !defined(_MSC_VER_) to the __SSE3__ clause, everything builds fine.

Steps to reproduce the problem

Attempt to build absl with clang 15 on Windows. (I suspect it affects newer clang as well, but I haven't checked.) Per my read of https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md clang-cl 15 should still be supported.

The invocation does not have any special clang-cl command line, i.e., no equivalent of -mavx or -march=native or similar -- just the usual -c, -o, -MD, -MF args.

What version of Abseil are you using?

20240722.0 (4447c7562e3bc702ade25105912dce503f0c4010)

What operating system and version are you using?

CentOS Stream release 9 (although buck2 is running some parts of the build on Windows)

What compiler and version are you using?

llvm 15.0.5 (windows-x86_64 clang++.exe)

What build system are you using?

buck2

Additional context

No response

derekmauro commented 1 month ago

Steps to reproduce the problem

Attempt to build absl with clang 15 on Windows. (I suspect it affects newer clang as well, but I haven't checked.) Per my read of https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md clang-cl 15 should still be supported.

The invocation does not have any special clang-cl command line, i.e., no equivalent of -mavx or -march=native or similar -- just the usual -c, -o, -MD, -MF args.

We do test this configuration and do not see this issue, so there must be some difference that isn't getting captured here. Please provide concrete reproduction steps using a supported build system.

achernya commented 1 month ago

I'll look into getting a repro, which will be challenging since I don't have direct access to the Windows builders.

Can you help me understand why you believe the current #if defined(...) are correct? 256-bit registers are only available starting AVX, not SSE3, so even with the comment above the check, it doesn't look right to me.

derekmauro commented 1 month ago

I think I understand what is going on here. 18018aa45dd40b123d0c86224bbcc26fa3269d73 tries to force that function to compile with AVX (see the gnu::target line above the function) so that runtime dispatch can be used when AVX is available, but to also allow the same binary to work with CPUs that do not support AVX. I believe the #ifdef needs to be modified to verify that the compiler supports gnu::target.

achernya commented 1 month ago

Okay, reproduction steps:

  1. Get a WIndows 11 machine. I used a VM from https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/. I set up my VM to have an EPYC CPU (because my host is AMD based).
  2. Install llvm-15.0.7-win64.exe from https://github.com/llvm/llvm-project/releases/tag/llvmorg-15.0.7, and install it, selecting "add to PATH for all users"
  3. Install cmake-3.30.3-windows-x86_64.msi from https://cmake.org/download/ and install it
  4. Download ninja-win.zip from https://github.com/ninja-build/ninja/releases and extract it. I put it in C:\Users\User\Documents
  5. Install Visual Studio Community 2019 with the C++ workload. (This is technically out of support, but only the Windows headers are needed here. I believe this will work fine with 2022, but I haven't tried it yet since I am trying to reproduce the environment that found the error)
  6. Clone abseil-cpp.git and checkout the 20240722.0 tag.
  7. In cmd.exe, run call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
  8. mkdir build && cd build
  9. cmake -DCMAKE_MAKE_PROGRAM:PATH=C:\Users\User\Documents\ninja.exe -DCMAKE_CXX_COMPILER:PATH="C:\Program Files\LLVM\bin\clang++.exe" -DCMAKE_CXX_FLAGS="-march=westmere" -G Ninja ..
  10. cmake --build . The build will fail, although the error message won't be as obvious. I recommend re-running with cmake --build . -- -j1 to get it.
C:\Users\User\Documents\src\abseil-cpp\build>cmake --build . -- -j1
[1/108] Building CXX object absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj
FAILED: absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj
C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE  -IC:/Users/User/Documents/src/abseil-cpp -march=westmere -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Wall -Wextra -Wc++98-compat-extra-semi -Wcast-qual -Wconversion -Wdeprecated-pragma -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj -MF absl\crc\CMakeFiles\crc32c.dir\internal\crc_non_temporal_memcpy.cc.obj.d -o absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj -c C:/Users/User/Documents/src/abseil-cpp/absl/crc/internal/crc_non_temporal_memcpy.cc
In file included from C:/Users/User/Documents/src/abseil-cpp/absl/crc/internal/crc_non_temporal_memcpy.cc:20:
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:147:5: error: unknown type name '__m256i'
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);
    ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:147:47: error: unknown type name '__m256i'
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);
                                              ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:148:11: error: unknown type name '__m256i'
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);
          ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:148:59: error: unknown type name '__m256i'
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);
                                                          ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:149:62: error: use of undeclared identifier '__m256i'
    constexpr int kOpsPerCacheLine = kCacheLineSize / sizeof(__m256i);
                                                             ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:153:7: error: unknown type name '__m256i'
      __m256i temp1, temp2;
      ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:155:7: error: use of undeclared identifier 'temp2'; did you mean 'temp1'?
      temp2 = _mm256_lddqu_si256(src_cacheline + 1);
      ^~~~~
      temp1
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:153:15: note: 'temp1' declared here
      __m256i temp1, temp2;
              ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:157:46: error: use of undeclared identifier 'temp2'; did you mean 'exp2'?
      _mm256_stream_si256(dst_cacheline + 1, temp2);
                                             ^~~~~
                                             exp2
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\cmath:651:16: note: 'exp2' declared here
_GENERIC_MATH1(exp2)
               ^
8 errors generated.
ninja: build stopped: subcommand failed.

The -march=westmere is load-bearing -- without it, __SSE3__ never gets set by clang, and never exposes this issue. So my comment in the initial report about there being no -march or similar flags must be in error (there's possibly something in the clang wrapper script by buck, or LLVM was built to assume a different CPU)

Either way, this reproduces with a supported build-system, and a fairly reasonable -march CPU variant.

achernya commented 1 month ago

I believe the #ifdef needs to be modified to verify that the compiler supports gnu::target

If I put #error Supports gnu::target on line 119, my reproducer gets

C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:119:2: error: Supports gnu::target
#error Supports gnu::target

So apparently clang++.exe supports gnu::target, but there's still no __m256i defined in that case.

Looking at clang 15's immintrin.h, it has

#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||      \
    defined(__AVX__)
#include <avxintrin.h>
#endif

I checked to see if clang++.exe is defining _MSC_VER by adding

#if defined(_MSC_VER)
#error "Windows?"
#endif

and I do get this error.

So avxintrin.h (which is what has the m256i typedefs) never gets included without `AVX__` on Windows, but does get included without Windows, which is why this code compiles on Linux (and presumably macos) with -march=westmere.

achernya commented 1 month ago

Looking at immintrin.h from gcc 12 as well as LLVM 15, it looks like gcc unconditionally includes avxintrin.h and makes the typedefs available.

LLVM makes avxintrin.h available if __AVX__ is defined, or _MSC_VER is not defined.

So I think it's likely sufficient to change from

#if defined(__SSE3__) || (defined(_MSC_VER) && defined(__AVX__))

to

#if  !defined(_MSC_VER) || defined(__AVX__)

(I don't know if we need an explicit gcc check with (defined(__GNUC__) && !defined(__llvm__)))

derekmauro commented 3 weeks ago

Thanks. I managed to reproduce. I'll try to submit a fix in the next few days.

achernya commented 3 weeks ago

Thanks for the fix, @derekmauro!

However, I think

#if ((defined(__AVX__) || defined(ABSL_INTERNAL_CAN_FORCE_AVX)) && \
     defined(__SSE3__)) ||                                         \
    (defined(_MSC_VER) && defined(__AVX__))

should be simplified to something similar to what I suggested above. Probably

#if defined(__AVX__) || defined(ABSL_INTERNAL_CAN_FORCE_AVX)

given how you defined ABSL_INTERNAL_CAN_FORCE_AVX to already include !defined(_MSC_VER). Primarily the issue here is we need to remove __SSE3__ as it's misleading and confusing.