OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Support building with flang on windows #4768

Open h-vetinari opened 1 week ago

h-vetinari commented 1 week ago

OpenBLAS already added flang support, but I don't think this is being tested on Windows? While reviving the old effort to build conda-forge's openblas with flang, I originally ran into a parsing issue with flang 18.

Luckily, with a flang 19 built from main (already built for debugging something else, so I thought I'd try), it seems that particular issue is gone. 🥳

However, I first encountered some CMake detection issues:

-- Found OpenMP_C: -Xclang -fopenmp (found version "5.1")
CMake Error at D:/bld/openblas_1719527807775/_build_env/Library/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OpenMP_Fortran (missing: OpenMP_Fortran_FLAGS
  OpenMP_Fortran_LIB_NAMES)
-- Configuring incomplete, errors occurred!

After iteratively figuring out (and re-encountering #3069 along the way) that I needed to add (something like)

    -DOpenMP_Fortran_FLAGS=-fopenmp ^
    -DOpenMP_Fortran_LIB_NAMES=libomp ^
    -DOpenMP_libomp_LIBRARY=-llibomp ^
    -DOpenMP_C_FLAGS=-fopenmp ^
    -DOpenMP_C_LIB_NAMES=libomp ^

I then ran into what looks like a regular compilation error:

[...]
[3643/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\stplqt.f.obj
[3644/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\stplqt2.f.obj
[3645/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\stpmlqt.f.obj
[3646/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_2stage.f.obj
[3647/19184] Building Fortran object CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F.obj
FAILED: CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F.obj 
%BUILD_PREFIX%\Library\bin\flang.exe -I%SRC_DIR%\lapack-netlib\SRC -I%SRC_DIR%\lapack-netlib\LAPACKE\include -fopenmp -fopenmp -ffixed-line-length-72 -o CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F.obj -c CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F-pp.f
error: Semantic errors in CMakeFiles\LAPACK_OVERRIDES.dir\lapack-netlib\SRC\ssytrd_sb2st.F-pp.f
D:\\bld\\openblas_1719536014536\\work\\lapack-netlib\\SRC\\ssytrd_sb2st.F:237:11: error: Cannot read module file for module 'omp_lib': Source file 'omp_lib.mod' was not found
        use omp_lib
            ^^^^^^^

martin-frbg commented 1 week ago

Right, flang on Windows appears to be lagging behind the Linux/Unix version. If there is no omp_lib module provided by LLVM, I suggest you open an issue with them (unless it is already known/documented). This may also be the reason why you needed to set a bunch of CMake variables manually.

martin-frbg commented 1 week ago

but ISTR broken OpenMP support in LLVM on Windows is a known problem, and your "flang 19" is an unstable snapshot

h-vetinari commented 1 week ago

> but ISTR broken OpenMP support in LLVM on Windows is a known problem

Do you have a link?

> and your "flang 19" is an unstable snapshot

Yes, 19.1.0rc1 is only expected in about a month. The problem was that all flang 18 builds were broken for SciPy, and I wanted to ensure that things work with flang 19 (early enough for potentially necessary fixes to land), which is why I built from main, as noted in the OP.

martin-frbg commented 1 week ago

somewhere in the discussion in #3973 I think, but it could be that it was broken once, worked for a while and is now broken again.

h-vetinari commented 1 week ago

It seems upstream intends to support it. I'm trying to rebuild as necessary to test that hypothesis.

Meanwhile, I've had one passing run without OpenMP. However, the logs got spammed so badly with warnings (~500MB) that I cannot really check them. After running again with warnings ignored, I get:

97% tests passed, 4 tests failed out of 120
Errors while running CTest
Output from these tests are in: D:/bld/openblas_1719572917500/work/build/Testing/Temporary/LastTest.log

Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Total Test time (real) = 9104.76 sec

The following tests FAILED:
      5 - sblas3 (Failed)
      8 - dblas3 (Failed)
     11 - cblas3 (Failed)
     15 - zblas3 (Failed)
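When the full log is too large to read, the list of failed tests can be pulled out with a few lines of script. The following is a minimal sketch, assuming the summary has the shape shown above; the function name `failed_tests` is hypothetical, and only entries marked `(Failed)` are matched.

```python
import re


def failed_tests(ctest_output: str) -> list[tuple[int, str]]:
    """Return (test number, test name) for every line of the form 'N - name (Failed)'."""
    pattern = re.compile(r"^\s*(\d+) - (\S+) \(Failed\)", re.MULTILINE)
    return [(int(num), name) for num, name in pattern.findall(ctest_output)]


# Summary copied from the CTest output quoted above.
sample = """\
97% tests passed, 4 tests failed out of 120

The following tests FAILED:
      5 - sblas3 (Failed)
      8 - dblas3 (Failed)
     11 - cblas3 (Failed)
     15 - zblas3 (Failed)
"""

print(failed_tests(sample))
```

All four failures are level-3 BLAS tests, one per precision (s, d, c, z).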

The runtime, especially of the complex-valued procedures, is off the charts. That's also the part that depends on LLVM's compiler-rt (instead of MSVC's runtime); not sure if that plays a role somehow. An excerpt:

 14/120 Test  #15: zblas3 ......................................................................................***Failed    5.13 sec
        Start  16: zblas3_3m
 15/120 Test  #16: zblas3_3m ...................................................................................   Passed    0.75 sec
        Start  17: REAL_LAPACK_linear_equation_routines
 16/120 Test  #14: zblas2 ......................................................................................   Passed  202.79 sec
        Start  18: COMPLEX_LAPACK_linear_equation_routines
 17/120 Test  #17: REAL_LAPACK_linear_equation_routines ........................................................   Passed  2937.14 sec
        Start  19: DOUBLE_PRECISION_LAPACK_linear_equation_routines
 18/120 Test  #18: COMPLEX_LAPACK_linear_equation_routines .....................................................   Passed  4390.90 sec
        Start  20: COMPLEX16_LAPACK_linear_equation_routines
 19/120 Test  #19: DOUBLE_PRECISION_LAPACK_linear_equation_routines ............................................   Passed  3293.06 sec
        Start  21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines
 20/120 Test  #21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines ...........................   Passed  178.72 sec
        Start  22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines
 21/120 Test  #22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines .........................   Passed  793.80 sec
        Start  23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines
 22/120 Test  #23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines ..................................   Passed   57.25 sec
        Start  24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines
 23/120 Test  #24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines ......................   Passed  115.00 sec
        Start  25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines
 24/120 Test  #25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines ...............................   Passed  338.11 sec
        Start  26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines
 25/120 Test  #26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines .............................   Passed  283.97 sec
        Start  27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines
 26/120 Test  #27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines ......................................   Passed    1.00 sec
        Start  28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines
 27/120 Test  #28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines .........................................   Passed    0.70 sec
        Start  29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines
 28/120 Test  #29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines .........................................   Passed    1.89 sec
        Start  30: SSVD:_Testing_Singular_Value_Decomposition_routines
 29/120 Test  #30: SSVD:_Testing_Singular_Value_Decomposition_routines .........................................   Passed    0.71 sec
        Start  31: SSEC:_Testing_REAL_Eigen_Condition_Routines
 30/120 Test  #31: SSEC:_Testing_REAL_Eigen_Condition_Routines .................................................   Passed    0.60 sec
        Start  32: SSEV:_Testing_REAL_Nonsymmetric_Eigenvalue_Driver
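To see which tests dominate the runtime, the per-test lines can be sorted by duration. This is a rough sketch, assuming the line layout shown in the excerpt above; the helper name `slowest` is hypothetical, and the sample lines are shortened copies of the excerpt.

```python
import re

# Matches lines such as:
#  17/120 Test  #17: REAL_LAPACK_linear_equation_routines ....   Passed  2937.14 sec
LINE = re.compile(
    r"Test\s+#\d+:\s+(\S+)\s+\.+\s*(?:\*\*\*)?(Passed|Failed)\s+([\d.]+) sec"
)


def slowest(ctest_log: str, n: int = 3) -> list[tuple[float, str, str]]:
    """Return the n longest-running tests as (seconds, name, status), slowest first."""
    rows = [(float(sec), name, status) for name, status, sec in LINE.findall(ctest_log)]
    return sorted(rows, reverse=True)[:n]


sample = """\
 14/120 Test  #15: zblas3 ..........***Failed    5.13 sec
 16/120 Test  #14: zblas2 ..........   Passed  202.79 sec
 17/120 Test  #17: REAL_LAPACK_linear_equation_routines ...   Passed  2937.14 sec
 18/120 Test  #18: COMPLEX_LAPACK_linear_equation_routines ...   Passed  4390.90 sec
"""

for seconds, name, status in slowest(sample, n=2):
    print(f"{seconds:10.2f} sec  {status}  {name}")
```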

h-vetinari commented 1 week ago

FWIW, the time to run ctest -j2 on our current builds is:

100% tests passed, 0 tests failed out of 120

Total Test time (real) = 6305.40 sec

So the flang build takes roughly 45% more time (9104.76 sec vs. 6305.40 sec).
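The slowdown can be checked directly from the two totals. This is only a back-of-the-envelope sketch; it assumes both runs used comparable CI agents and the same `ctest` parallelism, which the logs above do not confirm.

```python
flang_total = 9104.76    # sec, first flang-built run (4 tests failed)
current_total = 6305.40  # sec, current builds with ctest -j2

# Relative increase of the flang run over the current builds, in percent.
slowdown_pct = (flang_total / current_total - 1) * 100
print(f"flang run is {slowdown_pct:.1f}% slower")
```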

h-vetinari commented 1 week ago

on a rerun of the flang-built openblas (without openmp), I now get:

100% tests passed, 0 tests failed out of 120

Total Test time (real) = 8278.32 sec

My suspicion is that the CI agents randomly switching between different CPU types is what determines whether the tests pass or fail. If necessary, I can try to validate that hypothesis.

martin-frbg commented 1 week ago

maybe alternating between avx512 and non-avx512, which is a common problem with azure-ci. ISTR mmuetzel had previously noted problems with win-llvm and avx512

martin-frbg commented 1 week ago

omp mod problem could also be missing include path to modules in llvm install path. will try to look at your logs later if able

h-vetinari commented 1 week ago

It fails with the following instructions found for the CPU of the CI agent (according to numpy):

    "found": [
      "SSSE3",
      "SSE41",
      "POPCNT",
      "SSE42",
      "AVX",
      "F16C",
      "FMA3",
      "AVX2",
      "AVX512F",
      "AVX512CD",
      "AVX512_SKX"
    ],
    "not found": [
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL"
    ]

I managed to fix the omp_lib.mod problem; the openmp-enabled build now fails with:

[17746/19184] Linking Fortran shared library lib\openblas.dll
FAILED: lib/openblas.dll lib/Release/openblas.lib 
C:\Windows\system32\cmd.exe /C "C:\Windows\system32\cmd.exe /C "%BUILD_PREFIX%\Library\bin\cmake.exe -E __create_def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def.objs --nm=%BUILD_PREFIX%\Library\bin\llvm-nm.exe && cd %SRC_DIR%\build" && %BUILD_PREFIX%\Library\bin\cmake.exe -E vs_link_dll --intdir=CMakeFiles\openblas_shared.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp  /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF  /DEF:CMakeFiles\openblas_shared.dir\.\exports.def  -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows"  && cd ."
LINK: command "C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF /DEF:CMakeFiles\openblas_shared.dir\.\exports.def -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows /MANIFEST:EMBED,ID=2" failed (exit code 1) with the following output:
lld-link: warning: ignoring unknown argument '-lpthreads'
lld-link: error: undefined symbol: omp_get_max_threads

martin-frbg commented 1 week ago

The -lpthreads argument warning is probably bogus. Do you see -fopenmp or -lomp in the build log? Do you see omp_get_max_threads in nm or dumpbin output of the llvm-provided libomp?
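The symbol check can be scripted once the `nm` output has been captured to text. The sketch below assumes the usual three-column `nm` layout (address, type letter, name); the function name `defines_symbol` is hypothetical, and the sample output is made up for illustration rather than taken from a real libomp.

```python
def defines_symbol(nm_output: str, symbol: str) -> bool:
    """True if nm-style output lists `symbol` with a type other than 'U' (undefined)."""
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols: "<address> <type> <name>"; undefined ones: "U <name>".
        if len(parts) >= 2 and parts[-1] == symbol and parts[-2] != "U":
            return True
    return False


# Illustrative output only, NOT from an actual library.
sample = """\
0000000000001a40 T omp_get_max_threads
                 U malloc
"""

print(defines_symbol(sample, "omp_get_max_threads"))
print(defines_symbol(sample, "malloc"))
```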

h-vetinari commented 1 week ago

Passing configuration (non-openmp):

    "found": [
      "SSSE3",
      "SSE41",
      "POPCNT",
      "SSE42",
      "AVX",
      "F16C",
      "FMA3",
      "AVX2"
    ],
    "not found": [
      "AVX512F",
      "AVX512CD",
      "AVX512_SKX",
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL"
    ]

So indeed some AVX512 issue seems likely
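Comparing the two "found" lists as sets makes the difference explicit. A small sketch using the feature names reported above for the failing and the passing agent:

```python
failing_found = {
    "SSSE3", "SSE41", "POPCNT", "SSE42", "AVX", "F16C", "FMA3", "AVX2",
    "AVX512F", "AVX512CD", "AVX512_SKX",
}
passing_found = {
    "SSSE3", "SSE41", "POPCNT", "SSE42", "AVX", "F16C", "FMA3", "AVX2",
}

# Features present on the failing agent but absent on the passing one.
only_on_failing = sorted(failing_found - passing_found)
print(only_on_failing)
```

The only features that differ are AVX512 ones, and the passing agent has no feature that the failing agent lacks.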

h-vetinari commented 1 week ago

> Do you see -fopenmp or -lomp in the build log?

Yeah, I'm even passing that explicitly in -DOpenMP_Fortran_FLAGS=-fopenmp. To be honest, I have no idea where the -lpthreads is coming from.

martin-frbg commented 1 week ago

seen some google hits suggesting that it is simply fallout from one of cmake's standard configuration check scripts and can be ignored. but omp_get_max_threads should definitely be in llvm's libomp (or at least used to be in earlier versions)

h-vetinari commented 1 week ago

Looking closer, I'm almost certain this is a question of the wrong library path being picked:

-libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows"

Both of these are pointing to the build environment, not the host where openmp is actually present. Perhaps the path is constructed relative to clang/flang?
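To check this across the whole link line, the `-libpath:` entries can be extracted and classified. This is a minimal sketch; the helper name `libpaths` is hypothetical, and it assumes that any path containing `_build_env` belongs to the build environment while the host prefix lives in some other directory.

```python
import re


def libpaths(link_command: str) -> list[str]:
    """Return every -libpath: value in a linker command line, quoted or not."""
    return re.findall(r'-libpath:"?([^"\s]+)"?', link_command, flags=re.IGNORECASE)


# The two entries from the failing link command quoted above.
cmd = (
    '-libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" '
    '-libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows"'
)

for path in libpaths(cmd):
    # Assumption: "_build_env" in the path marks the build environment.
    origin = "build env" if "_build_env" in path else "other (possibly host)"
    print(f"{origin}: {path}")
```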