Open h-vetinari opened 5 months ago
right, flang on Windows appears to be lagging behind the Linux/Unix version. If there is no omp_lib module provided by LLVM, I suggest you open an issue with them (unless already known/documented). This may also be the reason why you needed to set a bunch of cmake variables manually
but ISTR broken OpenMP support in LLVM on Windows is a known problem, and your "flang 19" is an unstable snapshot
but ISTR broken OpenMP support in LLVM on Windows is a known problem
Do you have a link?
and your "flang 19" is an unstable snapshot
Yes, 19.1.0rc1 is only expected in about a month. The problem was that all flang 18 builds were broken for SciPy, and I wanted to test/ensure that things work with flang 19 (early enough for potentially necessary fixes to land), hence why I built from main, as noted in the OP.
somewhere in the discussion in #3973 I think, but it could be that it was broken once, worked for a while and is now broken again.
It seems upstream intends to support it. I'm trying to rebuild as necessary to test that hypothesis.
Meanwhile, I've had one passing run without OpenMP, however, the logs got spammed so badly with warnings (~500MB), that I cannot really check them. After running again with warnings ignored, I get:
97% tests passed, 4 tests failed out of 120
Errors while running CTest
Output from these tests are in: D:/bld/openblas_1719572917500/work/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Total Test time (real) = 9104.76 sec
The following tests FAILED:
5 - sblas3 (Failed)
8 - dblas3 (Failed)
11 - cblas3 (Failed)
15 - zblas3 (Failed)
The runtime especially of complex-valued procedures is off the charts. That's also the part that depends on LLVM's compiler-rt (instead of MSVC's runtime); not sure if that plays a role somehow. An excerpt:
14/120 Test #15: zblas3 ......................................................................................***Failed 5.13 sec
Start 16: zblas3_3m
15/120 Test #16: zblas3_3m ................................................................................... Passed 0.75 sec
Start 17: REAL_LAPACK_linear_equation_routines
16/120 Test #14: zblas2 ...................................................................................... Passed 202.79 sec
Start 18: COMPLEX_LAPACK_linear_equation_routines
17/120 Test #17: REAL_LAPACK_linear_equation_routines ........................................................ Passed 2937.14 sec
Start 19: DOUBLE_PRECISION_LAPACK_linear_equation_routines
18/120 Test #18: COMPLEX_LAPACK_linear_equation_routines ..................................................... Passed 4390.90 sec
Start 20: COMPLEX16_LAPACK_linear_equation_routines
19/120 Test #19: DOUBLE_PRECISION_LAPACK_linear_equation_routines ............................................ Passed 3293.06 sec
Start 21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines
20/120 Test #21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines ........................... Passed 178.72 sec
Start 22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines
21/120 Test #22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines ......................... Passed 793.80 sec
Start 23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines
22/120 Test #23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines .................................. Passed 57.25 sec
Start 24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines
23/120 Test #24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines ...................... Passed 115.00 sec
Start 25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines
24/120 Test #25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines ............................... Passed 338.11 sec
Start 26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines
25/120 Test #26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines ............................. Passed 283.97 sec
Start 27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines
26/120 Test #27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines ...................................... Passed 1.00 sec
Start 28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines
27/120 Test #28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines ......................................... Passed 0.70 sec
Start 29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines
28/120 Test #29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines ......................................... Passed 1.89 sec
Start 30: SSVD:_Testing_Singular_Value_Decomposition_routines
29/120 Test #30: SSVD:_Testing_Singular_Value_Decomposition_routines ......................................... Passed 0.71 sec
Start 31: SSEC:_Testing_REAL_Eigen_Condition_Routines
30/120 Test #31: SSEC:_Testing_REAL_Eigen_Condition_Routines ................................................. Passed 0.60 sec
Start 32: SSEV:_Testing_REAL_Nonsymmetric_Eigenvalue_Driver
FWIW, the time to run ctest -j2
on our current builds is:
100% tests passed, 0 tests failed out of 120
Total Test time (real) = 6305.40 sec
So flang takes about 50% more time.
on a rerun of the flang-built openblas (without openmp), I now get:
100% tests passed, 0 tests failed out of 120
Total Test time (real) = 8278.32 sec
My suspicion is that the CI agents switching between different CPU types randomly is the cause for the pass or fail. If necessary, I can try to validate that hypothesis.
maybe alternating between avx512 and non-avx512 , common problem with azure-ci and ISTR mmuetzel had noted problems with win-llvm and avx512 previously
omp mod problem could also be missing include path to modules in llvm install path. will try to look at your logs later if able
It fails with the following instructions found for the CPU of the CI agent (according to numpy):
"found": [
"SSSE3",
"SSE41",
"POPCNT",
"SSE42",
"AVX",
"F16C",
"FMA3",
"AVX2",
"AVX512F",
"AVX512CD",
"AVX512_SKX"
],
"not found": [
"AVX512_CLX",
"AVX512_CNL",
"AVX512_ICL"
]
I managed to fix the omp_lib.mod
problem, the openmp-enabled build now fails with:
[17746/19184] Linking Fortran shared library lib\openblas.dll
FAILED: lib/openblas.dll lib/Release/openblas.lib
C:\Windows\system32\cmd.exe /C "C:\Windows\system32\cmd.exe /C "%BUILD_PREFIX%\Library\bin\cmake.exe -E __create_def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def.objs --nm=%BUILD_PREFIX%\Library\bin\llvm-nm.exe && cd %SRC_DIR%\build" && %BUILD_PREFIX%\Library\bin\cmake.exe -E vs_link_dll --intdir=CMakeFiles\openblas_shared.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests -- C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF /DEF:CMakeFiles\openblas_shared.dir\.\exports.def -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows" && cd ."
LINK: command "C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF /DEF:CMakeFiles\openblas_shared.dir\.\exports.def -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows /MANIFEST:EMBED,ID=2" failed (exit code 1) with the following output:
lld-link: warning: ignoring unknown argument '-lpthreads'
lld-link: error: undefined symbol: omp_get_max_threads
The lpthreads argument error is probably bogus. Do you see -fopenmp
or -lomp
in the build log ? Do you see omp_get_max_threads in nm
or dumpbin
output of the llvm-provided libomp ?
Passing configuration (non-openmp):
"found": [
"SSSE3",
"SSE41",
"POPCNT",
"SSE42",
"AVX",
"F16C",
"FMA3",
"AVX2"
],
"not found": [
"AVX512F",
"AVX512CD",
"AVX512_SKX",
"AVX512_CLX",
"AVX512_CNL",
"AVX512_ICL"
]
So indeed some AVX512 issue seems likely
Do you see
-fopenmp
or-lomp
in the build log ?
Yeah, I'm even passing that explicitly in -DOpenMP_Fortran_FLAGS=-fopenmp
. To be honest, I have no idea where the -lpthreads
is coming from.
seen some google hits suggesting that it is simply fallout from one of cmake's standard configuration check scripts and can be ignored. but omp_get_max_threads should definitely be in llvm's libomp (or at least used to be in earlier versions)
Looking closer, I'm almost certain this is a question of the wrong library path being picked:
-libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows"
Both of these are pointing to the build environment, not the host where openmp is actually present. Perhaps the path is constructed relative to clang/flang?
Two good news, two bad news
#1
I managed to build openblas in conda-forge with flang 19 and the following patch:
commit 4cab116c4e811f977c816f3027df57d1fba3bd0c
Author: H. Vetinari <h.vetinari@gmx.com>
Date: Mon Jul 1 09:05:00 2024 +1100
explicitly link to OpenMP
bump CMake minimum to adopt CMP0074 for checking OpenMP_ROOT, see
https://cmake.org/cmake/help/latest/policy/CMP0074.html
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 096ca88b3..ef505f70d 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -2,7 +2,7 @@
## Author: Hank Anderson <hank@statease.com>
##
-cmake_minimum_required(VERSION 2.8.5)
+cmake_minimum_required(VERSION 3.12)
project(OpenBLAS C ASM)
@@ -100,6 +100,10 @@ endif()
message(WARNING "CMake support is experimental. It does not yet support all build options and may not produce the same Makefiles that OpenBLAS ships with.")
+if (USE_OPENMP)
+ find_package(OpenMP REQUIRED)
+endif ()
+
include("${PROJECT_SOURCE_DIR}/cmake/utils.cmake")
include("${PROJECT_SOURCE_DIR}/cmake/system.cmake")
@@ -256,6 +260,15 @@ if (${CMAKE_SYSTEM_NAME} MATCHES "AIX|Android|Linux|FreeBSD|OpenBSD|NetBSD|Drago
endif()
endif()
+if (USE_OPENMP)
+ if(BUILD_STATIC_LIBS)
+ target_link_libraries(${OpenBLAS_LIBNAME}_static OpenMP::OpenMP_C)
+ endif()
+ if(BUILD_SHARED_LIBS)
+ target_link_libraries(${OpenBLAS_LIBNAME}_shared OpenMP::OpenMP_C)
+ endif()
+endif()
+
# Seems that this hack doesn't required since macOS 11 Big Sur
if (APPLE AND BUILD_SHARED_LIBS AND CMAKE_HOST_SYSTEM_VERSION VERSION_LESS 20)
set (CMAKE_C_USE_RESPONSE_FILE_FOR_OBJECTS 1)
This is to fix the linkage errors for symbols like omp_get_max_threads
mentioned above. I'm not claiming that the patch is directly upstreamable, but if you want I can open a PR.
#1
Finding LLVM's OpenMP correctly using CMake's find_package(OpenMP)
is apparently fraught. Even using the OpenMP_ROOT
work-around noted here, the Fortran-bits do not get picked up correctly, requiring a manual override:
REM not picked up by `find_package(OpenMP)` for some reason
set "CMAKE_EXTRA=-DOpenMP_Fortran_FLAGS=-fopenmp -DOpenMP_Fortran_LIB_NAMES=libomp"
set "FFLAGS=%FFLAGS% -I%LIBRARY_PREFIX%\include"
#2
The test suite is extraordinarily faster than without openmp:
Total Test time (real) = 98.31 sec
That's roughly 85x faster than the result we get for llvm-flang-built OpenBLAS without OpenMP (which is probably too good to be true, in the sense that the pthreads variant is probably being pessimized somewhere/somehow), and still ~65x faster than the current pthreads-based production builds; c.f. also https://github.com/conda-forge/openblas-feedstock/issues/160
#2
The same kind of failures as with pthreads occur on agents with AVX512 instructions:
97% tests passed, 4 tests failed out of 120
The following tests FAILED:
5 - sblas3 (Failed)
8 - dblas3 (Failed)
11 - cblas3 (Failed)
15 - zblas3 (Failed)
@h-vetinari how did you fix the omp_lib.mod problem back then ? I get the impression that this module is genuinely missing from the llvm 19.1.2and llvm-openmp packages currently on conda-forge (and same when specifying the path conda-forge/label/llvm_dev)
err, I have belatedly found your openmp-feedstock PR 134 that got merged three weeks ago - has no new package been released since then, or do I need a specific incancantation to install it from conda-forge ?
has no new package been released since then, or do I need a specific incancantation to install it from conda-forge ?
During the rc-phase the .mod
files were just part of llvm-openmp
itself, but we ended up separating those out into a separate output called llvm-openmp-fortran
. If you specify that in the host environment (or just install it if you're working locally), you should be good (our compiler activation automatically does -I%LIBRARY_PREFIX%\include
).
Thanks, got it installed now. Adding the include paths to the FFLAGS still seems to be necessary to get it picked up, will see how the build goes...
FYI: If you install flang_win-64
as opposed to flang
, you get the "activated" compiler that does set FFLAGS
and a few other things.
hmm, that appears to break the build for me
hmm, that appears to break the build for me
Here's what we do in the flang activation: https://github.com/conda-forge/flang-activation-feedstock/blob/main/recipe/activate.bat
If you just install flang
(+lld
), you can always set these flags manually if you'd like.
Thank you - actually not sure if what I'm seeing now is a problem with flang or the underlying Windows installation (this is a brand new AMD "strix point" laptop that shows some idiosyncrasies).Both ninja and makefile drivers suddenly fail by trying to link random objects that have not been compiled yet. System time appears to be sane though - I'll probably try reinstalling everything starting from miniforge...
Turns out I had a stray -I
in my FFLAGS that caused the special effects ðŸ˜
So I can confirm that OpenMP does not appear to cause additional accuracy errors with LLVM19.1 on Windows anymore. @h-vetinari would you like to prepare a PR for your OpenMP-related fixes in CMakeLists.txt ? The minimum_required has
already been adjusted in the meantime.
After that, only the linking of xerbla in the tests needs fixing, and the cause of the unresolved __imp_...
symbols when doing static linking. (The latter may well be a purely local problem with my installation of the VS2022 community edition)
@h-vetinari would you like to prepare a PR for your OpenMP-related fixes in CMakeLists.txt ?
happy to
After that, only the linking of xerbla in the tests needs fixing, and the cause of the unresolved
__imp_...
symbols when doing static linking.
Not sure if we're talking about the same problems, but I managed to build lapack (including tests & xerbla shenanigans) with flang. Needed two patches to CBLAS/testing/*
though.
thanks. I cannot reproduce the __impl
linking problems anymore so they may have been another aspect of my initially broken setup.
Meh, thought we had everything working with the conda-forge build of LLVM19 but the simple test code from #4886 crashes on my new Zen5 laptop unless I compile OpenBLAS for Haswell.
OpenBLAS already added flang support, but I don't think this is being tested on windows? While reviving the old effort to build conda-forge's
openblas
with flang, I originally ran into some parsing issue with flang 18.Luckily, with a flang 19 built from main (already built for debugging something else, so I thought I'd try), it seems that particular issue is gone. 🥳
However, I first encountered some CMake detection issues:
After iteratively figuring out (also re-encountering #3069 again along the way) that I needed to add (something like)
I then ran into what looks like a regular compilation error: