h-vetinari opened 1 week ago
Right, flang on Windows appears to be lagging behind the Linux/Unix version. If there is no omp_lib module provided by LLVM, I suggest you open an issue with them (unless already known/documented). This may also be the reason why you needed to set a bunch of CMake variables manually
but ISTR broken OpenMP support in LLVM on Windows is a known problem, and your "flang 19" is an unstable snapshot
> but ISTR broken OpenMP support in LLVM on Windows is a known problem
Do you have a link?
> and your "flang 19" is an unstable snapshot
Yes, 19.1.0rc1 is only expected in about a month. The problem was that all flang 18 builds were broken for SciPy, and I wanted to test/ensure that things work with flang 19 (early enough for potentially necessary fixes to land), hence why I built from main, as noted in the OP.
Somewhere in the discussion in #3973, I think; but it could be that it was broken once, worked for a while, and is now broken again.
It seems upstream intends to support it. I'm trying to rebuild as necessary to test that hypothesis.
Meanwhile, I've had one passing run without OpenMP; however, the logs got spammed so badly with warnings (~500MB) that I cannot really check them. After running again with warnings ignored, I get:
97% tests passed, 4 tests failed out of 120
Errors while running CTest
Output from these tests are in: D:/bld/openblas_1719572917500/work/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Total Test time (real) = 9104.76 sec
The following tests FAILED:
5 - sblas3 (Failed)
8 - dblas3 (Failed)
11 - cblas3 (Failed)
15 - zblas3 (Failed)
The runtime, especially of the complex-valued procedures, is off the charts. That's also the part that depends on LLVM's compiler-rt (instead of MSVC's runtime); not sure if that plays a role somehow. An excerpt:
14/120 Test #15: zblas3 ......................................................................................***Failed 5.13 sec
Start 16: zblas3_3m
15/120 Test #16: zblas3_3m ................................................................................... Passed 0.75 sec
Start 17: REAL_LAPACK_linear_equation_routines
16/120 Test #14: zblas2 ...................................................................................... Passed 202.79 sec
Start 18: COMPLEX_LAPACK_linear_equation_routines
17/120 Test #17: REAL_LAPACK_linear_equation_routines ........................................................ Passed 2937.14 sec
Start 19: DOUBLE_PRECISION_LAPACK_linear_equation_routines
18/120 Test #18: COMPLEX_LAPACK_linear_equation_routines ..................................................... Passed 4390.90 sec
Start 20: COMPLEX16_LAPACK_linear_equation_routines
19/120 Test #19: DOUBLE_PRECISION_LAPACK_linear_equation_routines ............................................ Passed 3293.06 sec
Start 21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines
20/120 Test #21: SINGLE-DOUBLE_PRECISION_LAPACK_prototype_linear_equation_routines ........................... Passed 178.72 sec
Start 22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines
21/120 Test #22: Testing_COMPLEX-COMPLEX16_LAPACK_prototype_linear_equation_routines ......................... Passed 793.80 sec
Start 23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines
22/120 Test #23: Testing_REAL_LAPACK_RFP_prototype_linear_equation_routines .................................. Passed 57.25 sec
Start 24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines
23/120 Test #24: Testing_DOUBLE_PRECISION_LAPACK_RFP_prototype_linear_equation_routines ...................... Passed 115.00 sec
Start 25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines
24/120 Test #25: Testing_COMPLEX_LAPACK_RFP_prototype_linear_equation_routines ............................... Passed 338.11 sec
Start 26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines
25/120 Test #26: Testing_COMPLEX16_LAPACK_RFP_prototype_linear_equation_routines ............................. Passed 283.97 sec
Start 27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines
26/120 Test #27: SNEP:_Testing_Nonsymmetric_Eigenvalue_Problem_routines ...................................... Passed 1.00 sec
Start 28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines
27/120 Test #28: SSEP:_Testing_Symmetric_Eigenvalue_Problem_routines ......................................... Passed 0.70 sec
Start 29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines
28/120 Test #29: SSE2:_Testing_Symmetric_Eigenvalue_Problem_routines ......................................... Passed 1.89 sec
Start 30: SSVD:_Testing_Singular_Value_Decomposition_routines
29/120 Test #30: SSVD:_Testing_Singular_Value_Decomposition_routines ......................................... Passed 0.71 sec
Start 31: SSEC:_Testing_REAL_Eigen_Condition_Routines
30/120 Test #31: SSEC:_Testing_REAL_Eigen_Condition_Routines ................................................. Passed 0.60 sec
Start 32: SSEV:_Testing_REAL_Nonsymmetric_Eigenvalue_Driver
FWIW, the time to run `ctest -j2` on our current builds is:
100% tests passed, 0 tests failed out of 120
Total Test time (real) = 6305.40 sec
So flang takes about 50% more time.
On a rerun of the flang-built OpenBLAS (without OpenMP), I now get:
100% tests passed, 0 tests failed out of 120
Total Test time (real) = 8278.32 sec
My suspicion is that the CI agents randomly switching between different CPU types is the cause of the pass/fail difference. If necessary, I can try to validate that hypothesis.
Maybe it's alternating between AVX512 and non-AVX512 machines; that's a common problem with Azure CI, and ISTR mmuetzel had noted problems with win-llvm and AVX512 previously.
The omp_lib.mod problem could also be a missing include path to the modules in the LLVM install path. Will try to look at your logs later if able.
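If it is indeed a missing module path, the fix might look roughly like this (the directory and compiler name are assumptions; where LLVM puts omp_lib.mod depends on the packaging):

```shell
# Hypothetical sketch: point flang at the directory containing LLVM's
# Fortran module files (omp_lib.mod etc.) via the Fortran flags.
# $PREFIX/Library/include/flang is a placeholder, not the verified location.
cmake -DCMAKE_Fortran_COMPILER=flang-new \
      -DCMAKE_Fortran_FLAGS="-I $PREFIX/Library/include/flang" \
      ..
```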
It fails with the following instruction sets found for the CPU of the CI agent (according to NumPy):
"found": [
"SSSE3",
"SSE41",
"POPCNT",
"SSE42",
"AVX",
"F16C",
"FMA3",
"AVX2",
"AVX512F",
"AVX512CD",
"AVX512_SKX"
],
"not found": [
"AVX512_CLX",
"AVX512_CNL",
"AVX512_ICL"
]
I managed to fix the `omp_lib.mod` problem; the OpenMP-enabled build now fails with:
[17746/19184] Linking Fortran shared library lib\openblas.dll
FAILED: lib/openblas.dll lib/Release/openblas.lib
C:\Windows\system32\cmd.exe /C "C:\Windows\system32\cmd.exe /C "%BUILD_PREFIX%\Library\bin\cmake.exe -E __create_def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def %SRC_DIR%\build\CMakeFiles\openblas_shared.dir\.\exports.def.objs --nm=%BUILD_PREFIX%\Library\bin\llvm-nm.exe && cd %SRC_DIR%\build" && %BUILD_PREFIX%\Library\bin\cmake.exe -E vs_link_dll --intdir=CMakeFiles\openblas_shared.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests -- C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF /DEF:CMakeFiles\openblas_shared.dir\.\exports.def -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows" && cd ."
LINK: command "C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\openblas_shared.rsp /out:lib\openblas.dll /implib:lib\Release\openblas.lib /pdb:lib\openblas.pdb /dll /version:0.3 /machine:x64 /INCREMENTAL:NO /DEBUG /OPT:REF /OPT:ICF /DEF:CMakeFiles\openblas_shared.dir\.\exports.def -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib -libpath:D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows /MANIFEST:EMBED,ID=2" failed (exit code 1) with the following output:
lld-link: warning: ignoring unknown argument '-lpthreads'
lld-link: error: undefined symbol: omp_get_max_threads
The lpthreads argument error is probably bogus. Do you see `-fopenmp` or `-lomp` in the build log? Do you see omp_get_max_threads in `nm` or `dumpbin` output of the LLVM-provided libomp?
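For the second question, a possible check could look like this (the libomp path is a placeholder, not taken from the actual environment):

```shell
# Look for omp_get_max_threads among the defined symbols of LLVM's
# OpenMP runtime. LIBOMP is a hypothetical path; adjust to wherever
# libomp was actually installed.
LIBOMP="$PREFIX/Library/lib/libomp.lib"
llvm-nm --defined-only "$LIBOMP" | grep omp_get_max_threads
```

On the MSVC side, `dumpbin /EXPORTS` on the libomp DLL would give the equivalent listing.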
Passing configuration (non-OpenMP):
"found": [
"SSSE3",
"SSE41",
"POPCNT",
"SSE42",
"AVX",
"F16C",
"FMA3",
"AVX2"
],
"not found": [
"AVX512F",
"AVX512CD",
"AVX512_SKX",
"AVX512_CLX",
"AVX512_CNL",
"AVX512_ICL"
]
So indeed, some AVX512 issue seems likely.
> Do you see `-fopenmp` or `-lomp` in the build log?
Yeah, I'm even passing that explicitly via `-DOpenMP_Fortran_FLAGS=-fopenmp`. To be honest, I have no idea where the `-lpthreads` is coming from.
I've seen some Google hits suggesting that it is simply fallout from one of CMake's standard configuration check scripts and can be ignored. But omp_get_max_threads should definitely be in LLVM's libomp (or at least it used to be in earlier versions).
Looking closer, I'm almost certain this is a case of the wrong library path being picked up:
-libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib" -libpath:"D:/bld/openblas_1719644456116/_build_env/Library/lib/clang/19/lib/windows"
Both of these point to the build environment, not the host environment where OpenMP is actually present. Perhaps the path is constructed relative to clang/flang?
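If so, one way to test that would be to override CMake's FindOpenMP results explicitly so they point at the host environment. A sketch, with assumed paths (`OpenMP_<lang>_FLAGS`/`_LIB_NAMES` and `OpenMP_<name>_LIBRARY` are the standard FindOpenMP hint variables):

```shell
# Hypothetical: force FindOpenMP to use the host environment's libomp
# instead of whatever is found next to the compiler in the build
# environment. The library path below is a placeholder.
cmake -DOpenMP_Fortran_FLAGS=-fopenmp \
      -DOpenMP_Fortran_LIB_NAMES=libomp \
      -DOpenMP_libomp_LIBRARY="$PREFIX/Library/lib/libomp.lib" \
      ..
```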
OpenBLAS already added flang support, but I don't think this is being tested on Windows? While reviving the old effort to build conda-forge's `openblas` with flang, I originally ran into some parsing issue with flang 18. Luckily, with a flang 19 built from main (already built for debugging something else, so I thought I'd try), it seems that particular issue is gone. 🥳
However, I first encountered some CMake detection issues:
After iteratively figuring out (also re-encountering #3069 again along the way) that I needed to add (something like)
I then ran into what looks like a regular compilation error: