azuresky01 closed this issue 2 months ago
Thank you. I'll fix the GCC 14 compatibility issue. As for the other problem, I'll find a 3A6000 computer to verify it.
Test environment:
OpenBLAS: commit 8da6f7e5f29a6ed8ae0513cc0a8d62425803aaa5
&& make USE_SIMPLE_THREADED_LEVEL3=1 NO_AFFINITY=0
OS: Loongnix GNU/Linux 20 (DaoXiangHu) loongarch64
Host: Loongson-3A6000-HV-7A2000-XA61200
Kernel: 4.19.0-19-loongson-3
GCC: gcc version 8.3.0 (Loongnix 8.3.0-6.lnd.vec.37)
Test Result:
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [OK]
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (67 ok, 0 failed, 0 skipped) ran in 216 ms
I could not reproduce this issue. Perhaps I should upgrade to GCC 14.
It looks very strange to me. In the file test_potrs.c:534-535 the bound is 1e-5. Why did the test fail when err=1.19209e-07?
It is indeed quite strange. I wonder if it's related to the compiler.
You are probably right. I tested on AOSC OS (loongarch64). The system default gcc 13.2.0 is fine with no errors:
TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [OK]
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (67 ok, 0 failed, 0 skipped) ran in 282 ms
However, AOSC backports many changes that are not in upstream GCC 13.2 to their "13.2": https://github.com/AOSC-Dev/aosc-os-abbs/tree/stable/core-devel/gcc/01-runtime/patches
Using gcc 13.2 compiled from the official source, I got the same utest failure:
TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [FAIL]
ERR: test_potrs.c:535 L s(0,0) difference: 1.19209e-07
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (66 ok, 1 failed, 0 skipped) ran in 257 ms
make[1]: *** [Makefile:82:run_test] Error 1
On Loong Arch Linux it fails too. The system default gcc is 14.0.1 20240316 (experimental):
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/loongarch64-unknown-linux-gnu/14.0.1/lto-wrapper
Target: loongarch64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,fortran,lto,m2,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --disable-multiarch --disable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240316 (experimental) (GCC)
utest results:
TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [FAIL]
ERR: test_potrs.c:535 L s(0,0) difference: 1.19209e-07
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (66 ok, 1 failed, 0 skipped) ran in 218 ms
make[1]: *** [Makefile:82:run_test] Error 1
After running "pacman -Syu" on Arch Linux to update the system, I got a newer build of gcc 14.0.1:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/loongarch64-unknown-linux-gnu/14.0.1/lto-wrapper
Target: loongarch64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,fortran,lto,m2,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --disable-multiarch --disable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240421 (experimental) (GCC)
After compiling OpenBLAS again, I got the same potrf:smoketest_trivial [FAIL], but this time at a different line:
TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [FAIL]
ERR: test_potrs.c:541 L d(0,0) difference: 4.44089e-16
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (66 ok, 1 failed, 0 skipped) ran in 227 ms
make[1]: *** [Makefile:82:run_test] Error 1
This is still strange, since the bound here is 1e-12 but err=4.44089e-16.
I don't think we can do anything about a broken compiler, unless there is a known compiler flag that happens to work around the problem. If gcc14 gets something as fundamental as floating point comparisons in plain C code wrong, the library built by it will probably be unusable even if we manage to "fix" a single test.
Hello,
I tried to compile the current OpenBLAS code (downloaded through git) using gcc 14.1 and got an "unrecognized argument in option ‘-mabi=lp64’" error.
Operating system: AOSC Linux loongarch64 (can also be observed on other systems such as Loongnix, Arch, etc.)
Host: Loongson-3A6000-HV-7A2000-1w-V0.1-EVB
The error message is the following:
... ...
The value of "-mabi=" should be "lp64d", but the build system chose the wrong argument, "lp64".
The solution:
I checked where "-mabi=" is defined, at line 964 of the file "Makefile.system":
https://github.com/OpenMathLib/OpenBLAS/blob/5d678f18318e99111c40d4f25efae4764143cfb0/Makefile.system#L964
Based on the information at this line, I tested "gcc-14.1.0 -c cpuid_loongarch64.c" and "clang -c cpuid_loongarch64.c". In both cases I got error messages with useful suggestions. See below:
"gcc-14.1.0 -c cpuid_loongarch64.c":
"clang -c cpuid_loongarch64.c":
Following the suggestion above, I added one more "#include" line below line 35 ("#include <sys/auxv.h>") in the file cpuid_loongarch64.c. The build error is gone and the build system now correctly selects "-mabi=lp64d".
By the way, there is still a utest failure at the end of the build:
It looks very strange to me. In the file test_potrs.c:534-535 the bound is 1e-5. Why did the test fail when err=1.19209e-07?
https://github.com/OpenMathLib/OpenBLAS/blob/5d678f18318e99111c40d4f25efae4764143cfb0/utest/test_potrs.c#L534-L535