OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

gcc14 compatibility on Loongarch64 #4687

Closed azuresky01 closed 2 months ago

azuresky01 commented 2 months ago

Hello,

I tried to compile the current OpenBLAS code (cloned from git) with gcc 14.1 and got an "unrecognized argument in option ‘-mabi=lp64’" error.

Operating system: AOSC Linux loongarch64 (the problem can also be observed on other systems such as Loongnix, Arch, etc.)
Host: Loongson-3A6000-HV-7A2000-1w-V0.1-EVB

The error message is the following:

gcc-14.1.0 -O2 -DMAX_STACK_ALLOC=2048 -Wall -mabi=lp64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.27.dev\" -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHARCNAME -DASMNAME=sdsdot -DASMFNAME=sdsdot -DNAME=sdsdot_ -DCNAME=sdsdot -DCHARNAME=\"sdsdot\" -DCHAR_CNAME=\"sdsdot\" -DNO_AFFINITY -I.. -I. -UDOUBLE -UCOMPLEX -c sdsdot.c -o sdsdot.o
gcc-14.1.0: error: unrecognized argument in option ‘-mabi=lp64’
gcc-14.1.0: note: valid arguments to ‘-mabi=’ are: lp64d lp64f lp64s; did you mean ‘lp64d’?

... ...

The value of "-mabi=" should be "lp64d", but the build system chose the invalid value "lp64".

The solution:

I checked where "-mabi=" is defined, at line 964 of the file "Makefile.system":

https://github.com/OpenMathLib/OpenBLAS/blob/5d678f18318e99111c40d4f25efae4764143cfb0/Makefile.system#L964

Based on the information at this line I tested "gcc-14.1.0 -c cpuid_loongarch64.c" and "clang -c cpuid_loongarch64.c". In both cases I get error messages and useful suggestions. See below:

"gcc-14.1.0 -c cpuid_loongarch64.c":

cpuid_loongarch64.c: In function ‘get_architecture’:
cpuid_loongarch64.c:84:3: error: implicit declaration of function ‘printf’ [-Wimplicit-function-declaration]
   84 |   printf("LOONGARCH64");
      |   ^~
cpuid_loongarch64.c:36:1: note: include ‘<stdio.h>’ or provide a declaration of ‘printf’
   35 | #include <sys/auxv.h>
  +++ |+#include <stdio.h>
   36 |

"clang -c cpuid_loongarch64.c":

cpuid_loongarch64.c:84:3: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   84 |   printf("LOONGARCH64");
      |   ^
cpuid_loongarch64.c:84:3: note: include the header <stdio.h> or explicitly provide a declaration for 'printf'
1 error generated.

Following the suggestion above, I added the line "#include <stdio.h>" below line 35 ("#include <sys/auxv.h>") in the file cpuid_loongarch64.c. With that change the build error is gone and the build correctly selects "-mabi=lp64d".
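For readers who want to see the pattern in isolation, here is a minimal sketch of the fix. GCC 14 treats implicit function declarations as errors by default, so a file that calls printf must declare it first, normally by including <stdio.h>. The function name report_arch is a hypothetical stand-in, not the actual get_architecture() from cpuid_loongarch64.c.

```c
/* Minimal sketch, not the actual cpuid_loongarch64.c source.
   Without the #include below, GCC 14 rejects the printf call with
   "implicit declaration of function 'printf'". */
#include <stdio.h>   /* the header that was missing */

/* report_arch is a hypothetical stand-in for get_architecture().
   It prints the architecture name and returns the number of
   characters written, as reported by printf. */
int report_arch(void)
{
    return printf("LOONGARCH64");
}
```

If the #include line is removed, compiling this file with gcc 14 should reproduce the same class of error shown in the logs above.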

By the way, one utest still fails at the end of the build:

TEST 63/67 potrf:smoketest_trivial [FAIL]
ERR: test_potrs.c:535 L s(0,0) difference: 1.19209e-07

This looks very strange to me. In the file test_potrs.c:534-535 the bound is 1e-5. Why did the test fail when err=1.19209e-07?

https://github.com/OpenMathLib/OpenBLAS/blob/5d678f18318e99111c40d4f25efae4764143cfb0/utest/test_potrs.c#L534-L535
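To make the puzzle concrete, here is a small sketch of what a correct tolerance check should do with the reported value. It assumes the check has the form "|diff| > bound", as the description above suggests; the helper names are hypothetical and the code is not taken from test_potrs.c. The reported difference, 1.19209e-07, matches FLT_EPSILON (2^-23) to the printed precision.

```c
#include <float.h>

/* Hypothetical helper, not taken from test_potrs.c: returns 1 when the
   magnitude of diff exceeds bound, i.e. when a tolerance check of the
   form |diff| > bound should report a failure, and 0 otherwise. */
int exceeds_bound(float diff, float bound)
{
    float magnitude = diff < 0.0f ? -diff : diff;
    return magnitude > bound;
}

/* Applies the check to a difference of FLT_EPSILON with the 1e-5 bound
   mentioned above. A correctly compiled comparison returns 0 here. */
int reported_case_fails(void)
{
    return exceeds_bound(FLT_EPSILON, 1e-5f);
}
```

With correct floating-point comparison, reported_case_fails() returns 0, so a [FAIL] for this value points at the comparison itself rather than at the numerical result.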

XiWeiGu commented 2 months ago

Thank you. I'll fix the GCC 14 compatibility issue. As for the other problem, I'll find a 3A6000 computer to verify it.

XiWeiGu commented 2 months ago

Test environment:

OpenBLAS: 8da6f7e5f29a6ed8ae0513cc0a8d62425803aaa5 && make USE_SIMPLE_THREADED_LEVEL3=1 NO_AFFINITY=0
OS: Loongnix GNU/Linux 20 (DaoXiangHu) loongarch64
Host: Loongson-3A6000-HV-7A2000-XA61200
Kernel: 4.19.0-19-loongson-3
Gcc: gcc version 8.3.0 (Loongnix 8.3.0-6.lnd.vec.37)

Test Result:

TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [OK]
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (67 ok, 0 failed, 0 skipped) ran in 216 ms

I could not reproduce this issue. Perhaps I should upgrade to GCC 14.

XiWeiGu commented 2 months ago

> It looks very strange to me. In the file test_potrs.c:534-535 the bound is 1e-5. Why the test failed when err=1.19209e-07?

It is indeed quite strange. I wonder if it's related to the compiler.

azuresky01 commented 2 months ago

> I wonder if it's related to the compiler.

You are probably right. I tested on AOSC OS (loongarch64). The system default gcc 13.2.0 is fine with no errors:

TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [OK]
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (67 ok, 0 failed, 0 skipped) ran in 282 ms

However, AOSC backports many changes that are not in upstream GCC 13.2 to their "13.2": https://github.com/AOSC-Dev/aosc-os-abbs/tree/stable/core-devel/gcc/01-runtime/patches

With gcc 13.2 compiled from the official source, I got the same utest failure:

TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [FAIL]
ERR: test_potrs.c:535 L s(0,0) difference: 1.19209e-07
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (66 ok, 1 failed, 0 skipped) ran in 257 ms
make[1]: *** [Makefile:82:run_test] Error 1

azuresky01 commented 2 months ago

On Loong Arch Linux it fails too. The system default gcc is 14.0.1 20240316 (experimental):

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/loongarch64-unknown-linux-gnu/14.0.1/lto-wrapper
Target: loongarch64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,fortran,lto,m2,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --disable-multiarch --disable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240316 (experimental) (GCC)

utest results:

TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [FAIL]
ERR: test_potrs.c:535 L s(0,0) difference: 1.19209e-07
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (66 ok, 1 failed, 0 skipped) ran in 218 ms
make[1]: *** [Makefile:82:run_test] Error 1

azuresky01 commented 2 months ago

After running "pacman -Syu" on Arch Linux to update the software, I got a newer build of gcc 14.0.1:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/loongarch64-unknown-linux-gnu/14.0.1/lto-wrapper
Target: loongarch64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,fortran,lto,m2,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --disable-multiarch --disable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240421 (experimental) (GCC)

After compiling OpenBLAS again, I got the same potrf:smoketest_trivial [FAIL], but at a different place:

TEST 60/67 axpby:saxpby_inc_2 [OK]
TEST 61/67 axpby:saxpby_inc_1 [OK]
TEST 62/67 axpby:saxpby_inc_0 [OK]
TEST 63/67 potrf:smoketest_trivial [FAIL]
ERR: test_potrs.c:541 L d(0,0) difference: 4.44089e-16
TEST 64/67 potrf:bug_695 [OK]
TEST 65/67 kernel_regress:skx_avx [OK]
TEST 66/67 fork:safety [OK]
TEST 67/67 fork:safety_after_fork_in_parent [OK]
RESULTS: 67 tests (66 ok, 1 failed, 0 skipped) ran in 227 ms
make[1]: *** [Makefile:82:run_test] Error 1

This is still strange, since the bound here is 1e-12 but err=4.44089e-16.

https://github.com/OpenMathLib/OpenBLAS/blob/8da6f7e5f29a6ed8ae0513cc0a8d62425803aaa5/utest/test_potrs.c#L540-L541
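As a side note, both reported differences look like rounding-level noise. The sketch below formats FLT_EPSILON and 2*DBL_EPSILON with "%g", and the results match the two values in the logs. That the utest prints with "%g" is an assumption based on how the numbers appear, and format_g is a hypothetical helper.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: formats value into buf using "%g" (assumed to be
   the format behind the logged numbers) and returns snprintf's result. */
int format_g(double value, char *buf, size_t size)
{
    return snprintf(buf, size, "%g", value);
}
```

Calling format_g((double)FLT_EPSILON, ...) yields "1.19209e-07" and format_g(2.0 * DBL_EPSILON, ...) yields "4.44089e-16", so both errors are on the order of one or two units in the last place, far below the 1e-5 and 1e-12 bounds.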

martin-frbg commented 2 months ago

I don't think we can do anything about a broken compiler, unless there is a known compiler flag that happens to work around the problem. If gcc14 gets something as fundamental as floating point comparisons in plain C code wrong, the library built by it will probably be unusable even if we manage to "fix" a single test.