OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

test failures on mips64 in dnrm2_tiny and dnrm2_inf #3761

Closed: svillemot closed this issue 2 years ago.

svillemot commented 2 years ago

In OpenBLAS 0.3.21, on mips64, the Debian package fails to build because there are test failures in the utests for dnrm2_tiny and dnrm2_inf:

TEST 32/37 dnrm2:dnrm2_tiny [FAIL]
  ERR: test_dnrm2.c:65  expected 0.000e+00, got inf (diff -inf, tol 1.000e-13)
TEST 33/37 dnrm2:dnrm2_inf [FAIL]
  ERR: test_dnrm2.c:52  expected inf, got 0.000e+00 (diff inf, tol 1.000e-13)

(see https://buildd.debian.org/status/fetch.php?pkg=openblas&arch=mips64el&ver=0.3.21%2Bds-1&stamp=1662336716&raw=0 for the full build log).

I noticed that commit cce4b1d9562d60b875982ce9587c68fbac666ec8 by @XiWeiGu was supposed to fix dnrm2_tiny.

Interestingly, if I revert that very commit, then it fixes dnrm2_inf:

TEST 32/37 dnrm2:dnrm2_tiny [FAIL]
  ERR: test_dnrm2.c:65  expected 0.000e+00, got inf (diff -inf, tol 1.000e-13)
TEST 33/37 dnrm2:dnrm2_inf [OK]

So it seems that the logic of commit cce4b1d9562d60b875982ce9587c68fbac666ec8 is wrong, and that instead of fixing dnrm2_tiny, it broke dnrm2_inf.

cdluminate commented 2 years ago

I can reproduce this issue using mips64el QEMU.

martin-frbg commented 2 years ago

Oops, I wonder what happened here. It almost looks as if the sense of the conditional is inverted (but more likely the right code was inserted in the wrong location).

martin-frbg commented 2 years ago

Changing the bc1t to bc1f does appear to "fix" this, but I have not yet given a single thought to the logic, nor checked what it does to the LAPACK test suite.

XiWeiGu commented 2 years ago

Compiling the latest code on the 3A4000 with Loongnix GNU/Linux 20 RC3 presents the following problems:

/usr/include/mips64el-linux-gnuabi64/gnu/stubs.h:41:11: fatal error: gnu/stubs-n64_hard_2008.h: No such file or directory
# include <gnu/stubs-n64_hard_2008.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/features.h:448,
                 from /usr/include/mips64el-linux-gnuabi64/bits/libc-header-start.h:33,
                 from /usr/include/stdio.h:27,
                 from axpy.c:39:
/usr/include/mips64el-linux-gnuabi64/gnu/stubs.h:41:11: fatal error: gnu/stubs-n64_hard_2008.h: No such file or directory
# include <gnu/stubs-n64_hard_2008.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
compilation terminated.
In file included from /usr/include/features.h:448,
                 from /usr/include/mips64el-linux-gnuabi64/bits/libc-header-start.h:33,
                 from /usr/include/stdio.h:27,
                 from copy.c:39:
/usr/include/mips64el-linux-gnuabi64/gnu/stubs.h:41:11: fatal error: gnu/stubs-n64_hard_2008.h: No such file or directory
# include <gnu/stubs-n64_hard_2008.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:822:saxpy.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:849:sscal.o] Error 1
make[1]: *** [Makefile:876:scopy.o] Error 1
In file included from /usr/include/features.h:448,
                 from /usr/include/mips64el-linux-gnuabi64/bits/libc-header-start.h:33,
                 from /usr/include/stdio.h:27,
                 from swap.c:39:
/usr/include/mips64el-linux-gnuabi64/gnu/stubs.h:41:11: fatal error: gnu/stubs-n64_hard_2008.h: No such file or directory
# include <gnu/stubs-n64_hard_2008.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:894:sswap.o] Error 1

So I temporarily reset to cce4b1d, and all the utests passed:

TEST 1/35 max:smax_zero [OK]
TEST 2/35 max:dmax_positive [OK]
TEST 3/35 max:smax_negative [OK]
TEST 4/35 min:smin_zero [OK]
TEST 5/35 min:dmin_positive [OK]
TEST 6/35 min:smin_negative [OK]
TEST 7/35 amax:damax [OK]
TEST 8/35 amax:samax [OK]
TEST 9/35 ismax:negative_step_2 [OK]
TEST 10/35 ismax:positive_step_2 [OK]
TEST 11/35 ismin:negative_step_2 [OK]
TEST 12/35 ismin:positive_step_2 [OK]
TEST 13/35 drotmg:drotmg_D1_big_D2_big_flag_zero [OK]
TEST 14/35 drotmg:rotmg_D1eqD2_X1eqX2 [OK]
TEST 15/35 drotmg:rotmg_issue1452 [OK]
TEST 16/35 drotmg:rotmg [OK]
TEST 17/35 axpy:caxpy_inc_0 [OK]
TEST 18/35 axpy:saxpy_inc_0 [OK]
TEST 19/35 axpy:zaxpy_inc_0 [OK]
TEST 20/35 axpy:daxpy_inc_0 [OK]
TEST 21/35 zdotu:zdotu_offset_1 [OK]
TEST 22/35 zdotu:zdotu_n_1 [OK]
TEST 23/35 dsdot:dsdot_n_1 [OK]
TEST 24/35 swap:cswap_inc_0 [OK]
TEST 25/35 swap:sswap_inc_0 [OK]
TEST 26/35 swap:zswap_inc_0 [OK]
TEST 27/35 swap:dswap_inc_0 [OK]
TEST 28/35 rot:csrot_inc_0 [OK]
TEST 29/35 rot:srot_inc_0 [OK]
TEST 30/35 rot:zdrot_inc_0 [OK]
TEST 31/35 rot:drot_inc_0 [OK]
TEST 32/35 dnrm2:dnrm2_tiny [OK]
TEST 33/35 dnrm2:dnrm2_inf [OK]
TEST 34/35 fork:safety [OK]
TEST 35/35 fork:safety_after_fork_in_parent [OK]
RESULTS: 35 tests (35 ok, 0 failed, 0 skipped) ran in 530 ms

svillemot commented 2 years ago

Here, reverting cce4b1d9562d60b875982ce9587c68fbac666ec8 fixes dnrm2_inf but not dnrm2_tiny, as explained above.

XiWeiGu commented 2 years ago

Sorry, it's my fault. cce4b1d only fixes dnrm2_tiny on Loongson's mips64el machines; mips64el machines from other manufacturers still fail. Strangely, cce4b1d doesn't cause an additional dnrm2_inf failure when I use mips64el QEMU:

TEST 28/35 rot:csrot_inc_0 [OK]
TEST 29/35 rot:srot_inc_0 [OK]
TEST 30/35 rot:zdrot_inc_0 [OK]
TEST 31/35 rot:drot_inc_0 [OK]
TEST 32/35 dnrm2:dnrm2_tiny [FAIL]
  ERR: test_dnrm2.c:65 expected 0.000e+00, got inf (diff -inf, tol 1.000e-13)
TEST 33/35 dnrm2:dnrm2_inf [OK]
TEST 34/35 fork:safety [OK]
TEST 35/35 fork:safety_after_fork_in_parent [OK]
RESULTS: 35 tests (34 ok, 1 failed, 0 skipped) ran in 39930 ms

svillemot commented 2 years ago

Thanks @XiWeiGu, I confirm that your latest commit fixes the issue for me.

bonrybon commented 2 years ago

I'm still encountering the error:

TEST 32/36 dnrm2:dnrm2_inf [OK]
TEST 33/36 dnrm2:dnrm2_tiny [FAIL]
  ERR: test_dnrm2.c:65 expected 0.000e+00, got inf (diff -inf, tol 1.000e-13)
TEST 34/36 potrf:bug_695 [OK]
TEST 35/36 potrf:smoketest_trivial [OK]
TEST 36/36 kernel_regress:skx_avx [OK]
RESULTS: 36 tests (35 ok, 1 failed, 0 skipped) ran in 8 ms
make[1]: *** [run_test] Error 1
make: *** [tests] Error 2

I notice that even if I change MTC1 to MTC as suggested in https://github.com/xianyi/OpenBLAS/pull/3763/commits/365936ae1b1dfa2f50b3e65c68ae95babc6f2af2, TEST 33/36 still fails whenever I run extras/install_openblas.sh, and MTC reverts to MTC1.

martin-frbg commented 2 years ago

Where does your extras/install_openblas.sh come from? From your description, it looks as if it overwrites everything with a fresh download of an older, unfixed version.

bonrybon commented 2 years ago


I got it from https://github.com/kaldi-asr/kaldi; I ran make after cloning that repository with git clone.

martin-frbg commented 2 years ago

Well, unless you changed the version variable at the top of the script, it will download a release from almost two years ago. No wonder it brings this problem (and doubtless several others) back.

bonrybon commented 2 years ago

Actually, I updated extras/install_openblas.sh

from this:

#!/usr/bin/env bash

OPENBLAS_VERSION=0.3.13

WGET=${WGET:-wget}

set -e

if ! command -v gfortran 2>/dev/null; then
  echo "$0: gfortran is not installed. Please install it, e.g. by:"
  echo " apt-get install gfortran"
  echo "(if on Debian or Ubuntu), or:"
  echo " yum install gcc-gfortran"
  echo "(if on RedHat/CentOS). On a Mac, if brew is installed, it's:"
  echo " brew install gfortran"
  exit 1
fi

tarball=OpenBLAS-$OPENBLAS_VERSION.tar.gz

rm -rf xianyi-OpenBLAS-* OpenBLAS OpenBLAS-*.tar.gz

if [ -d "$DOWNLOAD_DIR" ]; then
  cp -p "$DOWNLOAD_DIR/$tarball" .
else
  url=$($WGET -qO- "https://api.github.com/repos/xianyi/OpenBLAS/releases/tags/v${OPENBLAS_VERSION}" | python -c 'import sys,json;print(json.load(sys.stdin)["tarball_url"])')
  test -n "$url"
  $WGET -t3 -nv -O $tarball "$url"
fi

tar xzf $tarball
mv xianyi-OpenBLAS-* OpenBLAS

make PREFIX=$(pwd)/OpenBLAS/install USE_LOCKING=1 USE_THREAD=0 -C OpenBLAS all install
if [ $? -eq 0 ]; then
  echo "OpenBLAS is installed successfully."
  rm $tarball
fi

to this:

#!/usr/bin/env bash

OPENBLAS_VERSION=0.3.21

WGET=${WGET:-wget}

set -e

if ! command -v gfortran 2>/dev/null; then
  echo "$0: gfortran is not installed. Please install it, e.g. by:"
  echo " apt-get install gfortran"
  echo "(if on Debian or Ubuntu), or:"
  echo " yum install gcc-gfortran"
  echo "(if on RedHat/CentOS). On a Mac, if brew is installed, it's:"
  echo " brew install gfortran"
  exit 1
fi

tarball=OpenBLAS-$OPENBLAS_VERSION.tar.gz

rm -rf xianyi-OpenBLAS-* OpenBLAS OpenBLAS-*.tar.gz

if [ -d "$DOWNLOAD_DIR" ]; then
  cp -p "$DOWNLOAD_DIR/$tarball" .
else
  url=$($WGET -qO- "https://api.github.com/repos/xianyi/OpenBLAS/releases/tags/v${OPENBLAS_VERSION}" | python3 -c 'import sys,json;print(json.load(sys.stdin)["tarball_url"])')
  test -n "$url"
  $WGET -t3 -nv -O $tarball "$url"
fi

tar xzf $tarball
mv xianyi-OpenBLAS-* OpenBLAS

make PREFIX=$(pwd)/OpenBLAS/install USE_LOCKING=1 USE_THREAD=0 -C OpenBLAS all install
if [ $? -eq 0 ]; then
  echo "OpenBLAS is installed successfully."
  rm $tarball
fi

By the way, I'm using macOS 12.5.1.

martin-frbg commented 2 years ago

OK, so at least you are downloading and unpacking 0.3.21 each time before you build. However, the fix was only added after 0.3.21 was released. I suggest you either use git clone to fetch the current develop branch of OpenBLAS instead of the last release, or run just the make... line from the script instead of the full script, after manually patching the MTC/MTC1 line.

bonrybon commented 2 years ago

I'm still encountering the error :(

I already ran git clone https://github.com/xianyi/OpenBLAS.git to get the latest files, and then ran the make PREFIX=$(pwd)/OpenBLAS/install USE_LOCKING=1 USE_THREAD=0 -C OpenBLAS all install line from extras/install_openblas.sh.

Results:

TEST 32/36 dnrm2:dnrm2_inf [OK]
TEST 33/36 dnrm2:dnrm2_tiny [FAIL]
  ERR: test_dnrm2.c:65 expected 0.000e+00, got inf (diff -inf, tol 1.000e-13)
TEST 34/36 potrf:bug_695 [OK]
TEST 35/36 potrf:smoketest_trivial [OK]
TEST 36/36 kernel_regress:skx_avx [OK]
RESULTS: 36 tests (35 ok, 1 failed, 0 skipped) ran in 2 ms
make[1]: *** [run_test] Error 1
make: *** [tests] Error 2

bonrybon commented 2 years ago

It is now resolved. What I did:

  git clone https://github.com/xianyi/OpenBLAS.git
  cd OpenBLAS
  make
  make PREFIX=install install