OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

amax:samax utest failure #1912

Closed: virtuald closed this issue 4 years ago.

virtuald commented 5 years ago

System configuration:

Compile was via:

make TARGET=CORTEXA9 PREFIX=/usr/local

Here's the output:

./openblas_utest 
TEST 1/23 amax:samax [FAIL]
  ERR: test_amax.c:44  expected 3.300e+00, got 4.204e-45 (diff 3.300e+00, tol 1.000e-04)
...
RESULTS: 23 tests (22 ok, 1 failed, 0 skipped) ran in 1860 ms

What steps should I take next to diagnose this issue? Thanks!

virtuald commented 5 years ago

Looking more at the compile log, this pops out at me:

OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat2 < ./cblat2.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat2 < ./zblat2.dat
 TESTS OF THE COMPLEX          LEVEL 3 BLAS

 THE FOLLOWING PARAMETER VALUES WILL BE USED:
   FOR N                   0     1     2     3     7    31
   FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
   FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  

 ERROR-EXITS WILL NOT BE TESTED

 ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00

 RELATIVE MACHINE PRECISION IS TAKEN TO BE  1.2E-07

 CGEMM  PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)

 CHEMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CTRMM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 CTRSM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (  -0.153378    ,    0.00000    )  (  -0.153378    ,    0.00000    )
       2  (   0.714370    ,   0.406021    )  (   0.714370    ,   0.406021    )
       3  (   0.209727    ,   0.325695    )  (   0.969031E-01,   0.298701    )
      THESE ARE THE RESULTS FOR COLUMN   1
 ******* CHERK  FAILED ON CALL NUMBER:
    698: CHERK ('L','N',  3,  1, 1.0, A,  4, 1.0, C,  4)                         .

 CSYRK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CHER2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYR2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 END OF TESTS
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
rm -f ?BLAT3.SUMM
OPENBLAS_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
rm -f ?BLAT2.SUMM
OPENBLAS_NUM_THREADS=2 ./sblat2 < ./sblat2.dat
OPENBLAS_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OPENBLAS_NUM_THREADS=2 ./dblat2 < ./dblat2.dat
OPENBLAS_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
OPENBLAS_NUM_THREADS=2 ./cblat2 < ./cblat2.dat
 TESTS OF THE COMPLEX          LEVEL 3 BLAS

 THE FOLLOWING PARAMETER VALUES WILL BE USED:
   FOR N                   0     1     2     3     7    31
   FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
   FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  

 ERROR-EXITS WILL NOT BE TESTED

 ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00

 RELATIVE MACHINE PRECISION IS TAKEN TO BE  1.2E-07

 CGEMM  PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)

 CHEMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYMM  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CTRMM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 CTRSM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (  -0.153378    ,    0.00000    )  (  -0.153378    ,    0.00000    )
       2  (   0.714370    ,   0.406021    )  (   0.714370    ,   0.406021    )
       3  (   0.209727    ,   0.325695    )  (   0.969031E-01,   0.298701    )
      THESE ARE THE RESULTS FOR COLUMN   1
 ******* CHERK  FAILED ON CALL NUMBER:
    698: CHERK ('L','N',  3,  1, 1.0, A,  4, 1.0, C,  4)                         .

 CSYRK  PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CHER2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 CSYR2K PASSED THE COMPUTATIONAL TESTS (  1296 CALLS)

 END OF TESTS
OPENBLAS_NUM_THREADS=2 ./zblat3 < ./zblat3.dat
OPENBLAS_NUM_THREADS=2 ./zblat2 < ./zblat2.dat
make[1]: Leaving directory '/home/admin/v0.3.4/test'
martin-frbg commented 5 years ago

Interesting, I did not see these failures on Cortex A17 and the samax microkernel was last touched in #740 (apart from a trivial instruction change a few months ago, but that happened after the 0.2.20 release that you say is also affected). Is CHERK the only test failure with that dreaded old "less than half accurate" message?

martin-frbg commented 5 years ago

A quick workaround for the SAMAX issue would be to append

SAMAXKERNEL = amax.c
DAMAXKERNEL = amax.c
CAMAXKERNEL = zamax.c
ZAMAXKERNEL = zamax.c

in kernel/arm/KERNEL.CORTEXA9. (Not entirely sure if this would address the CHERK problem as well, but probably not.)

brada4 commented 5 years ago

Could you post a longer log, like the full build log? Does an arm64 build work correctly on your CPU?

virtuald commented 5 years ago

Here's the full logfile: openblas_log.txt

brada4 commented 5 years ago

No incidents during build.

virtuald commented 5 years ago

I modified the kernel file and rebuilt, but it still failed. After a make clean followed by a rebuild, the CHERK tests still seem to fail:

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (  -0.153378    ,    0.00000    )  (  -0.153378    ,    0.00000    )
       2  (   0.714370    ,   0.406021    )  (   0.714370    ,   0.406021    )
       3  (   0.209727    ,   0.325695    )  (   0.969031E-01,   0.298701    )
      THESE ARE THE RESULTS FOR COLUMN   1
 ******* CHERK  FAILED ON CALL NUMBER:
    698: CHERK ('L','N',  3,  1, 1.0, A,  4, 1.0, C,  4)   

However, running the rebuilt utest suggests the samax issue is fixed.

By checking arm64, I assume you mean building on the ODROID host outside of the docker image?

brada4 commented 5 years ago

Your CPU is a 64-bit A53 and will run faster with a 64-bit build on a 64-bit OS (though it is important here that the last available 32-bit option works correctly too).

martin-frbg commented 5 years ago

CHERK codepath involves syrk/gemm but the individual tests for those appear to pass (as does the ZHERK test). I will try to do a TARGET=CORTEXA9 build on my A17 if and when I have time. (I do wonder if this could be a compiler problem though)

virtuald commented 5 years ago

@brada4 the target ARM system that I'll be running the compiled OpenBLAS on is an A9, so this is sort of a cross-compile. It's just that the ODROID is way faster at compiling things, so I'd rather compile there. :)

@martin-frbg totally could be a compiler thing seeing as I had to build the fortran compiler package myself via bitbake... any thoughts on how I can detect that?

martin-frbg commented 5 years ago

No idea except perhaps trying another compiler build or version. Unmodified "develop" branch passes all tests on my Cortex A17 (Asus Tinkerboard, gcc 6.3.0) with TARGET=CORTEXA9 (which btw appears to be pretty much identical to the ARMV7 target that gets built by default)

brada4 commented 5 years ago

Can you try TARGET=ARMV5, which is pure C, before blaming your gfortran? It is quite unlikely that a dozen tests pass with the same gfortran and then one chokes.

virtuald commented 5 years ago

Tried TARGET=ARMV5, but the tests segfaulted. ldd says it was linked against libgfortran, so I restarted the docker image and did not install gfortran et al. in it. With that, it built without any issues, but there don't seem to be any tests?

brada4 commented 5 years ago

The tests are compiled with the Fortran compiler, so without one they are not built. You have to run "make clean" between build parameter changes; there should be no difference in the interface that the Fortran code links against.

Haffon commented 5 years ago

I have the same issue, and the failing call number is the same as virtuald's. I'm compiling OpenBLAS 0.3.6 for this CPU:

model name : ARMv7 Processor rev 5 (v7l)
BogoMIPS : 72.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae

martin-frbg commented 5 years ago

@Haffon are you (cross)compiling this on a 64bit processor in 32bit mode (like virtuald did), or how/where do you compile ?

Haffon commented 5 years ago

@martin-frbg I'm not cross-compiling. I just downloaded the 0.3.6 source onto the ARM board and typed make. I have applied the SAMAX workaround in kernel/arm/KERNEL.CORTEXA9 and the error still exists. I can ssh into the board, and it has HDMI and USB ports, so it can be used standalone like a PC. A Debian-based OS is installed on it:

uname -a
Linux linbian 4.4.55+ #1 SMP PREEMPT Thu Jan 31 19:03:50 PST 2019 armv7l GNU/Linux

I can even install packages with sudo apt-get install libsnappy-dev.

brada4 commented 5 years ago

ARMv7 is 32-bit; I do not know what that linux32 is. Probably no arch or target is forced, since the whole build happens on the host.

martin-frbg commented 5 years ago

Interesting, so same as with my arm board (Asus Tinkerboard) where I did not see these failures.

Haffon commented 5 years ago

Since there are no installation instructions for arm-linux, I just typed make and pressed enter. The Android section tells me to cross-compile without Fortran, which is not my case. I changed the build command to make TARGET=ARMV7 so that the tools think they are cross-compiling. Will that work for me, and what does make install do when cross-compiling?

martin-frbg commented 5 years ago

Both just "make" and "make TARGET=ARMV7" should work (without the TARGET argument it would autodetect the cpu, which should result in ARMV7 as well - nothing to do with cross-compile). My Tinkerboard identifies as ARMv7 rev 1 (v7l) with additional cpu features "thumbee" and "evtstrm" that probably play no role here. gcc default target (from gcc -v) is arm-linux-gnueabihf, maybe that is the difference ?

Haffon commented 5 years ago

My "gcc -v" shows "--target=arm-linux-gnueabi", and "cat /proc/cpuinfo" lists neither "thumbee" nor "evtstrm". Using "make TARGET=ARMV7" doesn't work. In the end I used "make NOFORTRAN=1" and the compilation finished; I hope it will work in my application, since there is no utest.

martin-frbg commented 5 years ago

Alright. So the SAMAX problem seems to be specific to "softfp" builds (while my Tinkerboard is set up for hardfp so I cannot reproduce the bug there - I need to find out if I need to (and can) install another gcc toolchain to change this).
With NOFORTRAN=1 it will not build the BLAS and LAPACK tests as these are written in fortran, not sure why it would not build and run the utests. EDIT: seems to be a small bug in the Makefile logic - you can run "make" in the utest directory after building the library to get the utests.

Haffon commented 5 years ago

Somebody told me to use the branch "https://github.com/xianyi/OpenBLAS/tree/arm_soft_fp_abi", but it still does not build: "gnu/stubs-hard.h: No such file or directory". I'm actually porting Caffe to a development board with an ARM core. I can't do "sudo apt-get install libopenblas-dev" as described in "https://github.com/OAID/Caffe-HRT/blob/master/acl_openailab/installation.md", so I'm trying to compile it manually. Thanks.

brada4 commented 5 years ago

That development branch has been untouched for 2 years. You probably need to enable universe or something like that. The packages are alive and well: https://packages.debian.org/stretch/libopenblas-base https://packages.ubuntu.com/disco/armhf/libopenblas-dev

martin-frbg commented 5 years ago

The soft_fp_abi branch is from the first attempts to support softfp two years ago, only a few functions in it were modified back then - but it looks as if iamax_vfp.S and possibly other files are still not modified for softfp in the current version. (From looking at sdot_vfp.S, the change could be as simple as moving the computed result to a different register before returning)

Haffon commented 5 years ago

@brada4 Downloading the .deb is a good approach, but the mirror list at "https://packages.ubuntu.com/disco/armhf/libopenblas-dev/download" is blank. The Debian package can be downloaded from "https://packages.debian.org/stretch/armhf/libopenblas-base/download", except that its version is 0.2.19.

Haffon commented 5 years ago

The Debian architecture should be armel, but only an armhf download URL is listed for libopenblas-base. Running "readelf -A /proc/self/exe | grep Tag_ABI_VFP_args" returns nothing, and "make TARGET_ARCH=armel" fails with "cc: error: armel: No such file or directory" when building utest.

brada4 commented 5 years ago

armel is the FPU-less calling convention. TARGET=ARMV5 uses no assembly and is therefore indifferent to the calling convention. Try that.

martin-frbg commented 5 years ago

Hmm. I downloaded the Linbian sd image in the hope of running it from qemu, but I cannot find a kernel or initrd in it (/boot appears to be empty, but I see kernel modules under lib/modules/4.4.55+) Does this use some nonconventional boot process, or is there something wrong with the "eagle-debian-lindeni-v5-flat" image of 2019-02-01?

Haffon commented 5 years ago

@brada4 With TARGET=ARMV5 the build completes. @martin-frbg I also have an empty /boot directory.

brada4 commented 5 years ago

It is SMP at least. Let's try v6 and v7 and report bugs here; there is still roughly double the speed to be gained.

Answering about /boot: bootloaders as files are a PC thing; it is quite possible that the embedded bootloader and kernel live in flash memory partitions rather than mountable filesystems.

Haffon commented 5 years ago
  1. make TARGET=ARMV6 failed.
  2. Regarding booting, this is the "sudo fdisk -l" output:

Disk /dev/mmcblk0: 7.2 GiB, 7752122368 bytes, 15140864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device         Boot  Start      End  Sectors  Size Id Type
/dev/mmcblk0p1 *    180480 15239167 15058688  7.2G  b W95 FAT32
/dev/mmcblk0p2       73728   139263    65536   32M  6 FAT16
/dev/mmcblk0p3           1    41216    41216 20.1M  5 Extended
/dev/mmcblk0p5      139264   139519      256  128K 83 Linux
/dev/mmcblk0p6      139520   180479    40960   20M 83 Linux

Partition table entries are not in disk order.

Disk /dev/mmcblk0boot1: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mmcblk0boot0: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

martin-frbg commented 5 years ago

Target ARMV6 is almost identical to ARMV7, so you would expect to see the same failure(s) (KERNEL.ARMV7 includes KERNEL.ARMV6 and adds kernels for nrm2 and gemm). Linbian appears to use u-boot (which I cannot find in their SD card image either), so it is probably easier to find some debian-armel image for qemu.

brada4 commented 5 years ago

There are 2 boot partitions. But don't try to mount them at home without full documentation on brick recovery at hand...

martin-frbg commented 5 years ago

Both the AMAX bug and the missing utest build should be fixed now on the develop branch. Unfortunately I have not yet been able to set up a (cross)compiler environment for softfp that includes gfortran, so I cannot investigate the CHERK error from the original report right now.

Haffon commented 5 years ago

Building with plain "make" (no "TARGET=") succeeded:

OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)

  OS               ... Linux
  Architecture     ... arm
  BINARY           ... 32bit
  C compiler       ... GCC  (command line : cc)
  Fortran compiler ... GFORTRAN  (command line : gfortran)
  Library Name     ... libopenblas_armv7p-r0.3.7.dev.a (Multi threaded; Max num-threads is 4)

Then can we ignore this error?

 cblas_cherk PASSED THE TESTS OF ERROR-EXITS

 FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE
                       EXPECTED RESULT                    COMPUTED RESULT
       1  (  0.441034    ,    0.00000    )  (  0.441034    ,    0.00000    )
       2  (  0.521087E-01,   -0.416458   )  (  0.521087E-01,   -0.416458   )
       3  ( -0.459436    ,   -0.371565   )  ( -0.292707    ,   -0.492507   )
      THESE ARE THE RESULTS FOR COLUMN   1
 cblas_cherk FAILED ON CALL NUMBER:
   698: cblas_cherk ( CblasColMajor, CblasLower, CblasNoTrans, 3, 1, 1.0, A, 4, 1.0, C, 4).
 cblas_cherk FAILED ON CALL NUMBER:
   217: cblas_cherk ( CblasRowMajor, CblasUpper, CblasNoTrans, 1, 0, 0.0, A, 2, 0.0, C, 2).

 FATAL ERROR - TESTS ABANDONED

brada4 commented 5 years ago

No, you should not ignore numeric failures.

cblas_cherk has 2 selectable options for different code paths. Try adding these near the top of interface/syrk.c, one at a time:

#undef SMP

Then

#define USE_SIMPLE_THREADED_LEVEL3

Please tell if either worked out.

martin-frbg commented 5 years ago

This is probably another case of missing softfp capability somewhere in the code, as I do not get this error on ARMV7 hardfp. (I had a small hope that it might be fixed by the amax correction, but it is much more likely to be in SCAL or in GEMM itself)

Haffon commented 5 years ago

It does not work after adding the macros to interface/syrk.c.

brada4 commented 5 years ago

Martin already said so. Some assembly kernels are not correct for softfp; it is not a multiprocessing issue.

martin-frbg commented 5 years ago

Unfortunately I have no quick suggestion - if there is a problem in @ashwinyes' softfp modifications of the CGEMM assembly kernel I am unable to see it, and both CSCAL and CGEMM_BETA appear to be implemented in generic C code.

brada4 commented 5 years ago

I think the best choice at present is TARGET=ARMV5. Some performance will be lost, but at least it works.

ashwinyes commented 5 years ago

A few things to note.

Regarding amax.

Regarding CHERK

Haffon commented 5 years ago

@ashwinyes I will do it described in "Regarding CHERK" after package installation is done.

Haffon commented 5 years ago

The "cblas_cherk" error disappears after I replace -O2 with -O0 and remove the stack size limit.

ashwinyes commented 5 years ago

Thanks. Would you be able to narrow it down as to which one actually helps ?

Haffon commented 5 years ago

@ashwinyes The cblas_cherk error appears when I turn optimization on (-O2).

ashwinyes commented 5 years ago

@Haffon Looks to be a compiler issue. Would it be possible to change the compiler and re-test ?

brada4 commented 5 years ago

gcc 6.3.0 is part of Ubuntu zesty, which is long EOL, so compiler problems are essentially unfixable if found. You need to move to an LTS release (e.g. 18.04) to get it working (see https://wiki.odroid.com/odroid-c2/os_images/ubuntu/ubuntu). Probably nobody has tried setarch, so maybe try a clean(er) cross-build (your build host is aarch64, the target is softfp):

make CC=arm-????-gcc HOSTCC=cc FC=arm-????-gfortran TARGET=ARMV7