virtuald closed this issue 4 years ago
Looking more at the compile log, this pops out at me:
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat2 < ./cblat2.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat2 < ./zblat2.dat
TESTS OF THE COMPLEX LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
CGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
CHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CTRMM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
CTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( -0.153378 , 0.00000 ) ( -0.153378 , 0.00000 )
2 ( 0.714370 , 0.406021 ) ( 0.714370 , 0.406021 )
3 ( 0.209727 , 0.325695 ) ( 0.969031E-01, 0.298701 )
THESE ARE THE RESULTS FOR COLUMN 1
******* CHERK FAILED ON CALL NUMBER:
698: CHERK ('L','N', 3, 1, 1.0, A, 4, 1.0, C, 4) .
CSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
rm -f ?BLAT3.SUMM
OPENBLAS_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
rm -f ?BLAT2.SUMM
OPENBLAS_NUM_THREADS=2 ./sblat2 < ./sblat2.dat
OPENBLAS_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OPENBLAS_NUM_THREADS=2 ./dblat2 < ./dblat2.dat
OPENBLAS_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
OPENBLAS_NUM_THREADS=2 ./cblat2 < ./cblat2.dat
TESTS OF THE COMPLEX LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
CGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
CHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CTRMM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
CTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( -0.153378 , 0.00000 ) ( -0.153378 , 0.00000 )
2 ( 0.714370 , 0.406021 ) ( 0.714370 , 0.406021 )
3 ( 0.209727 , 0.325695 ) ( 0.969031E-01, 0.298701 )
THESE ARE THE RESULTS FOR COLUMN 1
******* CHERK FAILED ON CALL NUMBER:
698: CHERK ('L','N', 3, 1, 1.0, A, 4, 1.0, C, 4) .
CSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
OPENBLAS_NUM_THREADS=2 ./zblat3 < ./zblat3.dat
OPENBLAS_NUM_THREADS=2 ./zblat2 < ./zblat2.dat
make[1]: Leaving directory '/home/admin/v0.3.4/test'
Interesting, I did not see these failures on Cortex A17 and the samax microkernel was last touched in #740 (apart from a trivial instruction change a few months ago, but that happened after the 0.2.20 release that you say is also affected). Is CHERK the only test failure with that dreaded old "less than half accurate" message?
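One way to answer the "is CHERK the only failure" question from a saved build log is to grep for the failure marker that the BLAS test drivers print. This is a minimal sketch: the helper name `list_failed_routines` is hypothetical, and it assumes the log was saved to a plain text file (for example the openblas_log.txt attached later in the thread).

```sh
#!/bin/sh
# Hypothetical helper: print each routine name that directly precedes
# "FAILED ON CALL NUMBER" in a saved test log, once per routine.
list_failed_routines() {
    grep -oE '[A-Za-z0-9_]+ FAILED ON CALL NUMBER' "$1" |
        awk '{ print $1 }' |
        sort -u
}

# Usage:
#   list_failed_routines openblas_log.txt
```

An empty result only means that this particular marker was not found; segfaults or aborted test programs would need a separate check.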
A quick workaround for the SAMAX issue would be to append
SAMAXKERNEL = amax.c
DAMAXKERNEL = amax.c
CAMAXKERNEL = zamax.c
ZAMAXKERNEL = zamax.c
in kernel/arm/KERNEL.CORTEXA9. (Not entirely sure if this would address the CHERK problem as well, but probably not.)
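The workaround above could be scripted roughly as follows. This is only a sketch: the function name `append_amax_overrides` is hypothetical, and it assumes the script is run from the top of the OpenBLAS source tree and that the kernel file ends with a newline. The four appended lines are exactly the ones quoted above.

```sh
#!/bin/sh
# Hypothetical helper: append the generic C AMAX kernel overrides to a
# kernel configuration file, unless they have already been appended.
append_amax_overrides() {
    kernel_file=$1
    if grep -q '^SAMAXKERNEL = amax\.c$' "$kernel_file"; then
        return 0    # already applied; leave the file unchanged
    fi
    cat >> "$kernel_file" <<'EOF'
SAMAXKERNEL = amax.c
DAMAXKERNEL = amax.c
CAMAXKERNEL = zamax.c
ZAMAXKERNEL = zamax.c
EOF
}

# Usage, from the top of the OpenBLAS source tree:
#   append_amax_overrides kernel/arm/KERNEL.CORTEXA9
#   make clean && make
```

The guard makes the helper safe to run twice, which matters when the same build script is re-run inside a fresh docker image.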
Could you post a longer log, ideally the full build log? Is arm64 working correctly with your CPU?
Here's the full logfile: openblas_log.txt
No incidents during build.
Modified the kernel file, did a build, still failed. Doing a make clean followed by a build... seems the CHERK tests are still failing:
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( -0.153378 , 0.00000 ) ( -0.153378 , 0.00000 )
2 ( 0.714370 , 0.406021 ) ( 0.714370 , 0.406021 )
3 ( 0.209727 , 0.325695 ) ( 0.969031E-01, 0.298701 )
THESE ARE THE RESULTS FOR COLUMN 1
******* CHERK FAILED ON CALL NUMBER:
698: CHERK ('L','N', 3, 1, 1.0, A, 4, 1.0, C, 4)
However, running a new utest seems to have fixed that issue.
By checking arm64, I assume you mean building on the ODROID host outside of the docker image?
Your CPU is a 64bit A53, and will work faster with a 64bit build on a 64bit OS (though it is important here that the last available 32bit option works fine too).
CHERK codepath involves syrk/gemm but the individual tests for those appear to pass (as does the ZHERK test). I will try to do a TARGET=CORTEXA9 build on my A17 if and when I have time. (I do wonder if this could be a compiler problem though)
@brada4 the target ARM system that I'll be running the compiled OpenBLAS on is a A9, so this is vaguely a cross compile. It's just the ODROID is way faster at compiling things, so I'd rather use that to compile. :)
@martin-frbg totally could be a compiler thing seeing as I had to build the fortran compiler package myself via bitbake... any thoughts on how I can detect that?
No idea except perhaps trying another compiler build or version. Unmodified "develop" branch passes all tests on my Cortex A17 (Asus Tinkerboard, gcc 6.3.0) with TARGET=CORTEXA9 (which btw appears to be pretty much identical to the ARMV7 target that gets built by default)
Can you try TARGET=ARMV5, which is pure C, before implicating your gfortran? It is quite unlikely that a dozen tests were OK with the same gfortran, and then one choked.
Tried TARGET=ARMV5, but the tests segfaulted. ldd says it was linked to libgfortran, so I restarted the docker image and did not install gfortran et al. in it. Once I did that, it built without any issues, but there don't seem to be any tests?
Tests are compiled with the Fortran compiler, so in its absence they don't get built. You have to run "make clean" between build parameter changes; there should be no difference in the interface that Fortran links against.
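The "make clean between build parameter changes" advice can be wrapped so it is hard to forget. A minimal sketch, assuming it is run from the top of the OpenBLAS source tree; the function name `rebuild_openblas` and the `MAKE` override are illustrative only, not part of OpenBLAS.

```sh
#!/bin/sh
# Hypothetical wrapper: always clean before building with new parameters.
# MAKE can be overridden (for example MAKE=echo for a dry run).
rebuild_openblas() {
    "${MAKE:-make}" clean && "${MAKE:-make}" "$@"
}

# Usage, from the top of the OpenBLAS source tree:
#   rebuild_openblas TARGET=ARMV5
```

Because the build step only runs if the clean step succeeds, stale objects from a previous TARGET cannot leak into the new library.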
I have the same issue, and the failing call number is the same as virtuald's. I'm compiling OpenBLAS 0.3.6 for:
model name : ARMv7 Processor rev 5 (v7l)
BogoMIPS : 72.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
@Haffon are you (cross)compiling this on a 64bit processor in 32bit mode (like virtuald did), or how/where do you compile ?
@martin-frbg I'm not cross compiling. I just downloaded the 0.3.6 code onto the ARM board and typed make. I have applied the SAMAX workaround in kernel/arm/KERNEL.CORTEXA9 and the error still exists. I can ssh to the board, and it has HDMI and USB ports, so it can be used standalone like a PC. A Debian-based OS is installed on it.
uname -a
Linux linbian 4.4.55+ #1 SMP PREEMPT Thu Jan 31 19:03:50 PST 2019 armv7l GNU/Linux
I can even use sudo apt-get install libsnappy-dev to install packages.
ARMv7 is 32bit; I do not know what that linux32 is. Probably no forced arch or target, since the whole build is on the host.
Interesting, so same as with my arm board (Asus Tinkerboard) where I did not see these failures.
Since there are no installation instructions for arm-linux, I just typed make and pressed enter. The Android section tells me to cross compile without Fortran, which is not my case. I changed the compile command to make TARGET=ARMV7 to make the tools think they are cross compiling. Will that work for me, and what does make install do in a cross compile?
Both just "make" and "make TARGET=ARMV7" should work (without the TARGET argument it would autodetect the cpu, which should result in ARMV7 as well - nothing to do with cross-compile). My Tinkerboard identifies as ARMv7 rev 1 (v7l) with additional cpu features "thumbee" and "evtstrm" that probably play no role here. gcc default target (from gcc -v) is arm-linux-gnueabihf, maybe that is the difference?
My "gcc -v" shows "--target=arm-linux-gnueabi", and "cat /proc/cpuinfo" has no "thumbee" or "evtstrm" features. Using "make TARGET=ARMV7" doesn't work; I finally used "make NOFORTRAN=1" and the compilation completed. I hope it will work in my application, since there is no utest.
Alright. So the SAMAX problem seems to be specific to "softfp" builds (while my Tinkerboard is set up for hardfp so I cannot reproduce the bug there - I need to find out if I need to (and can) install another gcc toolchain to change this).
With NOFORTRAN=1 it will not build the BLAS and LAPACK tests as these are written in fortran, not sure why it would not build and run the utests.
EDIT: seems to be a small bug in the Makefile logic - you can run "make" in the utest directory after building the library to get the utests.
Somebody told me to use the branch "https://github.com/xianyi/OpenBLAS/tree/arm_soft_fp_abi", but it still does not work and fails with "gnu/stubs-hard.h: No such file or directory". In fact I'm porting caffe to a development board that has an ARM core. I can't do the "sudo apt-get install libopenblas-dev" described in "https://github.com/OAID/Caffe-HRT/blob/master/acl_openailab/installation.md", so I am trying to compile it manually. Thanks.
That development branch has been untouched for 2 years. Probably you need to add universe or something like that. Packages are alive and well: https://packages.debian.org/stretch/libopenblas-base https://packages.ubuntu.com/disco/armhf/libopenblas-dev
The soft_fp_abi branch is from the first attempts to support softfp two years ago, only a few functions in it were modified back then - but it looks as if iamax_vfp.S and possibly other files are still not modified for softfp in the current version. (From looking at sdot_vfp.S, the change could be as simple as moving the computed result to a different register before returning)
@brada4 Downloading the .deb is a good approach, but the site list at "https://packages.ubuntu.com/disco/armhf/libopenblas-dev/download" is blank. The Debian package is downloadable from "https://packages.debian.org/stretch/armhf/libopenblas-base/download", except that the version is 0.2.19.
The Debian architecture should be armel, but only an armhf download URL is present for libopenblas-base. When I run "readelf -A /proc/self/exe | grep Tag_ABI_VFP_args" nothing is returned, and "make TARGET_ARCH=armel" fails with "cc: error: armel: No such file or directory" when making utest.
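The two checks used in this exchange (the compiler's target triple and the Tag_ABI_VFP_args attribute) can be sketched as small helpers. The function names are hypothetical. `gcc -dumpmachine` prints the same triple that `gcc -v` reports, and the classification is only a rule of thumb: a plain "gnueabi" triple typically means the toolchain defaults to passing floating-point arguments in core registers (soft or softfp), but it can still be overridden with compiler flags.

```sh
#!/bin/sh
# Hypothetical helper: classify a GNU target triple by its float ABI suffix.
float_abi_of_triple() {
    case $1 in
        *-gnueabihf) echo "hard" ;;
        *-gnueabi)   echo "soft-or-softfp" ;;
        *)           echo "unknown" ;;
    esac
}

# Hypothetical helper: succeeds if the given ELF file carries the
# Tag_ABI_VFP_args build attribute, as hard-float binaries do.
binary_uses_hard_float() {
    readelf -A "$1" | grep -q 'Tag_ABI_VFP_args'
}

# Usage on the build host:
#   float_abi_of_triple "$("${CC:-gcc}" -dumpmachine)"
#   binary_uses_hard_float /proc/self/exe && echo hardfp || echo "not hardfp"
```

On the board described here, both checks point the same way: the triple is arm-linux-gnueabi and readelf reports no Tag_ABI_VFP_args.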
armel is the FPU-less calling convention. TARGET=ARMV5 will use no assembly, and is thus independent of the calling convention. Try that.
Hmm. I downloaded the Linbian sd image in the hope of running it from qemu, but I cannot find a kernel or initrd in it (/boot appears to be empty, but I see kernel modules under lib/modules/4.4.55+) Does this use some nonconventional boot process, or is there something wrong with the "eagle-debian-lindeni-v5-flat" image of 2019-02-01?
@brada4 using TARGET=ARMV5, the build completes. @martin-frbg I also have an empty /boot directory.
It is SMP at least. Let's try v6 and v7, reporting bugs here; there is still double speed waiting.
Answering about /boot - bootloaders as files are a thing of the PC; it is quite possible that the embedded bootloader and kernel are in flash memory partitions and not in mountable filesystems.
Device         Boot  Start      End  Sectors  Size Id Type
/dev/mmcblk0p1 *    180480 15239167 15058688  7.2G  b W95 FAT32
/dev/mmcblk0p2       73728   139263    65536   32M  6 FAT16
/dev/mmcblk0p3           1    41216    41216 20.1M  5 Extended
/dev/mmcblk0p5      139264   139519      256  128K 83 Linux
/dev/mmcblk0p6      139520   180479    40960   20M 83 Linux
Partition table entries are not in disk order.
Disk /dev/mmcblk0boot1: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mmcblk0boot0: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Target ARMV6 is almost identical to ARMV7 so you would expect to see the same failure(s)(KERNEL.ARMV7 includes KERNEL.ARMV6 and adds kernels for nrm2 and gemm). Linbian appears to use u-boot (which I do not find in their SD card image either) - probably easier to find me some debian-armel image for qemu.
The 2 boot partitions are there. But don't try to mount them at home without full documentation on brick recovery at hand...
Both the AMAX bug and the missing utest build should be fixed now on the develop branch. Unfortunately I have not yet been able to set up a (cross)compiler environment for softfp that includes gfortran, so I cannot investigate the CHERK error from the original report right now.
The "make" without "TARGET=" passed: OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)
OS ... Linux
Architecture ... arm
BINARY ... 32bit
C compiler ... GCC (command line : cc)
Fortran compiler ... GFORTRAN (command line : gfortran)
Library Name ... libopenblas_armv7p-r0.3.7.dev.a (Multi threaded; Max num-threads is 4)
Then can we ignore this error:
cblas_cherk PASSED THE TESTS OF ERROR-EXITS
FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE
EXPECTED RESULT COMPUTED RESULT
1 ( 0.441034 , 0.00000 ) ( 0.441034 , 0.00000 )
2 ( 0.521087E-01, -0.416458 ) ( 0.521087E-01, -0.416458 )
3 ( -0.459436 , -0.371565 ) ( -0.292707 , -0.492507 )
THESE ARE THE RESULTS FOR COLUMN 1
cblas_cherk FAILED ON CALL NUMBER:
698: cblas_cherk ( CblasColMajor, CblasLower, CblasNoTrans, 3, 1, 1.0, A, 4, 1.0, C, 4).
cblas_cherk FAILED ON CALL NUMBER:
217: cblas_cherk ( CblasRowMajor, CblasUpper, CblasNoTrans, 1, 0, 0.0, A, 2, 0.0, C, 2).
FATAL ERROR - TESTS ABANDONED
No, you should not ignore numeric failures.
cblas_cherk has 2 selectable options for different code paths. Try each of the following near the top of interface/syrk.c, one at a time:
#undef SMP
Then
#define USE_SIMPLE_THREADED_LEVEL3
Please tell if either worked out.
This is probably another case of missing softfp capability somewhere in the code, as I do not get this error on ARMV7 hardfp. (I had a small hope that it might be fixed by the amax correction, but it is much more likely to be in SCAL or in GEMM itself)
It still does not work after adding the macros to interface/syrk.c.
Martin already said so. Some assembly kernels are not correct for softfp; it is not a multiprocessing issue.
Unfortunately I have no quick suggestion - if there is a problem in @ashwinyes' softfp modifications of the CGEMM assembly kernel I am unable to see it, and both CSCAL and CGEMM_BETA appear to be implemented in generic C code.
I think the best choice at present is TARGET=ARMV5; some performance loss is to be expected, but at least it works.
Few things to note.
Regarding amax.
Regarding CHERK
@ashwinyes I will do what is described in "Regarding CHERK" after the package installation is done.
The "cblas_cherk" error disappears after I replace -O2 with -O0 and remove the stack size limit.
Thanks. Would you be able to narrow it down as to which one actually helps ?
@ashwinyes The cblas_cherk error appears if I turn optimization on (-O2).
@Haffon Looks to be a compiler issue. Would it be possible to change the compiler and re-test ?
gcc 6.3.0 is part of Ubuntu zesty, long EOL, so compiler problems are essentially unfixable if found.
You need to move to LTS (e.g. 18.04) to get it working.
e.g. https://wiki.odroid.com/odroid-c2/os_images/ubuntu/ubuntu
Probably nobody tried setarch, so maybe try clean(-er) cross-build (Your build host is aarch64, target is softfp):
make CC=arm-????-gcc HOSTCC=cc FC=arm-????-gfortran TARGET=ARMv7
System configuration:
setarch linux32
Compile was via:
Here's the output:
What steps should I take next to diagnose this issue? Thanks!