OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License
6.38k stars 1.5k forks source link

Build DYNAMIC_ARCH wrongly uses FMA in 0.3.13 #3056

Closed nshmyrev closed 3 years ago

nshmyrev commented 3 years ago

Due to changes in Makfile.x86_64, in release 0.3.13 we have fma options used with DYNAMIC_ARCH:

make: Entering directory '/home/shmyrev/travis/kaldi/tools/OpenBLAS'
make[1]: Entering directory '/home/shmyrev/travis/kaldi/tools/OpenBLAS/interface'
cc -DCBLAS -c -O2 -DMAX_STACK_ALLOC=2048 -DUSE_LOCKING -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DNO_LAPACK -DNO_LAPACKE -DNO
_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0
.3.13\" -msse3 -mssse3 -msse4.1 -mavx -mavx2 -mfma -mavx2 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=cblas_s
gbmv -DASMFNAME=cblas_sgbmv_ -DNAME=cblas_sgbmv_ -DCNAME=cblas_sgbmv -DCHAR_NAME=\"cblas_sgbmv_\" -DCHAR_CNAME=\"cblas_sgbmv\" -DNO_AFFI
NITY -I.. -I. -UDOUBLE  -UCOMPLEX -o cblas_sgbmv.o gbmv.c
cc -DCBLAS -c -O2 -DMAX_STACK_ALLOC=2048 -DUSE_LOCKING -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DNO_LAPACK -DNO_LAPACKE -DNO
_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0
.3.13\" -msse3 -mssse3 -msse4.1 -mavx -mavx2 -mfma -mavx2 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=cblas_d
gbmv -DASMFNAME=cblas_dgbmv_ -DNAME=cblas_dgbmv_ -DCNAME=cblas_dgbmv -DCHAR_NAME=\"cblas_dgbmv_\" -DCHAR_CNAME=\"cblas_dgbmv\" -DNO_AFFI
NITY -I.. -I. -DDOUBLE  -UCOMPLEX -o cblas_dgbmv.o gbmv.c

This leads to illegal instructions in dgbmv_t as here https://github.com/alphacep/vosk-api/issues/355:

0x00007ffff5168a3c <+380>:  callq  *0x308(%r10)
   0x00007ffff5168a43 <+387>:   vmovsd 0x10(%rsp),%xmm1
=> 0x00007ffff5168a49 <+393>:   vfmadd213sd -0x8(%r12),%xmm1,%xmm0
   0x00007ffff5168a50 <+400>:   add    0x18(%rsp),%r13
   0x00007ffff5168a55 <+405>:   vmovsd %xmm0,-0x8(%r12)
   0x00007ffff5168a5c <+412>:   cmp    %rbx,0x20(%rsp)

In master latest makefile changes are missing. Is something going wrong with the last release?

brada4 commented 3 years ago

Unless you specify TARGET=CORE2 or something oldest you are targetting - ISAs from build machine get used for library's common code. Still stands with v0.3.13.

Your build log segment does not match function disassembled maybe attach complete log to get better insight.

nshmyrev commented 3 years ago

I use this line:

make -C OpenBLAS ONLY_CBLAS=1 DYNAMIC_ARCH=1 USE_LOCKING=1 USE_THREAD=0 all

it used to work in 0.3.7 without target specification.

Readme says:

The TARGET option can be used in conjunction with DYNAMIC_ARCH=1 to specify which cpu model should be assumed for all the common code in the library, usually you will want to set this to the oldest model you expect to encounter. Please note that it is not possible to combine support for different architectures, so no combined 32 and 64 bit or x86_64 and arm64 in the same library.

I assume I don't have to specify the target.

martin-frbg commented 3 years ago

I believe this is already fixed on the develop branch (master is currently stale). In 0.3.7 the codebase did not yet make use of Intel intrinsics, so compiler options to enable FMA etc. were not added automatically as they are now. Generally you would need to set TARGET to the oldest cpu that you expect to run the DYNAMIC_ARCH build on, but in earlier versions you were more likely to get away with leaving this out.

nshmyrev commented 3 years ago

Ok, thank you for information. I'll set the TARGET as a workaround but I must warn that the pattern without TARGET is very common:

https://github.com/jupyter-attic/docker-notebook/blob/5bbc72eeecbfac64beef89a8767698d1fa78f9c6/scipystack/build_scipy_stack.sh#L14

and you might get many complains about this change.

nshmyrev commented 3 years ago

And develop branch still uses avx/avx2 which might not be available on atoms (one of the frequent targets).

brada4 commented 3 years ago

it is not a change, it is there since GotoBLAS. At that time it was common HPC practice to set unpatched machine with old CPU to build software, so result worked well. Maybe it is worth setting old target for DYNAMIC_ARCH in place of native one.

martin-frbg commented 3 years ago

I've even made the build system report the minimum required hardware at the end of DYNAMIC_ARCH builds, but I guess as soon as this gets incorporated into some build_my_dependencies thingy nobody actually reads the message anymore. (This is one of the things were the original GotoBLAS assumption breaks down, that the library will be purpose-built for the known range of computers in a user's possession.)

nshmyrev commented 3 years ago

Thank you for comments guys, its up to you to close this.

brada4 commented 3 years ago

Thanks for reporting. It already keeps coming, you were first one to point some source why....

martin-frbg commented 3 years ago

Closing after finally confirming that 0.3.13 already (i.e. not just the develop branch) behaves as intended with DYNAMIC_ARCH. The fairly recent introduction of the -mfma option only makes it more likely that omission of a suitable TARGET argument creates an unexpected dependency on cpu features of the build host.