OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Compilation terminates with: ieee_inexact is signaling #2901

Closed by moravveji 3 years ago

moravveji commented 3 years ago

Dear all

I am trying to build OpenBLAS v. 0.3.10 on an AMD Naples node using the prebuilt AOCC/2.2.0 compilers (clang and flang). Each node has two sockets with 32 cores per socket and runs CentOS 7.8.

At the end of the build, I get the following error message and the build terminates:

END OF TESTS
Warning: ieee_inexact is signaling
FORTRAN STOP
make[1]: Leaving directory `/work/OpenBLAS/OpenBLAS-0.3.10/ctest'
make: *** [tests] Error 2 

Here is the full make command that I execute:

make TARGET=ZEN CC=clang FC=flang BINARY=64 USE_THREAD=1 USE_OPENMP=1 NUM_THREADS=64 NUM_PARALLEL=64 BIGNUMA=0 NO_AVX=1 NO_AVX2=0 NO_AVX512=1 MAX_STACK_ALLOC=2048

In Makefile.rule I see no variable that controls the IEEE precision tolerance for the tests, so I am out of options here. Do you have an idea how to work around the failure and get the tests to pass?

Thanks in advance. Ehsan

martin-frbg commented 3 years ago

Can you upload the full build log, please? I do not think the IEEE_INEXACT warning is the actual cause; some critical error was probably reported earlier.

moravveji commented 3 years ago

Thanks @martin-frbg for your swift reply. I see a lot of "expr: syntax error" messages in the build log, which I cannot easily associate with the make arguments. Please find the log file attached. log.tar.gz

martin-frbg commented 3 years ago

Not sure what the expr errors are about, but the actual problem appears to be this:

OMP_NUM_THREADS=2 ./xccblat2 < cin2
FIO-F-217/list-directed read/unit=5/attempt to read past end of file.
 File name = 'stdin ',    formatted, sequential access   record = 1
 In source file c_cblat2.f, at line number 129
make[1]: *** [all2] Error 127
make[1]: *** Waiting for unfinished jobs....
TESTS OF THE COMPLEX*16        LEVEL 3 BLAS

which is either a corrupt/truncated input file or a bug in the test. I seem to recall fixing something like this in the develop branch, so the quickest solution would probably be to cherry-pick the current c_cblat2.f from there (or wait a few hours for me to either lose my mind or release 0.3.11, possibly both).

The various IEEE errors are most probably generated by functions in the LAPACK code that actively check for IEEE-conformant behaviour of the host.
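
To find the first real failure in a long build log, rather than the last message printed, something like the following may help. It is only a sketch: the function name first_failures is made up for illustration, and the two patterns are taken from the excerpt above (the flang runtime "FIO-F-..." message and make's "*** [target] Error N" lines), so other compilers will need other patterns.

```shell
#!/bin/sh
# first_failures LOGFILE [COUNT]
# Print the first COUNT (default 5) lines of LOGFILE that look like a
# fatal flang runtime I/O error or a make error, prefixed with their
# line numbers. The patterns are assumptions based on the log excerpt
# in this thread, not a complete list of failure signatures.
first_failures() {
    log=$1
    count=${2:-5}
    grep -n -E 'FIO-F-[0-9]+|\*\*\* \[.*] Error [0-9]+' "$log" | head -n "$count"
}
```

Called as first_failures build.log 1 (with build.log standing in for whatever file the make output was saved to), it prints only the earliest matching line, which is usually the one worth reading first.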

moravveji commented 3 years ago

@martin-frbg: if the release of 0.3.11 is an option, I am patient enough to wait ;-)

moravveji commented 3 years ago

Thanks @martin-frbg for the fix. The new release (0.3.11) indeed compiles flawlessly.

martin-frbg commented 3 years ago

Unfortunately 0.3.11 has some other problems, such as not including all double-precision complex functions in the library. There will be another release soon.

moravveji commented 3 years ago

I was not aware of that, so thanks for the heads-up. Would you mind notifying me here once the new release is out, so that I can fetch it?

martin-frbg commented 3 years ago

Reopening as a reminder to let you know as soon as all known bugs are replaced by new ones.

martin-frbg commented 3 years ago

Released 0.3.12 now, hope this clears up all the fallout from 0.3.11. BTW, the "expr: syntax error" messages were probably caused by a broken check for the gcc version. Please let me know if you still see them with 0.3.12.

moravveji commented 3 years ago

Thanks for the latest release. It also compiled easily for me; however, I still see the "expr: syntax error" message everywhere. What worries me is that you do not see this, and hence cannot reproduce it on your platform. I am using the AOCC v. 2.2.0 compilers on an AMD Naples machine with CentOS 7.8.

martin-frbg commented 3 years ago

Found a spurious "expr" in a shell call to query the flang version now (Makefile.system near line 860). This should be harmless, as the call only served to identify an earlier version of AOCC flang that required a workaround for a compiler bug.
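
For illustration, a version query does not need expr at all: the leading number can be pulled out of the compiler's version banner with sed. This is a sketch under assumptions, not the code in Makefile.system. The function name major_version is hypothetical, and so is the banner string used in the usage note.

```shell
#!/bin/sh
# major_version BANNER
# Print the first run of digits found in BANNER, or nothing at all if
# BANNER contains no digits. Free text is never handed to expr, so a
# banner in an unexpected format cannot trigger "expr: syntax error".
major_version() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\).*/\1/p'
}
```

With a made-up banner such as "flang version 10.0.0", major_version prints the leading number only; an empty result then signals that the version could not be determined, which a caller can test for before doing any numeric comparison.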