@grisuthedragon Any thoughts on this?
I can reproduce this too, looks like it's a segfault problem.
GDB backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x00001555512cc16a in zdot_compute () from /apps/gent/RHEL8/haswell-ib/software/OpenBLAS/0.3.18-GCC-11.2.0/lib/libopenblas.so.0
(gdb) bt
#0 0x00001555512cc16a in zdot_compute () from /apps/gent/RHEL8/haswell-ib/software/OpenBLAS/0.3.18-GCC-11.2.0/lib/libopenblas.so.0
#1 0x00001555512cc28e in zdotu_k () from /apps/gent/RHEL8/haswell-ib/software/OpenBLAS/0.3.18-GCC-11.2.0/lib/libopenblas.so.0
#2 0x0000155552df5e06 in flexiblas_real_cblas_zdotu_sub () from /apps/gent/RHEL8/haswell-ib/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib/libflexiblas.so.3
#3 0x0000155553116340 in CDOUBLE_dot (ip1=0x90e880 "", is1=16, ip2=0x155545434300 "", is2=1632960, op=0x1555427ea760 "", n=28, __NPY_UNUSED_TAGGEDignore=0x0) at build/src.linux-x86_64-3.9/numpy/core/src/multiarray/arraytypes.c:3628
#4 0x00001555531f2928 in PyArray_MatrixProduct2 (op1=<optimized out>, op2=<optimized out>, out=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:1083
#5 0x00001555531f3003 in array_matrixproduct (__NPY_UNUSED_TAGGEDdummy=<optimized out>, args=<optimized out>, kwds=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:2438
#6 0x00001555550a177e in cfunction_call (func=0x15555351b360, args=<optimized out>, kwargs=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/descrobject.c:539
#7 0x00001555550a0cbf in _PyObject_Call (tstate=0x407d20, callable=0x15555351b360, args=0x155553595e80, kwargs=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/genobject.c:281
#8 0x0000155553134298 in array_implement_array_function (__NPY_UNUSED_TAGGEDdummy=<optimized out>, positional_args=<optimized out>) at numpy/core/src/multiarray/arrayfunction_override.c:367
#9 0x00001555550a17a0 in cfunction_call (func=0x15555351bbd0, args=<optimized out>, kwargs=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/descrobject.c:548
#10 0x0000155555094932 in _PyObject_MakeTpCall (tstate=0x407d20, callable=0x15555351bbd0, args=<optimized out>, nargs=5, keywords=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/genobject.c:191
#11 0x000015555508d8ae in _PyObject_VectorcallTstate (kwnames=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, nargsf=9223372036854775813, args=0x155547e90ad8, callable=0x15555351bbd0,
tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at ./Python/pycore_pyerrors.h:116
#12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775813, args=0x155547e90ad8, callable=0x15555351bbd0, tstate=<optimized out>) at ./Python/pycore_pyerrors.h:103
#13 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775813, args=0x155547e90ad8, callable=0x15555351bbd0) at ./Python/pycore_pyerrors.h:127
#14 call_function (kwnames=0x0, kwnames@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>,
pp_stack@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, tstate=0x407d20, tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/ceval_gil.h:5072
#15 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x155547e90950, throwflag=<optimized out>) at Objects/ceval_gil.h:3518
#16 0x000015555508b8d9 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/codeobject.c:40
#17 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x479108, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0,
name=0x1555486030f0, qualname=0x155548603170) at Objects/ceval_gil.h:4327
#18 0x0000155555098566 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/genobject.c:396
#19 0x000015555508d79d in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x4790f8, callable=0x155548602160, tstate=0x407d20) at ./Python/pycore_pyerrors.h:118
#20 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x4790f8, callable=<optimized out>) at ./Python/pycore_pyerrors.h:127
#21 call_function (kwnames=0x0, kwnames@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>,
pp_stack@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, tstate=0x407d20, tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at Objects/ceval_gil.h:5072
#22 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x478f80, throwflag=<optimized out>) at Objects/ceval_gil.h:3487
#23 0x000015555508b8d9 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /tmp/vsc40003/easybuild/Python/3.9.6/GCCcore-11.2.0/Python-3.9.6/codeobject.c:40
#24 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0,
qualname=0x0) at Objects/ceval_gil.h:4327
#25 0x00001555550fe871 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=0x155553720a00, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0,
closure=0x0, name=0x0, qualname=0x0) at Objects/ceval_gil.h:4359
#26 0x00001555550fe819 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Objects/ceval_gil.h:4375
#27 0x00001555550fe7db in PyEval_EvalCode (co=co@entry=0x15555371e9d0, globals=globals@entry=0x155553720a00, locals=locals@entry=0x155553720a00) at Objects/ceval_gil.h:826
#28 0x000015555510d7d4 in run_eval_code_obj (tstate=0x407d20, co=0x15555371e9d0, globals=0x155553720a00, locals=0x155553720a00) at Modules/find.h:1219
#29 0x00001555551097c6 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x155553720a00, locals=0x155553720a00, flags=<optimized out>, arena=<optimized out>) at Modules/find.h:1240
#30 0x0000155555015d40 in pyrun_file (fp=fp@entry=0x403340, filename=filename@entry=0x1555537fced0, start=start@entry=257, globals=globals@entry=0x155553720a00, locals=locals@entry=0x155553720a00, closeit=closeit@entry=1, flags=0x7ffffffef498) at Modules/find.h:1138
#31 0x0000155555015005 in pyrun_simple_file (flags=0x7ffffffef498, closeit=1, filename=0x1555537fced0, fp=0x403340) at Modules/find.h:449
#32 PyRun_SimpleFileExFlags (fp=fp@entry=0x403340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7ffffffef498) at Modules/find.h:482
#33 0x000015555501797e in PyRun_AnyFileExFlags (fp=fp@entry=0x403340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7ffffffef498) at Modules/find.h:91
#34 0x000015555511ec65 in pymain_run_file (cf=0x7ffffffef498, config=0x409010) at Objects/fileutils.c:373
#35 pymain_run_python (exitcode=0x7ffffffef490) at Objects/fileutils.c:598
#36 Py_RunMain () at Objects/fileutils.c:677
#37 0x00001555550f2009 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Objects/fileutils.c:731
#38 0x0000155553873493 in __libc_start_main () from /lib64/libc.so.6
#39 0x00000000004006ce in _start ()
@schiotz It looks like this could be a problem in recent versions of OpenBLAS, and that FlexiBLAS has nothing to do with it (but I'm not sure)
@boegel Do you have any suggestions for workarounds or how to fix it? It is a showstopper for us, basically locking us to the foss/2020b toolchain. We see this core dump all over our code.
Maybe you can try installing OpenBLAS-0.3.20-GCC-11.2.0.eb, and then swapping to the OpenBLAS module after loading SciPy-bundle, and see if the problem persists? It could be a bug that is fixed in OpenBLAS already...
If that doesn't help, it gets more interesting: we would need to hunt down the cause of the problem, and probably come up with a patch to fix it.
We should also see if the problem only happens if FlexiBLAS is used (by tweaking foss/2021b to include OpenBLAS directly).
zdotu is one of the 4 functions affected by different Fortran calling conventions depending on the compiler, because of the complex return type (together with cdotc, zdotc, and cdotu). If the wrong one is used you get a segmentation fault.
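For reference, here is a hedged sketch (not taken from this thread) of the two incompatible C-level views of such a routine. Both conventions use the same Fortran symbol (zdotu_); the second prototype is given a different C name here purely for illustration, and which convention a given BLAS actually follows depends on the compiler it was built with:

/* Sketch only: calling zdotu_ through the wrong prototype corrupts the
 * stack/registers, which typically shows up as a segmentation fault. */
#include <complex.h>

/* gfortran-style: the complex result is returned by value */
double complex zdotu_(const int *n,
                      const double complex *x, const int *incx,
                      const double complex *y, const int *incy);

/* f2c/Intel-style: the result is written through a hidden first argument
 * (hypothetical name; the real symbol is also zdotu_) */
void zdotu_hidden_result_(double complex *result, const int *n,
                          const double complex *x, const int *incx,
                          const double complex *y, const int *incy);

/* The CBLAS wrapper always returns the result through a pointer, which is
 * why numpy (via FlexiBLAS) ends up in cblas_zdotu_sub in the backtrace. */
void cblas_zdotu_sub(const int n, const void *x, const int incx,
                     const void *y, const int incy, void *dotu);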
I can't reproduce this myself yet but will dig a bit; it'll probably give us a hint about what to look for.
@boegel Can you compile OpenBLAS (same version as above) with debug info and then run your reproducer with
export FLEXIBLAS=/path/to/libopenblas.so
That will shine some more light on the issue in the backtrace.
Maybe you can try installing OpenBLAS-0.3.20-GCC-11.2.0.eb, and then swapping
@boegel I see the problem also with foss/2022a, which uses OpenBLAS/0.3.20-GCC-11.3.0, so it looks like it is present in the 0.3.20 version of OpenBLAS.
However, I tried to use IMKL as a backend (I am not sure I know what I am doing) and it crashes in the same way. This could indicate that it is a FlexiBLAS issue, perhaps the calling convention issue that @bartoldeman is referring to.
export FLEXIBLAS=/home/modules/software/imkl/2022.1.0/mkl/latest/lib/intel64/libmkl_rt.so
(gdb) where
#0 0x00007ff662b30d2f in zdotu_ ()
from /home/modules/software/imkl/2022.1.0/mkl/latest/lib/intel64/libmkl_intel_lp64.so.2
#1 0x00007ff66e39b316 in flexiblas_real_cblas_zdotu_sub ()
from /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib/libflexiblas.so.3
#2 0x00007ff66e6c4cf4 in CDOUBLE_dot ()
from /home/modules/software/SciPy-bundle/2022.05-foss-2022a/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so
@schiotz you should not use libmkl_rt.so as backend, that's for sure (long complicated story). The way to use MKL as backend is to use export FLEXIBLAS=imkl; this will look in the configuration file $EBROOTFLEXIBLAS/etc/flexiblasrc.d/imkl.conf and then in the directory $FLEXIBLAS_LIBRARY_PATH (set by the imkl module).
Can you also try with BLIS?
module load BLIS/0.9.0-GCC-11.3.0
export FLEXIBLAS=blis
export FLEXIBLAS_VERBOSE=1
(last one just to confirm it's using BLIS)
I can confirm it does not crash with IMKL or BLIS. Here is the output when running with BLIS:
15:41 [sylg] numpy-bug$ export FLEXIBLAS=blis
15:41 [sylg] numpy-bug$ export FLEXIBLAS_VERBOSE=1
15:41 [sylg] numpy-bug$ python bug.py
<flexiblas> Load system config /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//BLIS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//NETLIB.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//OpenBLAS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//imkl.conf
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc does not exist.
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc.sylg.fysik.dtu.dk does not exist.
<flexiblas> Environment supplied config ((null)) does not exist.
<flexiblas> libflexiblas.so is /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib/libflexiblas.so.3
<flexiblas> Hook "DUMMY/DUMMY" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_dummy.so
<flexiblas> Hook "Profile/PROFILE" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_profile.so
<flexiblas>
<flexiblas> FlexiBLAS, version 3.2.0
<flexiblas> Copyright (C) 2013-2021 Martin Koehler and others.
<flexiblas> This is free software; see the source code for copying conditions.
<flexiblas> There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
<flexiblas> FITNESS FOR A PARTICULAR PURPOSE.
<flexiblas>
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/libflexiblas_netlib.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_netlib.so
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/libflexiblas_fallback_lapack.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_fallback_lapack.so
<flexiblas> Trying to use the content of FLEXIBLAS: "blis" as shared library.
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/blis
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//blis
<flexiblas> "BLIS" does not seem to a shared library. Search inside the FlexiBLAS configuration..
<flexiblas> Trying to load libflexiblas_blis.so
<flexiblas> Check if shared library exist: /home/modules/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64/flexiblas/libflexiblas_blis.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_blis.so
<flexiblas> Set thread number function found ( func_name = bli_thread_set_num_threads ) at 0x7f4035698320
<flexiblas> Set thread number function found ( func_name = bli_thread_set_num_threads_ ) at 0x7f4035650500
<flexiblas> Get thread number function ( func_name = bli_thread_get_num_threads ) at 0x7f4035698290
<flexiblas> Available XERBLA ( backend: 0x7f4035696160, user defined: 0x7f4036598ef0, FlexiBLAS: 0x7f4036598ef0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> Available CBLAS_XERBLA ( backend: 0x7f403565aa20, user defined: 0x7f40365baea0, FlexiBLAS: 0x7f40365baea0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> The desired BLAS library is BLIS. We do not load their CBLAS wrapper since it might alter the behavior of your programs.
<flexiblas> BLAS info:
<flexiblas> - intel_interface = 0
<flexiblas> - flexiblas_integer_size = 4
<flexiblas> - backend_integer_size = 4
<flexiblas> - post_init = 0
<flexiblas> cleanup
15:41 [sylg] numpy-bug$
@schiotz can you run the original (crashing) testcase with FLEXIBLAS_VERBOSE=1 as well?
16:37 [sylg] numpy-bug$ export FLEXIBLAS_VERBOSE=1
16:37 [sylg] numpy-bug$ python bug.py
<flexiblas> Load system config /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//BLIS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//NETLIB.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//OpenBLAS.conf
<flexiblas> Load config: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/etc/flexiblasrc.d//imkl.conf
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc does not exist.
<flexiblas> Config /home/niflheim/schiotz/.flexiblasrc.sylg.fysik.dtu.dk does not exist.
<flexiblas> Environment supplied config ((null)) does not exist.
<flexiblas> libflexiblas.so is /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib/libflexiblas.so.3
<flexiblas> Hook "DUMMY/DUMMY" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_dummy.so
<flexiblas> Hook "Profile/PROFILE" found in /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_hook_profile.so
<flexiblas>
<flexiblas> FlexiBLAS, version 3.2.0
<flexiblas> Copyright (C) 2013-2021 Martin Koehler and others.
<flexiblas> This is free software; see the source code for copying conditions.
<flexiblas> There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
<flexiblas> FITNESS FOR A PARTICULAR PURPOSE.
<flexiblas>
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_netlib.so
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_fallback_lapack.so
<flexiblas> Use default BLAS: OPENBLAS - libflexiblas_openblas.so from System Directory
<flexiblas> Check if shared library exist: /home/modules/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/flexiblas//libflexiblas_openblas.so
<flexiblas> Set thread number function found ( func_name = openblas_set_num_threads ) at 0x7f6cc6d05160
<flexiblas> Set thread number function found ( func_name = openblas_set_num_threads_ ) at 0x7f6cc6d04a60
<flexiblas> Get thread number function ( func_name = openblas_get_num_threads ) at 0x7f6cc6d043c0
<flexiblas> Get thread number function ( func_name = openblas_get_num_threads_ ) at 0x7f6cc6d04a70
<flexiblas> Available XERBLA ( backend: 0x7f6cc6d04990, user defined: 0x7f6cc86c8ef0, FlexiBLAS: 0x7f6cc86c8ef0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> Available CBLAS_XERBLA ( backend: 0x7f6cc6b19470, user defined: 0x7f6cc86eaea0, FlexiBLAS: 0x7f6cc86eaea0 )
<flexiblas> Use XERBLA of the BLAS backend.
<flexiblas> BLAS info:
<flexiblas> - intel_interface = 0
<flexiblas> - flexiblas_integer_size = 4
<flexiblas> - backend_integer_size = 4
<flexiblas> - post_init = 0
Segmentation fault (core dumped)
16:38 [sylg] numpy-bug$
@boegel Can you compile OpenBLAS (same version as above) with debug info and then run your reproducer with
export FLEXIBLAS=/path/to/libopenblas.so
That will shine some more light on the issue in the backtrace.
Here's (the relevant part of) the GDB backtrace with OpenBLAS/0.3.20-GCC-11.2.0 built with debug symbols (enabled via the debug toolchain option):
Program received signal SIGSEGV, Segmentation fault.
0x00001555512cbd6a in zdot_compute (n=n@entry=28, x=<optimized out>, inc_x=2, inc_x@entry=1, y=0x155545438300, inc_y=<optimized out>, result=result@entry=0x7ffffffe9d50)
at /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c:148
148 /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c: No such file or directory.
(gdb) bt
#0 0x00001555512cbd6a in zdot_compute (n=n@entry=28, x=<optimized out>, inc_x=2, inc_x@entry=1, y=0x155545438300, inc_y=<optimized out>, result=result@entry=0x7ffffffe9d50)
at /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c:148
#1 0x00001555512cbe8e in zdotu_k (n=28, x=<optimized out>, inc_x=1, y=<optimized out>, inc_y=<optimized out>)
at /tmp/vsc40023/easybuild_build/OpenBLAS/0.3.20/GCC-11.2.0/OpenBLAS-0.3.20/kernel/zdot_microk_haswell-2.c:204
#2 0x0000155552df5e06 in flexiblas_real_cblas_zdotu_sub () from /user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/haswell-ib/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib/libflexiblas.so.3
#3 0x0000155553116340 in CDOUBLE_dot (ip1=0x86b5e0 "", is1=16, ip2=0x155545438300 "", is2=1632960, op=0x1555427ee760 "", n=28, __NPY_UNUSED_TAGGEDignore=0x0)
at build/src.linux-x86_64-3.9/numpy/core/src/multiarray/arraytypes.c:3628
#4 0x00001555531f2928 in PyArray_MatrixProduct2 (op1=<optimized out>, op2=<optimized out>, out=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:1083
#5 0x00001555531f3003 in array_matrixproduct (__NPY_UNUSED_TAGGEDdummy=<optimized out>, args=<optimized out>, kwds=<optimized out>) at numpy/core/src/npymath/funcs.inc.src:2438
@bartoldeman wrote:
zdotu is one of the 4 functions affected by different Fortran calling conventions depending on the compiler, because of the complex return type (together with cdotc, zdotc, and cdotu). If the wrong one is used you get a segmentation fault.
It certainly only occurs with complex arrays, but it is also important that the axes of the second array are swapped, so it must somehow be related to the array being non-contiguous.
Is there anything I can do to help make progress on this? Can I test something to see whether it is due to how EasyBuild builds it or to a bug in OpenBLAS? If the latter, I guess it should be reported upstream.
I'm working on reproducing this, I suspect it's indeed something upstream in the assembly language kernel of zdot, but will isolate a bit further.
@boegel I still can't reproduce this, perhaps it's fixed with the patch to GCC? @schiotz have you recompiled GCCcore with the new patch (included in the new easybuild 4.6.2)?
I'll try this. I assume I have to rebuild GCCcore, then OpenBLAS, and try again. I'll have to be careful to use the modules I built myself and not the ones on the system, but I can use FLEXIBLAS_VERBOSE to see which library it picks up. The tricky thing may be to check that EasyBuild uses the right GCCcore module itself, but I should be able to see that from the full paths shown by ps while compiling.
I'll report back once the builds are finished.
@bartoldeman Unfortunately, recompiling GCCcore and then both OpenBLAS and FlexiBLAS did not change anything.
Regarding reproducibility: We have four different login nodes on our cluster, with four slightly different architectures. I only see the bug on three of them.
Edit:
Affected:
Not affected:
I've been able to reproduce it now, so I can debug the issue.
Checking the assembly language, there's another compiler vectorization bug :(, where the loop in kernel/x86_64/zdot.c
while(i < n)
{
    dot[0] += x[ix]   * y[iy] ;
    dot[1] += x[ix+1] * y[iy+1] ;
    dot[2] += x[ix]   * y[iy+1] ;
    dot[3] += x[ix+1] * y[iy] ;
    ix += inc_x ;
    iy += inc_y ;
    i++ ;
}
gets compiled (if I understand it well!) as the equivalent of
while(i < n)
{
    dot[0] += x[ix]       * y[iy] ;
    dot[1] += x[ix+inc_x] * y[iy+inc_y] ;
    dot[2] += x[ix]       * y[iy+inc_y] ;
    dot[3] += x[ix+inc_x] * y[iy] ;
    ix += inc_x ;
    iy += inc_y ;
    i++ ;
}
(EDIT: NO, this isn't the case, it does load y[iy+inc_y] but discards it after!)
For the final loop iteration, y[iy+inc_y], where inc_y is a fairly large value (27*27*70*2*2 = 204120), may not be readable, and so causes the segfault. Sometimes it is readable, and won't crash, but will still produce the wrong result!
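To make the failure mode concrete, here is a hedged sketch (not from this thread) of one way to provoke the over-read deterministically: call the standard cblas_zdotu_sub with a large stride and place the element the miscompiled kernel would touch, y[n*incy], on a PROT_NONE page. The sizes are illustrative, not the exact ones from the report, and it assumes the kernel really performs the spurious load described above.

/* Hedged sketch: make the spurious y[n*incy] load fault reliably.
 * Link against the BLAS under test, e.g. -lflexiblas or -lopenblas
 * (assumption; adjust to your setup). */
#include <complex.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* standard CBLAS prototype, declared here to keep the sketch self-contained */
void cblas_zdotu_sub(const int n, const void *x, const int incx,
                     const void *y, const int incy, void *dotu);

int main(void) {
    const int n = 28;
    const int incy = 102060;                   /* illustrative large stride */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* Map enough memory to cover y[0] .. y[n*incy], then make the page that
     * holds y[n*incy] inaccessible: a correct zdotu never touches it, the
     * miscompiled one does and gets SIGSEGV. */
    size_t over = (size_t)n * incy;            /* first out-of-range index */
    size_t bytes = (over + 1) * sizeof(double complex);
    size_t mapped = ((bytes + page - 1) / page) * page;
    char *base = mmap(NULL, mapped, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return 1;
    mprotect(base + (over * sizeof(double complex) / page) * page, page, PROT_NONE);

    double complex *y = (double complex *)base;
    double complex x[28], result = 0.0;
    for (int i = 0; i < n; i++) {
        x[i] = 1.0 + 1.0 * I;
        y[(size_t)i * incy] = 1.0 - 1.0 * I;   /* only the valid elements */
    }
    cblas_zdotu_sub(n, x, 1, y, incy, &result);
    printf("zdotu = %g%+gi (expected 56+0i)\n", creal(result), cimag(result));
    return 0;
}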
I'm checking if the fix to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212 fixes this. OpenBLAS put in a workaround for GCC 12 already, on Windows and Mac OS X only, but since we compile with -ftree-vectorize, the problem also shows up with GCC 11.
Adding toolchainopts = {'vectorize': False} to the OpenBLAS easyconfigs for GCC11+ should fix this for now...
Still bad to have another GCC issue, as it could affect other code. The fix above didn't solve the issue. I'll try with a snapshot, and if that fails too, see if it's another GCC issue with a small test case.
GCC bug report here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451
In the end the bug cannot produce wrong results, but it can produce a segmentation fault if you are unlucky and y[inc_y*n] isn't accessible.
Thank you very much, @bartoldeman, for looking into this. Is the workaround to use toolchainopts = {'vectorize': False} (which sounds like it would hurt performance), or is there something else that can be done apart from giving up on foss/2022a (we hit the crash twenty-something times in the extended test suite for GPAW)?
@schiotz please see #16510 for a workaround that is less of a hammer than turning off vectorization everywhere.
Thank you very much indeed, @bartoldeman
I can confirm that this appears to fix our problems, at least for the test case. We are now rebuilding on all platforms, and testing our code. I would expect it to work now.
@schiotz Can this issue be closed?
I am not 100% sure; we seem to still have some issues and are trying to figure out whether they are related to this problem or to something else.
After rebuilding GCCcore and OpenBLAS, we are seeing lots of segfaults in OpenMPI that we did not see before. I cannot in any way imagine how fixing a vectorization bug could cause that, but perhaps something else changed as well. I'll continue to investigate.
@bartoldeman @boegel Apparently, after recompiling GCCcore, OpenBLAS and FlexiBLAS due to this issue and #16510, our GPAW jobs would almost always crash in OpenMPI. Recompiling OpenMPI fixes it. It makes no sense that fixing a vectorization bug could have that effect, but perhaps something else had also changed in GCCcore, leading to some kind of incompatibility between code compiled with the old and the new version. Does that even make sense? Should we just recompile everything built with the 2022a toolchains?
It is a bit strange, since OpenBLAS/FlexiBLAS do not interact with Open MPI. It's possible something in the Open MPI easyconfig was changed between compilations as well. In any case, I wouldn't worry about it. Recompiling everything would be prudent; some others have done it too.
I'll close this one though, as the segfault is fixed.
Hi EasyBuilders,
We have problems with a core dump inside our GPAW code, from within FlexiBLAS:
We can reproduce the bug with this four-line code snippet (pure numpy code) on most (but not all) of our machines:
We see the problem with SciPy-bundle/2021.10-foss-2021b and SciPy-bundle/2022.05-foss-2022a, but not with the corresponding intel-toolchain packages. Nor do we see the problem with SciPy-bundle/2020.11-foss-2020b, which does not use FlexiBLAS (I think...).
CC: @jjmortensen