triggers segfault in scipy arpack test (test_hermitian_modes) on i386

drew-parsons commented 2 years ago

This issue has been reported in the Debian build of blis and scipy, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003877

Debian runs CI testing of packages. The test for blis and scipy are logged at https://ci.debian.net/packages/b/blis/ and https://ci.debian.net/packages/s/scipy/

Debian has set up BLAS packages so that client applications link against libblas.so.3, which can be provided by any BLAS implementation (BLIS, OpenBLAS, Atlas, etc).

The scipy CI tests on i386 architecture (https://ci.debian.net/packages/s/scipy/testing/i386/ or https://ci.debian.net/packages/s/scipy/unstable/i386/ ) pass when running with OpenBLAS or Atlas.

BLIS, however, triggers a segfault via scipy's arpack (test_hermitian_modes), in the sparse submodule. An excerpt from the test log is

linalg/eigen/arpack/tests/test_arpack.py::test_symmetric_modes PASSED    [  1%]
linalg/eigen/arpack/tests/test_arpack.py::test_hermitian_modes Fatal Python error: Segmentation fault

Current thread 0xf7c0bc80 (most recent call first):
  File "/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 879 in extract
  File "/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1348 in eigs
  File "/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1567 in eigsh
  File "/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 248 in eval_evec
  File "/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 422 in test_hermitian_modes
  File "/usr/lib/python3/dist-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 83 in <lambda>
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 92 in _hookexec
  File "/usr/lib/python3/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/lib/python3/dist-packages/_pytest/python.py", line 1641 in runtest
  File "/usr/lib/python3/dist-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 83 in <lambda>
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 92 in _hookexec
  File "/usr/lib/python3/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/lib/python3/dist-packages/_pytest/runner.py", line 255 in <lambda>
  File "/usr/lib/python3/dist-packages/_pytest/runner.py", line 311 in from_call
  File "/usr/lib/python3/dist-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File "/usr/lib/python3/dist-packages/_pytest/runner.py", line 215 in call_and_report
  File "/usr/lib/python3/dist-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/usr/lib/python3/dist-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 83 in <lambda>
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 92 in _hookexec
  File "/usr/lib/python3/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/lib/python3/dist-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 83 in <lambda>
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 92 in _hookexec
  File "/usr/lib/python3/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/lib/python3/dist-packages/_pytest/main.py", line 323 in _main
  File "/usr/lib/python3/dist-packages/_pytest/main.py", line 269 in wrap_session
  File "/usr/lib/python3/dist-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 83 in <lambda>
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 92 in _hookexec
  File "/usr/lib/python3/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/lib/python3/dist-packages/_pytest/config/__init__.py", line 162 in main
  File "/usr/lib/python3/dist-packages/scipy/_lib/_testutils.py", line 67 in __call__
  File "/tmp/autopkgtest-lxc.nl0r4r9g/downtmp/autopkgtest_tmp/runtest.py", line 22 in <module>
Segmentation fault
autopkgtest [13:11:32]: test blis: -----------------------]
autopkgtest [13:11:33]: test blis:  - - - - - - - - - - results - - - - - - - - - -
blis                 FAIL non-zero exit status 139

line 879 in arpack.py refers to unpacking with complex numbers:

        else:
            # complex is so much simpler...
            d, z, ierr =\
                    self._arpack_extract(return_eigenvectors,
                           howmny, sselect, self.sigma, workev,
                           self.bmat, self.which, k, self.tol, self.resid,
                           self.v, self.iparam, self.ipntr,
                           self.workd, self.workl, self.rwork, ierr)

If I read scipy correctly, self._arpack_extract is defined as Fneupd or Dneupd , the prefix type F or D being set at test_arpack.py#L426/test_arpack.py#L426

Reporting here since the same scipy test is passing for OpenBLAS and Atlas.

devinamatthews commented 2 years ago

@drew-parsons can you please extract the BLAS/LAPACK operation and parameters this corresponds to and construct a minimal reproducer (Fortran or C/CBLAS)?

drew-parsons commented 2 years ago

Here is an indicative test case. Compiles with gfortran test_blis.f90 -larpack -o test_blis

(edit: attaching as "text file" for ease of access) test_blis.f90.txt

test_blis.f90

      program testblis
      logical :: return_eigenvectors
      character :: howmny
      integer, dimension(6) :: sselect
      real :: sigma
      complex, dimension(18) :: workev
      character :: bmat
      character, dimension(2) :: which
      integer :: k
      real :: tol
      complex, dimension(6) :: resid
      complex, dimension(6,6) :: v
      integer, dimension(11) :: iparam
      integer, dimension(14) :: ipntr
      complex, dimension(18) :: workd
      complex, dimension(144) :: workl
      real, dimension(6) :: rwork
      integer :: ierr

      return_eigenvectors = .true.
      howmny= 'A'
      sselect= (/ 0, 0, 0, 0, 0, 0 /)
      sigma= 0
      workev= (/ cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), &
           cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0),  & 
           cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), &
           cmplx(0.0,+0.0), cmplx(0.0,+0.0) /)
      bmat= 'I'
      which= 'LM'
      k= 2
      tol= 5.960464477539063e-08
      resid= [ cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), cmplx(0.0,+0.0), &
           cmplx(0.0,+0.0) ]

      v= reshape( [ cmplx(0.19250575,+4.20763885e-04),  cmplx(0.39373642,-2.34598592e-01), &
      cmplx(-0.41836268,+5.14127433e-01),  cmplx(0.20587479,-1.73068494e-01),                         &
      cmplx(0.08215308,+2.73760617e-01), cmplx(-0.3820484 ,-1.16318196e-01),                        &
      cmplx(-0.44885007,-4.45856780e-01),  cmplx(0.50837225,+1.17746256e-01),                         &
      cmplx(0.0339167 ,+5.68150468e-02),  cmplx(0.15588735,+3.97075832e-01),                   &
      cmplx(0.2852069 ,-1.76007375e-01),  cmplx(0.05474113,+1.60537273e-01),                  &
      cmplx(0.05720383,-1.45750791e-01),  cmplx(0.4397847 ,+7.28189573e-03),                   &
      cmplx(-0.05033582,+1.12324260e-01), cmplx(-0.4542598 ,-4.70747769e-01),                            &
      cmplx(-0.19540797,-1.21156394e-01),  cmplx(0.53374416,-3.41907293e-02),                        &
      cmplx(0.43965942,-1.21829905e-01),  cmplx(0.10286107,-1.07869089e-01),                   &
      cmplx(0.29702908,-1.74095705e-01),  cmplx(0.3365949 ,-8.47186148e-02),                   &
      cmplx(0.46479693,-1.13499127e-01),  cmplx(0.2250337 ,-5.01072407e-01),                  &
      cmplx(0.16161908,-1.41184554e-01),  cmplx(0.37373623,-2.95559037e-02),                   &
      cmplx(0.5218592 ,-2.57394373e-01),  cmplx(0.18750091,-3.98230515e-02),                   &
      cmplx(-0.5168995 ,+3.04058760e-01), cmplx(-0.24739297,+1.31233916e-01),                           &
      cmplx(0.3173374 ,+4.28102046e-01),  cmplx(0.375448,  -1.42951623e-01),                   &
      cmplx(-0.02793682,-2.88464576e-01), cmplx(-0.35166356,+1.91974759e-01),                            &
      cmplx(0.23884559,-3.27900827e-01), cmplx(-0.26588023,+2.73599952e-01) ], [ 6,6 ] )

      iparam= (/ 1, 0, 1, 1, 2, 0, 1, 0, 6, 0, 5 /)
      ipntr= (/ 13,   7,   1, 139  , 1,  37 , 49,  43,   0,   0,   0,   0,   0,  85 /)

      workd= [ cmplx(-1.1470173e-22,-6.98174655e-23),  cmplx(9.9702622e-23,+5.91705001e-24), &
      cmplx(-3.2656615e-22,-1.11926728e-22),  cmplx(2.4716908e-22,-2.14525690e-23),                &
      cmplx(2.1190568e-22,+9.29814572e-23),  cmplx(7.4992352e-25,-3.10411314e-22),                 &
      cmplx(4.5107125e-17,-7.34336590e-16),  cmplx(8.5371810e-16,-6.49521222e-16),                 &
      cmplx(1.7819955e-15,+7.97835897e-16), cmplx(-1.8529566e-15,+6.22655508e-16),                 &
      cmplx(-2.8604657e-16,+8.16474315e-17),  cmplx(4.2566341e-15,-2.35448398e-16),                &
      cmplx(-3.8204840e-01,-1.16318196e-01),  cmplx(5.4741126e-02,+1.60537273e-01),                &
      cmplx(5.3374416e-01,-3.41907293e-02),  cmplx(2.2503370e-01,-5.01072407e-01),                 &
      cmplx(-2.4739297e-01,+1.31233916e-01), cmplx(-2.6588023e-01,+2.73599952e-01) ]

      workl= [ cmplx(3.5884588e+00,+5.96068546e-08),  cmplx(6.6824436e+00,+0.00000000e+00),   &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(6.6824427e+00,+7.22271238e-08),  cmplx(1.7989706e+01,+1.75612591e-09),    &
      cmplx(2.7993238e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(1.9762646e-07,+5.67932226e-08),  cmplx(2.7993245e+00,+1.67374694e-07),    &
      cmplx(1.7962735e+00,+1.49791646e-09),  cmplx(4.4582537e-01,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(4.9505530e-08,+7.81608804e-08),  cmplx(3.3781302e-07,+2.48284550e-07),    &
      cmplx(4.4582552e-01,+2.52960426e-08),  cmplx(1.1871042e+00,+3.34558052e-08),    &
      cmplx(8.0961734e-01,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(-2.9630577e-08,-4.97465251e-08), cmplx(-5.9703831e-07,+7.43493160e-08),    &
      cmplx(-3.0276016e-08,+1.37710188e-07),  cmplx(8.0961734e-01,+2.40304949e-08),    &
      cmplx(1.2858521e+00,-2.46285836e-08),  cmplx(1.5494964e-01,+0.00000000e+00),    &
      cmplx(1.6944882e-07,+3.90373884e-08),  cmplx(1.1771689e-07,+8.37097858e-08),    &
      cmplx(-3.1870591e-08,+2.48743914e-09), cmplx(-7.4845516e-09,-4.62256651e-08),    &
      cmplx(1.5494964e-01,+7.77471865e-09),  cmplx(1.1986541e-01,+2.64537792e-09),    &
      cmplx(2.4328265e+00,+4.93143375e-08),  cmplx(2.0968050e+01,+5.72654386e-08),    &
      cmplx(1.9191172e+00,-4.35357350e-08),  cmplx(5.2524900e-01,+3.72636677e-09),    &
      cmplx(1.1075074e-01,+4.62970462e-09),  cmplx(1.1263161e-02,+2.93336289e-09),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(-3.5559237e-01,-1.00254693e-08), cmplx(-9.2481941e-01,-1.47705999e-08),    &
      cmplx(-1.3510637e-01,-1.76946635e-09), cmplx(-3.0501792e-03,-3.63992413e-11),    &
      cmplx(-1.2547492e-04,-9.75592328e-13), cmplx(-9.3256529e-07,-4.80765313e-15),    &
      cmplx(-4.7526979e-01,-1.13980704e-07),  cmplx(8.2191058e-02,+1.07808811e-08),    &
      cmplx(6.7778063e-01,+1.23798515e-07),  cmplx(4.5166564e-01,+5.66356455e-08),    &
      cmplx(3.2172996e-01,+1.94998417e-08),  cmplx(2.1553304e-02,+8.71409056e-10),    &
      cmplx(-3.8905147e-01,+1.20273995e-07),  cmplx(9.7188964e-02,-2.69054343e-08),    &
      cmplx(3.7077886e-01,-9.19978476e-08), cmplx(-5.0808138e-01,+3.40248022e-07),    &
      cmplx(-6.6355413e-01,+4.23044185e-07), cmplx(-5.7144579e-02,+3.49655771e-08),    &
      cmplx(-3.9543414e-01,+1.26818341e-05),  cmplx(1.8126571e-01,-5.80018104e-06),    &
      cmplx(-1.8691684e-01,+5.95490610e-06), cmplx(-6.0527158e-01,+1.92895259e-05),    &
      cmplx(5.9773225e-01,-1.90306291e-05),  cmplx(2.2847104e-01,-7.27468205e-06),    &
      cmplx(-4.7724560e-01,+2.85578790e-06),  cmplx(2.5547558e-01,-1.53281132e-06),    &
      cmplx(-5.0150907e-01,+3.01640694e-06),  cmplx(4.0383488e-01,-2.38385019e-06),    &
      cmplx(-3.1034443e-01,+1.82041344e-06),  cmplx(4.4278786e-01,-2.59612420e-06),    &
      cmplx(3.3493257e-01,+3.76476964e-06), cmplx(-1.7430720e-01,-1.96717610e-06),    &
      cmplx(3.1374088e-01,+3.54168446e-06), cmplx(-9.1684043e-02,-1.01984097e-06),    &
      cmplx(-5.0874900e-02,-5.40278279e-07),  cmplx(8.6487550e-01,+9.37304503e-06),    &
      cmplx(2.0968050e+01,+5.72654386e-08),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(-2.3175740e-07,-2.09900861e-07),  cmplx(2.4328265e+00,+4.93143375e-08),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(-2.7656853e-07,+1.59539354e-07),  cmplx(3.1791387e-07,-6.12752444e-08),    &
      cmplx(1.9191172e+00,-4.35357350e-08),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(7.9274611e-07,+1.65003257e-07), cmplx(-1.3937333e-07,+6.95500688e-08),    &
      cmplx(-2.5741244e-08,+9.25560641e-08),  cmplx(5.2524900e-01,+3.72636677e-09),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(1.2935055e-07,+6.24127949e-09),  cmplx(1.1180424e-07,-4.74945727e-09),    &
      cmplx(-1.2418276e-07,-1.24757795e-08), cmplx(-5.9591937e-08,-6.58801014e-09),    &
      cmplx(1.1263161e-02,+2.93336289e-09),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(-4.7867275e-07,-1.36812275e-07), cmplx(-1.4066131e-07,-5.51127428e-08),    &
      cmplx(1.9988818e-08,-8.53366977e-09), cmplx(-4.5463906e-08,+4.40631531e-08),    &
      cmplx(-3.8705249e-08,-2.18920206e-08),  cmplx(1.1075074e-01,+4.62970462e-09),    &
      cmplx(2.0968050e+01,+5.72654386e-08),  cmplx(2.4328265e+00,+4.93143375e-08),    &
      cmplx(1.9191172e+00,-4.35357350e-08),  cmplx(5.2524900e-01,+3.72636677e-09),    &
      cmplx(1.1263161e-02,+2.93336289e-09),  cmplx(1.1075074e-01,+4.62970462e-09),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00),    &
      cmplx(0.0000000e+00,+0.00000000e+00),  cmplx(0.0000000e+00,+0.00000000e+00) ]

      rwork= [ 0.0000000e+00, 4.4165824e-07, 8.1529697e-07, 1.2849700e-06, 4.5498405e-07,  9.8990586e-07 ]
      ierr= 0

      call cneupd(return_eigenvectors,                        &
                           howmny, sselect, sigma, workev,    &
                           bmat,  which, k, tol, resid,       &
                           v,     iparam,   ipntr,            &
                           workd, workl,    rwork, ierr)

      end program testblis

I don't use fortran routinely so there may be some implementation error here. It's adapted from python output, printing out the arguments used when scipy calls cneupd from python. I'll provide the python test code used to generate it in the next comment.

This sample code does segfault, but it also segfaults on amd64 where there ought to be no problem. It segfaults in cneupd_ in libarpack.so, so I think this sample code will need further editing to hit BLIS.

The arpack linked in this case is arpack-ng 3.8.0 (scipy uses a vendored copy of arpack, 3.3.0).

drew-parsons commented 2 years ago

I got the parameters for the test case by hacking scipy's arpack.py and test_arpack.py, running as python3 test_arpack_min.py

test_arpack_min.py (edit: attaching as "text file". This bug is tidier without the explicit python code taking up eye space) test_arpack_min.py.txt

and printing parameters in arpack_debug.py is a local copy of arpack.py edited to print parameters: (attaching as "text file") arpack_debug.py.txt

$ diff -u /usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py arpack_debug.py 
--- /usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py   2022-01-09 14:11:52.000000000 +0100
+++ arpack_debug.py 2022-01-17 16:52:17.234044410 +0100
@@ -39,7 +39,7 @@

 __all__ = ['eigs', 'eigsh', 'svds', 'ArpackError', 'ArpackNoConvergence']

-from . import _arpack
+from scipy.sparse.linalg.eigen.arpack import _arpack
 arpack_int = _arpack.timing.nbx.dtype

 import numpy as np
@@ -704,6 +704,7 @@

         ltr = _type_conv[self.tp]
         self._arpack_solver = _arpack.__dict__[ltr + 'naupd']
+        print("_arpack_extract=",ltr + 'neupd')
         self._arpack_extract = _arpack.__dict__[ltr + 'neupd']

         self.iterate_infodict = _NAUPD_ERRORS[ltr]
@@ -767,6 +768,7 @@
         sigmai = np.imag(self.sigma)
         workev = np.zeros(3 * self.ncv, self.tp)

+        print("extrac tp",self.tp)
         if self.tp in 'fd':
             dr = np.zeros(k + 1, self.tp)
             di = np.zeros(k + 1, self.tp)
@@ -875,6 +877,24 @@
                 z = z[:, ind]
         else:
             # complex is so much simpler...
+            print("complex _arpack_extract", self._arpack_extract)
+            print("return_eigenvectors=",return_eigenvectors)
+            print("howmny=",howmny)
+            print("sselect=",sselect)
+            print("self.sigma=",self.sigma)
+            print("workev=",workev)
+            print("self.bmat=",self.bmat)
+            print("self.which=",self.which)
+            print("k=",k)
+            print("self.tol=",self.tol)
+            print("self.resid=",self.resid)
+            print("self.v=",self.v)
+            print("self.iparam=",self.iparam)
+            print("self.ipntr=",self.ipntr)
+            print("self.workd=",self.workd)
+            print("self.workl=",self.workl)
+            print("self.rwork=",self.rwork)
+            print("ierr=",ierr)
             d, z, ierr =\
                     self._arpack_extract(return_eigenvectors,
                            howmny, sselect, self.sigma, workev,

drew-parsons commented 2 years ago

The parameters printed by arpack_debug.py were (in python format)

test_arpack_min_i386.txt

These are the parameters used in arpack.py to call

self._arpack_extract(return_eigenvectors,
                           howmny, sselect, self.sigma, workev,
                           self.bmat, self.which, k, self.tol, self.resid,
                           self.v, self.iparam, self.ipntr,
                           self.workd, self.workl, self.rwork, ierr)

(the point at which the segfault occurs in the scipy test) where self._arpack_extract is cneupd

devinamatthews commented 2 years ago

There's definitely a mismatch in the Fortran arguments. Here is the docs if you want to take a stab a fixing it (I can't take a look until at least Friday):

c\Usage:
c  call cneupd 
c     ( RVEC, HOWMNY, SELECT, D, Z, LDZ, SIGMA, WORKEV, BMAT, 
c       N, WHICH, NEV, TOL, RESID, NCV, V, LDV, IPARAM, IPNTR, WORKD, 
c       WORKL, LWORKL, RWORK, INFO )
c
c\Arguments:
c  RVEC    LOGICAL  (INPUT)
c          Specifies whether a basis for the invariant subspace corresponding
c          to the converged Ritz value approximations for the eigenproblem 
c          A*z = lambda*B*z is computed.
c
c             RVEC = .FALSE.     Compute Ritz values only.
c
c             RVEC = .TRUE.      Compute Ritz vectors or Schur vectors.
c                                See Remarks below.
c
c  HOWMNY  Character*1  (INPUT)
c          Specifies the form of the basis for the invariant subspace 
c          corresponding to the converged Ritz values that is to be computed.
c
c          = 'A': Compute NEV Ritz vectors;
c          = 'P': Compute NEV Schur vectors;
c          = 'S': compute some of the Ritz vectors, specified
c                 by the logical array SELECT.
c
c  SELECT  Logical array of dimension NCV.  (INPUT)
c          If HOWMNY = 'S', SELECT specifies the Ritz vectors to be
c          computed. To select the  Ritz vector corresponding to a
c          Ritz value D(j), SELECT(j) must be set to .TRUE.. 
c          If HOWMNY = 'A' or 'P', SELECT need not be initialized 
c          but it is used as internal workspace.
c
c  D       Complex array of dimension NEV+1.  (OUTPUT)
c          On exit, D contains the  Ritz  approximations 
c          to the eigenvalues lambda for A*z = lambda*B*z.
c
c  Z       Complex N by NEV array    (OUTPUT)
c          On exit, if RVEC = .TRUE. and HOWMNY = 'A', then the columns of 
c          Z represents approximate eigenvectors (Ritz vectors) corresponding 
c          to the NCONV=IPARAM(5) Ritz values for eigensystem
c          A*z = lambda*B*z.
c
c          If RVEC = .FALSE. or HOWMNY = 'P', then Z is NOT REFERENCED.
c
c          NOTE: If if RVEC = .TRUE. and a Schur basis is not required, 
c          the array Z may be set equal to first NEV+1 columns of the Arnoldi 
c          basis array V computed by CNAUPD.  In this case the Arnoldi basis 
c          will be destroyed and overwritten with the eigenvector basis.
c
c  LDZ     Integer.  (INPUT)
c          The leading dimension of the array Z.  If Ritz vectors are
c          desired, then  LDZ .ge.  max( 1, N ) is required.  
c          In any case,  LDZ .ge. 1 is required.
c
c  SIGMA   Complex  (INPUT)
c          If IPARAM(7) = 3 then SIGMA represents the shift. 
c          Not referenced if IPARAM(7) = 1 or 2.
c
c  WORKEV  Complex work array of dimension 2*NCV.  (WORKSPACE)
c
c  **** The remaining arguments MUST be the same as for the   ****
c  **** call to CNAUPD that was just completed.               ****
c
c  NOTE: The remaining arguments 
c
c           BMAT, N, WHICH, NEV, TOL, RESID, NCV, V, LDV, IPARAM, IPNTR, 
c           WORKD, WORKL, LWORKL, RWORK, INFO 
c
c         must be passed directly to CNEUPD following the last call 
c         to CNAUPD.  These arguments MUST NOT BE MODIFIED between
c         the the last call to CNAUPD and the call to CNEUPD.
c
c  Three of these parameters (V, WORKL and INFO) are also output parameters:
c
c  V       Complex N by NCV array.  (INPUT/OUTPUT)
c
c          Upon INPUT: the NCV columns of V contain the Arnoldi basis
c                      vectors for OP as constructed by CNAUPD .
c
c          Upon OUTPUT: If RVEC = .TRUE. the first NCONV=IPARAM(5) columns
c                       contain approximate Schur vectors that span the
c                       desired invariant subspace.
c
c          NOTE: If the array Z has been set equal to first NEV+1 columns
c          of the array V and RVEC=.TRUE. and HOWMNY= 'A', then the
c          Arnoldi basis held by V has been overwritten by the desired
c          Ritz vectors.  If a separate array Z has been passed then
c          the first NCONV=IPARAM(5) columns of V will contain approximate
c          Schur vectors that span the desired invariant subspace.
c
c  WORKL   Real work array of length LWORKL.  (OUTPUT/WORKSPACE)
c          WORKL(1:ncv*ncv+2*ncv) contains information obtained in
c          cnaupd.  They are not changed by cneupd.
c          WORKL(ncv*ncv+2*ncv+1:3*ncv*ncv+4*ncv) holds the
c          untransformed Ritz values, the untransformed error estimates of 
c          the Ritz values, the upper triangular matrix for H, and the
c          associated matrix representation of the invariant subspace for H.
c
c          Note: IPNTR(9:13) contains the pointer into WORKL for addresses
c          of the above information computed by cneupd.
c          -------------------------------------------------------------
c          IPNTR(9):  pointer to the NCV RITZ values of the
c                     original system.
c          IPNTR(10): Not used
c          IPNTR(11): pointer to the NCV corresponding error estimates.
c          IPNTR(12): pointer to the NCV by NCV upper triangular
c                     Schur matrix for H.
c          IPNTR(13): pointer to the NCV by NCV matrix of eigenvectors
c                     of the upper Hessenberg matrix H. Only referenced by
c                     cneupd if RVEC = .TRUE. See Remark 2 below.
c          -------------------------------------------------------------
c
c  INFO    Integer.  (OUTPUT)
c          Error flag on output.
c          =  0: Normal exit.
c
c          =  1: The Schur form computed by LAPACK routine csheqr
c                could not be reordered by LAPACK routine ctrsen.
c                Re-enter subroutine cneupd with IPARAM(5)=NCV and
c                increase the size of the array D to have
c                dimension at least dimension NCV and allocate at least NCV
c                columns for Z. NOTE: Not necessary if Z and V share
c                the same space. Please notify the authors if this error
c                occurs.
c
c          = -1: N must be positive.
c          = -2: NEV must be positive.
c          = -3: NCV-NEV >= 2 and less than or equal to N.
c          = -5: WHICH must be one of 'LM', 'SM', 'LR', 'SR', 'LI', 'SI'
c          = -6: BMAT must be one of 'I' or 'G'.
c          = -7: Length of private work WORKL array is not sufficient.
c          = -8: Error return from LAPACK eigenvalue calculation.
c                This should never happened.
c          = -9: Error return from calculation of eigenvectors.
c                Informational error from LAPACK routine ctrevc.
c          = -10: IPARAM(7) must be 1,2,3
c          = -11: IPARAM(7) = 1 and BMAT = 'G' are incompatible.
c          = -12: HOWMNY = 'S' not yet implemented
c          = -13: HOWMNY must be one of 'A' or 'P' if RVEC = .true.
c          = -14: CNAUPD did not find any eigenvalues to sufficient
c                 accuracy.

drew-parsons commented 2 years ago

Yeah something odd going on there. The python call provides 17 arguments, but the fortran function expects 24. I expect it's something particular about the way python calls fortran functions, and isn't actually an error (in scipy), otherwise it would fail in all cases, not just for blis.

It's going through f2py, related to https://github.com/scipy/scipy/blob/main/scipy/sparse/linalg/_eigen/arpack/arpack.pyf.src

drew-parsons commented 2 years ago

This version adds the missing arguments, following the definitions in arpack.pyf.src. test_blis.f90.txt

It does build successfully but it doesn't reproduce the segfault (the test runs without errors). Not sure if that means the error arises via f2py processing so isn't seen in the direct fortran test, or if I missed some detail setting up the test.

devinamatthews commented 2 years ago

This is always the problem with Python wrappers... how feasible is it to try and get a backtrace of the segfault when called from Python?

drew-parsons commented 2 years ago

Eh, it principle it can be done. python can be run in gdb. It does come up with a backtrace, but in this case the backtrace is empty.

Program received signal SIGSEGV, Segmentation fault.
0xffffc830 in ?? ()
(gdb) bt
#0  0xffffc830 in ?? ()
#1  0x3f800000 in ?? ()
#2  0x00000000 in ?? ()

My guess is that means it's happening inside scipy's extension module for arpack (/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-39-i386-linux-gnu.so) . Usually debug symbols can be made available (I did install the python3-scipy-dbgsym package) but looks like the fortran code in _arpack.so got handled differently, compiling without debug symbols. Would have to rebuild scipy again from source which is a pain.

flame / blis

triggers segfault in scipy arpack test (test_hermitian_modes) on i386 #602