SciFortran / SciFortran

An open-source Fortran library for mathematics, science and engineering (*in a way* just like scipy for python)
http://SciFortran.github.io/SciFortran
GNU Lesser General Public License v3.0

Segfault when using system-provided lapack #22

Closed Nik-Wagner closed 10 months ago

Nik-Wagner commented 1 year ago

When using the LAPACK version provided by the system (Manjaro 22.0.2 with openblas 0.3.21-1), CDMFT calculations (for various drivers) crash with a segmentation fault during the diagonalization of the impurity Hamiltonian. When SciFortran is compiled with -DWITH_BLAS_LAPACK, everything works fine.

gdb gives the following backtrace:

Thread 1 "cdn_ssh" received signal SIGSEGV, Segmentation fault.
0x00007ffff6df50b2 in zgemv_n_HASWELL () from /usr/lib/libopenblas.so.3
#0 0x00007ffff6df50b2 in zgemv_n_HASWELL () from /usr/lib/libopenblas.so.3
#1 0x00007ffff66b550c in zgemv_ () from /usr/lib/libopenblas.so.3
#2 0x00007ffff7e5f199 in zlatrd_ () from /usr/lib/liblapack.so.3
#3 0x00007ffff7ddc9d0 in zhetrd_ () from /usr/lib/liblapack.so.3
#4 0x00007ffff7dd36ac in zheevd_ () from /usr/lib/liblapack.so.3
#5 0x0000555555b58775 in __sf_linalg_MOD_zeigh_simple ()
#6 0x00005555555c5f9e in ed_diag::ed_diag_d ()

Attachments: log.txt, log_err.txt

aamaricci commented 1 year ago

Hey Niklas. After a quick check with Gabriele and Samuele (who's using libopenblas 0.3.20 successfully), it looks like this issue is specific to your system.

In practice, unless you're using a specifically optimized build of LAPACK/BLAS, linking against the bundled copies or the external ones makes no difference in performance.

Could you please test a simple program that diagonalizes a trivial matrix? It looks like the error comes from calling eigh(M) with a complex M.
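Something along these lines could serve as a starting point for such a test. It is only a sketch that calls LAPACK's zheevd directly (the routine at frame 4 of the backtrace) instead of going through SciFortran's eigh wrapper. The program name, the size n = 200, and the tridiagonal test matrix are all illustrative assumptions. The size is deliberately not tiny: as far as I understand, reference LAPACK only takes the blocked zhetrd/zlatrd path seen in the backtrace above some crossover size, so a very small matrix may never reach zgemv. That assumption is worth checking against your LAPACK build.

```fortran
program check_zheevd
  ! Sketch: diagonalize a complex Hermitian matrix with LAPACK's zheevd and
  ! compare with the analytically known spectrum. The program name, n, and
  ! the test matrix are illustrative assumptions, not SciFortran code.
  implicit none
  integer,  parameter :: dp = kind(1.0d0)
  integer,  parameter :: n  = 200          ! assumed size, see note above
  real(dp), parameter :: pi = 3.141592653589793_dp
  complex(dp), allocatable :: a(:,:), work(:)
  real(dp),    allocatable :: w(:), exact(:), rwork(:)
  integer,     allocatable :: iwork(:)
  complex(dp) :: wq(1)
  real(dp)    :: rq(1)
  integer     :: iq(1), lwork, lrwork, liwork, info, k
  external    :: zheevd

  ! Hermitian tridiagonal Toeplitz matrix: 2 on the diagonal, -i above, +i below.
  ! Its eigenvalues are 2 + 2*cos(k*pi/(n+1)), k = 1..n.
  allocate(a(n,n), w(n), exact(n))
  a = (0.0_dp, 0.0_dp)
  do k = 1, n
     a(k,k) = (2.0_dp, 0.0_dp)
     if (k < n) then
        a(k,k+1) = (0.0_dp, -1.0_dp)
        a(k+1,k) = (0.0_dp,  1.0_dp)
     end if
     ! Stored in ascending order, matching zheevd's output ordering.
     exact(k) = 2.0_dp + 2.0_dp*cos(real(n+1-k, dp)*pi/real(n+1, dp))
  end do

  ! Workspace query: lwork = lrwork = liwork = -1 returns the optimal sizes.
  call zheevd('V', 'U', n, a, n, w, wq, -1, rq, -1, iq, -1, info)
  if (info /= 0) then
     print *, 'workspace query failed, info =', info
     stop 1
  end if
  lwork  = int(real(wq(1), dp))
  lrwork = int(rq(1))
  liwork = iq(1)
  allocate(work(lwork), rwork(lrwork), iwork(liwork))

  ! Actual diagonalization; eigenvectors overwrite a, eigenvalues go to w.
  call zheevd('V', 'U', n, a, n, w, work, lwork, rwork, lrwork, iwork, liwork, info)
  if (info /= 0) then
     print *, 'zheevd returned info =', info
     stop 1
  end if
  print *, 'max eigenvalue error:', maxval(abs(w - exact))
end program check_zheevd
```

Assuming gfortran and the system libraries, something like `gfortran check_zheevd.f90 -llapack -lblas` should build it. The exact link flags depend on how your distribution packages LAPACK and OpenBLAS. If this runs cleanly, swapping the direct zheevd call for SciFortran's eigh would be the natural next step.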

Let us know. A

Nik-Wagner commented 1 year ago

Calling eigh(M) works fine.

beddalumia commented 1 year ago

Chiming in with a comment: if two different versions of LAPACK (whose implementations should more or less adhere to a standard) behave differently, and in particular one of them segfaults whereas the other gives correct results, I'd say that the problem lies in the failing LAPACK implementation itself. As a matter of fact, we have direct control only over the bundled one, which reportedly behaves as intended, so I'm afraid we can be of very limited help here.

You could try switching openblas to version 0.3.20 (which @SamueleGiuli has been using for a while without any noticeable issue), or even try switching to MKL or another third-party implementation of LAPACK, if you really need to avoid compiling the version bundled with SciFortran (which, I agree, can slow down the installation process quite a bit).