Closed jishnub closed 3 months ago
We just bumped openblas yesterday, so it is quite likely an upstream bug.
On my mac I'm not getting a segmentation fault, but it seems to keep running endlessly. We should report this upstream.
Works fine for size 63x63
for me and stops working 64x64
onwards.
For the record, I am being able to call dsyev
in C using OpenBLAS's master branch when compiling with gcc
directly.
E.g., using the following code:
#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>
int main() {
int n = 64;
double *a;
a = (double*) malloc(n*n*sizeof(double));
// Check if the memory has been successfully
// allocated by malloc or not
if (a == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
for(int row=0; row<n; row++){
for(int col=0; col<=row; col++){
a[row*n + col] = (double)rand()/(double)(RAND_MAX);
}
}
// Allocate space for eigenvalues
double *w;
w = (double*) malloc(n * sizeof(double));
if (w == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
// Leading dimension of the matrix
int lda = n;
// Compute eigenvalues and eigenvectors
LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'N', 'U', n, a, lda, w);
// Print eigenvalues
printf("Eigenvalues:\n");
for (int i = 0; i < n; i++) {
printf("%f\n", w[i]);
}
free(a);
free(w);
return 0;
}
Compiling with
gcc -o hermeig hermeig.c -Iopenblasinstallation/include -Lopenblasinstallation/lib -lopenblas
This runs correctly and produces results.
I've noticed that this error occurs in julia when I set BLAS.set_num_threads(4)
or higher. There is no segfault for 3 or fewer threads.
rr trace attached: https://julialang-dumps.s3.amazonaws.com/reports/2024-08-13T11-47-58-jishnub.tar.zst
For the record, I am being able to call dsyev in C using OpenBLAS's master branch when compiling with gcc directly.
It'd be useful to make sure you can reproduce the segfault when linking to Julia's libopenblas build: if not, that test may not be useful.
Doesn't segfault when linking to our openblas with 1 or 4 threads. Used slightly modified driver below:
➜ lib git:(master) gcc -o hermeig hermeig.c -Iopenblasinstallation/include -Lopenblasinstallation/lib -L. -lopenblas64_ -I../include -Wno-implicit-function-declaration
ld: warning: search path 'openblasinstallation/lib' not found
ld: warning: reexported library with install name '@rpath/libunwind.1.dylib' found at '/Users/viral/julia/usr/lib/libunwind.1.0.dylib' couldn't be matched with any parent library and will be linked directly
➜ lib git:(master) OPENBLAS_NUM_THREADS=4 DYLD_LIBRARY_PATH=. ./hermeig
Eigenvalues:
0.000008
0.022556
0.036368
0.050084
#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>
int main() {
long n = 64;
double *a;
a = (double*) malloc(n*n*sizeof(double));
openblas_set_num_threads64_(4);
// Check if the memory has been successfully
// allocated by malloc or not
if (a == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
for(long row=0; row<n; row++){
for(long col=0; col<=row; col++){
a[row*n + col] = (double)rand()/(double)(RAND_MAX);
}
}
// Allocate space for eigenvalues
double *w;
w = (double*) malloc(n * sizeof(double));
if (w == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
// Leading dimension of the matrix
long lda = n;
// Compute eigenvalues and eigenvectors
LAPACKE_dsyev64_(LAPACK_ROW_MAJOR, 'N', 'U', n, a, lda, w);
// Print eigenvalues
printf("Eigenvalues:\n");
for (int i = 0; i < n; i++) {
printf("%f\n", w[i]);
}
free(a);
free(w);
return 0;
}
Then that doesn't seem to be a good reproducer 🙂
Or there could perhaps be a bug on the Julia side?
The LAPACKE interface seems to hide the lwork
discovery.
Or there could perhaps be a bug on the Julia side?
Perhaps, but I'd spend more time to make sure the C reproducer is 100% faithful to what we're doing on the Julia side. Also, if you want to build OpenBLAS locally, to reproduce the Yggdrasil build I'd suggest compiling OpenBLAS with something like
make USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=400 NO_AFFINITY=1 NO_STATIC=1 NUM_THREADS=512 INTERFACE64=1 SYMBOLSUFFIX=64_ DYNAMIC_ARCH=1 TARGET=GENERIC all
Also, do we have a backtrace with gdb?
@jishnub The default algorithm seems to call syevd
and not syev
, based on the docs. But all the 3 algorithms are failing. :-(
The segfault happens for all the algorithms
svd
also fails. The failures are only on Hermitian matrices 64x64 and larger. Other factorizations seem to be working fine on Hermitians.
Of course svd failing may not be additional data, in that it is probably using the same internal LAPACK routines.
@giordano I could try get a gdb trace, but on M1 mac, it just goes into an infinite loop (should I ctrl-C?) - not a segfault. @jishnub Which platform are you seeing the segfault on?
Also, do we have a backtrace with gdb?
julia> eigen(H);
Thread 9 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd23f9640 (LWP 3380628)]
0x00007fffde86df6c in dgemm_itcopy_ZEN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
(gdb) where
#0 0x00007fffde86df6c in dgemm_itcopy_ZEN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
JuliaLang/julia#1 0x00007fffdda5f70f in dsyr2k_UN () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
JuliaLang/julia#2 0x00007fffddb99cd8 in exec_threads () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
JuliaLang/julia#3 0x00007fffddb99efc in blas_thread_server () from /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libopenblas64_.so
JuliaLang/julia#4 0x00007ffff7e08ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
JuliaLang/julia#5 0x00007ffff7e9a850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
So, it looks to me the segmentation fault is in OpenBLAS. Certainly the issue could be in what Julia passes to OpenBLAS, but this again begs for a more faithful C reproducer.
x86_64-linux-gnu
So, if I set BLAS num threads to 1, then no crash on the Hermitian 64x64.
julia> H =Hermitian(rand(64,64))
julia> BLAS.set_num_threads(4)
julia> eigen(H)
^C^C^C^C^CWARNING: Force throwing a SIGINT
^C^C^C^C^CERROR: ^C^C^C^C^C^CInterruptException:
julia> ^C
julia> BLAS.set_num_threads(1)
julia> eigen(H)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
64-element Vector{Float64}:
-4.105758427499317
-3.971213922931517
I believe a more faithful reproducer using the direct fortran API calling the workspace query, which does not crash:
➜ lib git:(vs/55471) ✗ gcc -g -o hermeig hermeig.c -L. -lopenblas64_ -I../include -Wno-implicit-function-declaration && DYLD_LIBRARY_PATH=. ./hermeig
ld: warning: reexported library with install name '@rpath/libunwind.1.dylib' found at '/Users/viral/julia/usr/lib/libunwind.1.0.dylib' couldn't be matched with any parent library and will be linked directly
First 5 Eigenvalues:
-4.396484
-4.053237
-3.783926
-3.699819
-3.197730
#include <stdio.h>
#include <stdlib.h>
int main() {
long n = 64;
double *a;
a = (double*) malloc(n*n*sizeof(double));
openblas_set_num_threads64_(4);
// Check if the memory has been successfully
// allocated by malloc or not
if (a == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
for(long row=0; row<n; row++){
for(long col=0; col<=row; col++){
a[row*n + col] = (double)rand()/(double)(RAND_MAX);
}
}
// Allocate space for eigenvalues
double *w;
w = (double*) malloc(n * sizeof(double));
if (w == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
// Leading dimension of the matrix
long lda = n;
// Allocate work
long lwork = -1;
double *work = (double*) malloc(sizeof(double));
long info = -100;
char nchar = 'N';
char uchar = 'U';
// Compute eigenvalues and eigenvectors
dsyev_64_(&nchar, &uchar, &n, a, &lda, w, work, &lwork, &info);
lwork = (long)work[0];
work = (double*)realloc(work, lwork*sizeof(double));
dsyev_64_(&nchar, &uchar, &n, a, &lda, w, work, &lwork, &info);
// Print eigenvalues
printf("First 5 Eigenvalues:\n");
for (int i = 0; i < 5; i++) {
printf("%f\n", w[i]);
}
free(a);
free(w);
return 0;
}
@jishnub can you try with https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz?
I get no crashes:
julia> using LinearAlgebra
julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037
julia> H = Hermitian(ones(1024, 1024));
julia> eigen(H);
julia>
This was compiled with GCC 12. As I mentioned in https://github.com/OpenMathLib/OpenBLAS/issues/4868#issuecomment-2286464170, the segmentation fault appears to go away depending on the version of GCC used (but I can't tell whether it's a bug in OpenBLAS which is hidden/surfaced by some compiler versions, or a genuine compiler error)
Oddly, I seem to still face the issue using this:
julia> using LinearAlgebra
julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037
julia> H = Hermitian(ones(1024, 1024));
julia> eigen(H);
julia> BLAS.get_num_threads()
1
julia> BLAS.set_num_threads(4)
julia> eigen(H);
[88907] signal 11 (1): Segmentation fault
[88907] signal 11 (1): Segmentation fault
in expression starting at REPL[7]:1
Allocations: 2069146 (Pool: 2069051;Allocations: 2069146 (Pool: 2069051; Big: 95); GC: 4
Allocations: 2069146 (Pool: 2069051; Big: 95); GC: 4
[1] 88907 segmentation fault (core dumped) julia +nightly
Aaah, right, when forwarding the blas library the number of threads is reset.
which does not crash:
I think we need two cases:
Any other combination of reproducers which doesn't involve the two above is probably not very useful to narrow down the issue, because it doesn't necessarily show much.
@jishnub can you please test again https://github.com/giordano/OpenBLAS_jll.jl/releases/download/OpenBLAS-v0.3.28%2B1/OpenBLAS.v0.3.28.x86_64-linux-gnu-libgfortran5.tar.gz (same URL as before, but it's a new build, which includes https://github.com/OpenMathLib/OpenBLAS/pull/4871)? It seems to work for me now:
julia> using LinearAlgebra
julia> BLAS.lbt_forward("./libopenblas64_.so"; clear=true)
5037
julia> BLAS.set_num_threads(4)
julia> H = Hermitian(ones(1024, 1024));
julia> eigen(H);
julia>
This works for me as well, thanks!
A more minimal Julia reproducer is
using LinearAlgebra
LAPACK.syevd!('V', 'U', ones(1024, 1024));
We're entering this function https://github.com/JuliaLang/julia/blob/e1aefebe1e3c62339be4b46043625170ec538137/stdlib/LinearAlgebra/src/lapack.jl#L5432 The lapack function being called is DSYEVD
. Can someone please confirm this is a sensible program (NOTE: I updated the code compared the a previous version of this message):
#include <stdio.h>
#include <stdlib.h>
int main() {
long n = 1024;
double *a;
a = (double*) malloc(n*n*sizeof(double));
openblas_set_num_threads64_(64);
// Check if the memory has been successfully
// allocated by malloc or not
if (a == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
for(long row=0; row<n; row++){
for(long col=0; col<=row; col++){
a[row*n + col] = 1.0;
}
}
// Allocate space for eigenvalues
double *w;
w = (double*) malloc(n * sizeof(double));
if (w == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
// Leading dimension of the matrix
long lda = n;
// Allocate work
double *work = (double*) malloc(sizeof(double));
long lwork = -1;
long *iwork = (long*) malloc(sizeof(long));
long liwork = -1;
long info = -100;
char jobz = 'V';
char uplo = 'U';
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
// Print eigenvalues
printf("First 5 Eigenvalues:\n");
for (int i = 0; i < 5; i++) {
printf("%f\n", w[i]);
}
free(a);
free(w);
return 0;
}
But this doesn't crash for me
$ gcc -o test test.c -L${HOME}/.julia/juliaup/julia-nightly/lib/julia -lopenblas64_ -Wl,-rpath,${HOME}/.julia/juliaup/julia-nightly/lib/julia
test.c: In function ‘main’:
test.c:9:5: warning: implicit declaration of function ‘openblas_set_num_threads64_’ [-Wimplicit-function-declaration]
openblas_set_num_threads64_(64);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
test.c:46:5: warning: implicit declaration of function ‘dsyevd_64_’ [-Wimplicit-function-declaration]
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
^~~~~~~~~~
$ ./test
First 5 Eigenvalues:
0.000000
0.000000
0.000000
0.000000
0.000000
You have to call dsyevd
twice. First time for the workspace query. Then take that value and allocate the work array and then actually call it again. It should segfault in the second call where the computation will actually happen.
#include <stdio.h>
#include <stdlib.h>
int main() {
long n = 1024;
double *a;
a = (double*) malloc(n*n*sizeof(double));
openblas_set_num_threads64_(64);
// Check if the memory has been successfully
// allocated by malloc or not
if (a == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
for(long row=0; row<n; row++){
for(long col=0; col<=row; col++){
a[row*n + col] = 1.0;
}
}
// Allocate space for eigenvalues
double *w;
w = (double*) malloc(n * sizeof(double));
if (w == NULL) {
printf("Memory not allocated.\n");
exit(0);
}
// Leading dimension of the matrix
long lda = n;
// workspace query
double *work = (double*) malloc(sizeof(double));
long lwork = -1;
long *iwork = (long*) malloc(sizeof(long));
long liwork = -1;
long info = -100;
char jobz = 'V';
char uplo = 'U';
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
// Workspace allocation
lwork = work[0];
work = (double *) malloc(lwork * sizeof(double));
liwork = iwork[0];
iwork = (long *) malloc(liwork * sizeof(long));
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
// Print eigenvalues
printf("First 5 Eigenvalues:\n");
for (int i = 0; i < 5; i++) {
printf("%f\n", w[i]);
}
return 0;
}
Thanks. Sadly, adding
lwork = (long)work[0];
work = (double*)realloc(work, lwork*sizeof(double));
liwork = (long)iwork[0];
iwork = (long*)realloc(iwork, liwork*sizeof(long));
dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1L, 1L);
after the first dsyevd_64_
call doesn't seem to change the outcome 😞 However it looks like https://github.com/OpenMathLib/OpenBLAS/pull/4879 works for me on the machines where I was still observing a segmentation fault after the first patch. Hopefully we'll be good after that, but I'm still puzzled by the fact we can't prepare a C reproducer. I also tried to use dlopen
/dlsym
, to even more closely match what we do in Julia, to no avail.
The bug is real, but it seems that Julia is just better at triggering it than C - whether it is our allocator or LBT or something else.
The C script does crash for me with the second call to dsyevd on mac.
The C script does crash for me with the second call to dsyevd on mac.
Ah, interesting, I can confirm it segfaults for me as well on an M1 with OpenBLAS from Julia nightly, but not with OpenBLAS from Julia v1.10.4:
% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/repo/julia/usr/lib -Wl,-rpath,${HOME}/repo/julia/usr/lib -lopenblas64_
% ./test
zsh: segmentation fault ./test
% clang -Wno-implicit-function-declaration -o test test.c -L${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -Wl,-rpath,${HOME}/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/lib/julia -lopenblas64_
% ./test
First 5 Eigenvalues:
-0.000000
-0.000000
-0.000000
-0.000000
-0.000000
However it never segfaults for me on an x86_64-linux-gnu machine where I could reproduce the segfault in Julia
Similarly,
Version info:
I suspect this is an OpenBLAS issue, as there is no error when using
MKL
.Some testing shows that this error happens from
64x64
matrices, and there is no error for smaller matrices.Probably https://github.com/JuliaLang/julia/commit/9d222b87d77d8a76f806fb296e33916dde8c9411 is what caused this, as there is no error on the commit before this.