OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Segfault with large NUM_THREADS #2839

Enchufa2 closed 4 years ago

Enchufa2 commented 4 years ago

In Fedora, we set NUM_THREADS=128 for the openmp and threaded versions (see spec file for reference; cc @susilehtola). Recently, we switched to openblas-openmp as the system-wide default BLAS/LAPACK implementation. Then, we found out that a test in the octave-statistics package (canoncorr.m) is segfaulting (octave was previously using openblas-serial), and we have managed to narrow down the issue to this point so far. Here's a reproducible example with the current master branch:

$ docker run --rm -it fedora:rawhide
$ dnf install -y octave-statistics make git perl-devel
$ CMD='octave -H -q --no-window-system --no-site-file --eval pkg("load","statistics");test("/usr/share/octave/packages/statistics-1.4.1/canoncorr.m");'
$ git clone https://github.com/xianyi/OpenBLAS && cd OpenBLAS
$ make USE_THREAD=1 USE_OPENMP=1 NUM_THREADS=128
$ LD_PRELOAD=$PWD/libopenblas.so.0 $CMD
Segmentation fault (core dumped)

but

$ make clean
$ make USE_THREAD=1 USE_OPENMP=1 NUM_THREADS=64
$ LD_PRELOAD=$PWD/libopenblas.so.0 $CMD
PASSES 7 out of 7 tests

Any idea what could be happening here?

Diazonium commented 4 years ago

Slightly off topic, but why does the Java VM/RE feel the need to mess with the stack size to begin with? Also, if Java can barge in and modify the stack size without any regard for other libraries, maybe OpenBLAS could do the same and increase the stack size to its liking instead of conforming to the unreasonably small stack limit set by Java? This is somewhat hostile towards Java, though, and I have no idea whether it would cause mayhem there.

Enchufa2 commented 4 years ago

That is a good point. Increasing the stack to match the Linux default would be the easiest way to avoid performance degradation under certain configurations. In the mid-term, though, I think the best approach would be to move towards a heap-based memory pool: the best of both worlds. AFAIK, both BLIS and MKL use memory pools.
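To make the memory-pool idea concrete, here is a minimal sketch in C. It assumes a fixed number of equally sized buffers that are allocated on the heap once and then reused, so no large scratch area ever lands on a caller's stack. All names (`demo_pool`, `DEMO_POOL_SLOTS`, and so on) are hypothetical and are not taken from OpenBLAS, BLIS, or MKL. A real pool would also need locking, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_POOL_SLOTS 4   /* hypothetical: number of reusable buffers */

/* Illustrative pool: every slot is a heap block of slot_bytes bytes. */
typedef struct {
    void  *slot[DEMO_POOL_SLOTS];
    int    in_use[DEMO_POOL_SLOTS];
    size_t slot_bytes;
} demo_pool;

/* Frees all slots; safe to call on a partially initialised pool. */
void demo_pool_destroy(demo_pool *p)
{
    for (int i = 0; i < DEMO_POOL_SLOTS; i++) {
        free(p->slot[i]);          /* free(NULL) is a no-op */
        p->slot[i] = NULL;
        p->in_use[i] = 0;
    }
}

/* Allocates every slot up front. Returns 0 on success, -1 on failure. */
int demo_pool_init(demo_pool *p, size_t slot_bytes)
{
    assert(p != NULL && slot_bytes > 0);
    p->slot_bytes = slot_bytes;
    for (int i = 0; i < DEMO_POOL_SLOTS; i++) {
        p->slot[i] = NULL;
        p->in_use[i] = 0;
    }
    for (int i = 0; i < DEMO_POOL_SLOTS; i++) {
        p->slot[i] = malloc(slot_bytes);
        if (p->slot[i] == NULL) {
            demo_pool_destroy(p);
            return -1;
        }
    }
    return 0;
}

/* Hands out the first free slot, or NULL if the request is too large
   or every slot is busy. NOT thread-safe: a real pool needs a lock. */
void *demo_pool_acquire(demo_pool *p, size_t bytes)
{
    if (bytes > p->slot_bytes)
        return NULL;
    for (int i = 0; i < DEMO_POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->slot[i];
        }
    }
    return NULL;
}

/* Marks the slot that owns buf as free again; the memory is kept. */
void demo_pool_release(demo_pool *p, void *buf)
{
    for (int i = 0; i < DEMO_POOL_SLOTS; i++) {
        if (p->slot[i] == buf) {
            p->in_use[i] = 0;
            return;
        }
    }
}
```

The trade-off in this sketch is that acquiring a buffer costs a short scan rather than a `malloc` call per operation, while stack usage stays independent of the buffer size.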

brada4 commented 4 years ago

Back in the beginning, Java was one of the early mass-market frameworks with zillions of threads. How do you justify "increasing the stack" for the 200-something threads Java creates in embedded or realtime contexts, where it is quite popular too? The stack is not primary storage; it is an abstraction of the call chain inside primary storage, and Java's default of 32k ... 512k permits quite a bit of recursion.

martin-frbg commented 4 years ago

Hmm. The only clean solution seems to be to reduce the default threshold to match the Java stack size (= failsafe for distribution packagers) and make it easily configurable through a build parameter, so that anybody certain never to call it from Java can restore the old, slightly more efficient behaviour.
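A rough sketch of what such a configurable threshold could look like, assuming the threshold is the size below which a scratch buffer is placed on the stack rather than the heap. The macro name `DEMO_MAX_STACK_BYTES` and its default of 2048 are made up for illustration and are not OpenBLAS's actual build parameter or value. A packager would keep the conservative default, while someone who never calls the library from Java could rebuild with something like `-DDEMO_MAX_STACK_BYTES=65536`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical build-time knob; override with -DDEMO_MAX_STACK_BYTES=... */
#ifndef DEMO_MAX_STACK_BYTES
#define DEMO_MAX_STACK_BYTES 2048
#endif

/* +1 keeps the array size valid even if the knob is set to 0. */
#define DEMO_STACK_ELEMS (DEMO_MAX_STACK_BYTES / sizeof(double) + 1)

/* Returns 1 if a scratch buffer of n doubles stays under the threshold. */
int demo_fits_on_stack(size_t n)
{
    return n <= DEMO_MAX_STACK_BYTES / sizeof(double);
}

/* Sums x[0..n) via a scratch copy, standing in for a kernel that needs
   temporary workspace. Small requests use the stack buffer; larger
   ones fall back to the heap. */
double demo_sum_with_scratch(const double *x, size_t n)
{
    double stack_buf[DEMO_STACK_ELEMS];
    double *buf = stack_buf;
    int on_heap = !demo_fits_on_stack(n);
    double s = 0.0;

    assert(x != NULL || n == 0);

    if (on_heap) {
        buf = malloc(n * sizeof(double));
        if (buf == NULL)
            abort();               /* sketch only: no graceful recovery */
    }
    if (n > 0)
        memcpy(buf, x, n * sizeof(double));
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    if (on_heap)
        free(buf);
    return s;
}
```

With the default used here, buffers of up to 256 doubles stay on the stack and anything larger goes through `malloc`, so the worst-case stack use per call is bounded by the build-time value rather than by the problem size.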