OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Strange issues importing Numpy/OpenBLAS related to ulimit #4762

Open · opened by murfalo 2 weeks ago

murfalo commented 2 weeks ago

I am attempting to run a simple Python script:

#!/usr/bin/env python3

import numpy

This fails due to:

OpenBLAS blas_thread_init: pthread_create failed for thread 21 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 22 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 23 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 24 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 25 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
...

(full log attached, Python 3.9.18, numpy 1.26.4, libopenblas 0.3.24)

Here are my initial ulimits:

-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             300000
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      8388608
-u: processes                       1029364
-n: file descriptors                16384
-l: locked-in-memory size (kbytes)  unlimited
-v: address space (kbytes)          8388608
-x: file locks                      unlimited
-i: pending signals                 1029364
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

The error appears to be due to thread allocation. However:

  1. OpenBLAS fails after allocating only 20 threads on a fresh user login (with ~8 threads running initially). This should come nowhere near 1029364 total.
  2. Setting ulimit -v unlimited, or at least around 67108684, fixes this issue (a quick way to read these limits from Python is sketched below).
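
For reference, the relevant limits can be read from Python itself via the standard resource module; a minimal, Linux-only sketch:

import resource

def show(name, rlim):
    soft, hard = resource.getrlimit(rlim)
    fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else v
    print(f"{name:13} soft={fmt(soft)} hard={fmt(hard)}")

show("RLIMIT_AS", resource.RLIMIT_AS)        # address space, ulimit -v (bytes)
show("RLIMIT_STACK", resource.RLIMIT_STACK)  # stack size, ulimit -s (bytes)
show("RLIMIT_NPROC", resource.RLIMIT_NPROC)  # processes/threads, ulimit -u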

This appears to be related to an issue reported against numpy in 2022. That issue seems to be dead, and its final comment suggested bringing the problem up here. Any ideas what might be happening?

martin-frbg commented 2 weeks ago

Yes, it looks like you are running out of address space for the memory buffer that is used to communicate partial results between threads. The RLIMIT_NPROC output was added only because that seemed to be the limit one is most likely to hit; I don't recall address space being a problem before.
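
A rough back-of-envelope check with the limits above supports this, assuming each OpenBLAS thread reserves the full ulimit -s stack (the figures are from the report; the arithmetic is only an estimate):

address_space    = 8388608 * 1024   # ulimit -v: 8 GiB, in bytes
stack_per_thread = 300000 * 1024    # ulimit -s: ~293 MiB, in bytes

# Upper bound on thread stacks that fit before RLIMIT_AS is exhausted,
# ignoring the process image, heap, and OpenBLAS's shared buffers:
print(address_space // stack_per_thread)   # -> 27

A ceiling of roughly 27 stacks, minus whatever the interpreter, numpy, and OpenBLAS's buffer already occupy, is consistent with pthread_create failing around thread 21; it also suggests why a limit near 64 GiB leaves enough headroom for all 128 threads.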

brada4 commented 2 weeks ago

A 300 MB stack is excessive...

martin-frbg commented 2 weeks ago

Unusual, but it may have been set during testing. I'm more intrigued by the low limit on address space (or virtual memory) that is causing the problem here; I would expect this to default to "unlimited" on any reasonably modern hardware.

murfalo commented 2 weeks ago

For context, this ulimit -a is from an HPC system head node. The limits were imposed by the system administrators to enforce fair usage. Perhaps unsurprisingly, OpenBLAS is not the only library or program that this ulimit -v causes to crash.

I've been working with them to find a solution (some of the head nodes have a hard limit of 8-16 GB), but in the meantime I was curious why OpenBLAS reports an issue with ulimit -u when ulimit -v seems to be the root cause. Would it be possible to modify OpenBLAS to report the correct problem and/or suggest possible solutions (e.g., reducing OPENBLAS_NUM_THREADS)? This could be helpful to any future users who run into this issue.
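
For anyone else hitting this, capping OpenBLAS's thread pool before the library loads avoids reserving the extra stacks; a minimal sketch (the thread count of 4 is illustrative):

#!/usr/bin/env python3

import os

# Must be set before numpy (and thus libopenblas) is first imported;
# fewer threads means fewer per-thread stacks charged against ulimit -v.
os.environ["OPENBLAS_NUM_THREADS"] = "4"  # illustrative value

import numpy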

Thanks for your help so far!

martin-frbg commented 2 weeks ago

This is simply because ulimit -u is the only limit the fork(2) manpage documents as raising EAGAIN, and it has been the only cause of fork-related early aborts encountered so far.
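
For completeness, the same EAGAIN from pthread_create can be reproduced without OpenBLAS by clamping the address-space limit and spawning plain Python threads; a Linux-only sketch with illustrative values:

import resource
import threading

# Clamp address space to 1 GiB; each new thread's stack reservation
# now counts against RLIMIT_AS long before RLIMIT_NPROC is reached.
resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))

done = threading.Event()
threads = []
try:
    while True:
        t = threading.Thread(target=done.wait)
        t.start()  # raises RuntimeError when pthread_create returns EAGAIN
        threads.append(t)
except (RuntimeError, MemoryError) as exc:
    print(f"thread creation failed after {len(threads)} threads: {exc}")
finally:
    done.set()
    for t in threads:
        t.join()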