OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

BLAS : Program is Terminated. Because you tried to allocate too many memory regions. #1882

Closed (yurivict closed this issue 5 years ago)

yurivict commented 5 years ago

I used OpenBLAS for the BLAS/LAPACK functions in the erkale project, and it fails. erkale's author says that OpenBLAS is broken; see https://github.com/susilehtola/erkale/issues/29#issuecomment-441006738

martin-frbg commented 5 years ago

> dgetrf, zherk and dsyrk are not guarded against early threading. Ignore me if I don't produce PRs today.

Could you please create a separate issue for that? I assume that by "early threading" you mean inefficient multithreading for tiny problem sizes (and not something leading to catastrophic failure), but I am guaranteed to lose my mind if I try to look into that today.
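A minimal sketch of what a guard against early threading can look like, taking the reading of "early threading" given above (threads spawned for tiny problem sizes). The cutoff SMALL_PROBLEM_THRESHOLD, its value, and the helper choose_threads are hypothetical illustrations and are not taken from the OpenBLAS sources.

```c
#include <assert.h>

/* Hypothetical cutoff (not an OpenBLAS value): below this many matrix
   elements, thread start-up and synchronisation overhead is assumed to
   outweigh any parallel speedup. */
#define SMALL_PROBLEM_THRESHOLD 10000L

/* Hypothetical helper: choose a thread count for an m x n operation.
   Tiny problems stay single-threaded; larger ones may use every
   available thread. */
static int choose_threads(long m, long n, int max_threads)
{
    assert(m >= 0 && n >= 0 && max_threads >= 1);
    if (m * n < SMALL_PROBLEM_THRESHOLD)
        return 1;
    return max_threads;
}
```

With these assumed values, an 8x8 problem runs on one thread, while a 1000x1000 problem may use all max_threads. A real library would tune such a cutoff per routine and per CPU rather than use one fixed constant.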

brada4 commented 5 years ago

@yurivict (not related to the current issue at all) is it possible to get something like Linux pax-utils on FreeBSD, i.e. lddtree to find two distinct OpenMP imports and symtree to quickly list the imported functions per library?

yurivict commented 5 years ago

@brada4 Is it this package: https://www.freshports.org/sysutils/pax-utils ?

yurivict commented 5 years ago

FYI, you can use the Repology website to search for packages by name across different systems: https://repology.org/

brada4 commented 5 years ago

Installed, thanks :-)

martin-frbg commented 5 years ago

Closing, as the crucial change (adding -frecursive to the gfortran options) was already released in 0.3.4.

zhilians commented 5 years ago

@brada4 Is there any 'harm' in increasing the number of memory buffers to a large number?

From memory.c:

```c
local_memory_table = (struct alloc_t **)malloc(sizeof(struct alloc_t *) * NUM_BUFFERS);
memset(local_memory_table, 0, sizeof(struct alloc_t *) * NUM_BUFFERS);
```

It seems that it's just sizeof(struct alloc_t *) * NUM_BUFFERS bytes in memory, i.e. 64 bits (8 bytes) per entry on a 64-bit system.
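The size arithmetic for the two quoted lines can be checked with a small sketch. The struct below is a hypothetical stand-in for OpenBLAS's struct alloc_t, and the NUM_BUFFERS value is assumed for illustration only; since the table stores pointers, neither choice changes the per-entry size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct alloc_t; its real fields do not
   matter here, because the table stores only pointers to it. */
struct alloc_t { void *addr; int used; };

/* Assumed value for illustration only; the real NUM_BUFFERS is
   fixed at build time. */
#define NUM_BUFFERS 4096

/* Mirror of the two quoted lines: allocate a table of NUM_BUFFERS
   pointers and zero it. Returns NULL if malloc fails. */
static struct alloc_t **make_table(void)
{
    struct alloc_t **table = malloc(sizeof(struct alloc_t *) * NUM_BUFFERS);
    if (table != NULL)
        memset(table, 0, sizeof(struct alloc_t *) * NUM_BUFFERS);
    return table;
}

/* Bytes used by the pointer table itself: one pointer per entry,
   e.g. 8 bytes (64 bits) on a typical 64-bit system. */
static size_t table_bytes(void)
{
    return sizeof(struct alloc_t *) * NUM_BUFFERS;
}
```

On a typical 64-bit system, table_bytes() gives 8 * 4096 = 32768 bytes. This counts only the pointer table; it says nothing about the memory that each entry may later point to.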

From the discussion above, I can't quite see how local_memory_table relates to fitting data into the CPU cache. Is there any harm in setting a very large NUM_THREADS at build time, say 4,096 for a 96-core CPU, and then calling OpenBLAS from 128 concurrent threads? That way, our program would not be terminated by a sudden burst of concurrent threads.
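To make the question concrete, here is a deliberately simplified model of a fixed-size buffer table: a slot count chosen at build time, with each concurrent caller claiming one slot. When more callers are active than there are slots, the table is exhausted, which is the kind of failure the issue title reports. This is an assumption for illustration, not OpenBLAS's actual memory.c logic. MODEL_NUM_BUFFERS, claim_slot, release_slot and claim_or_die are hypothetical names, and the model has no locking, which a real multithreaded allocator would need.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model, not OpenBLAS code: a fixed number of buffer
   slots chosen at build time. No locking, so this is single-threaded. */
#define MODEL_NUM_BUFFERS 4

static int slot_used[MODEL_NUM_BUFFERS];

/* Claim the first free slot and return its index, or -1 if all are taken. */
static int claim_slot(void)
{
    for (int i = 0; i < MODEL_NUM_BUFFERS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Give a slot back so that a later caller can reuse it. */
static void release_slot(int i)
{
    assert(i >= 0 && i < MODEL_NUM_BUFFERS && slot_used[i]);
    slot_used[i] = 0;
}

/* Claim a slot or terminate the program, mimicking a library that
   aborts when its fixed table is exhausted. */
static int claim_or_die(void)
{
    int i = claim_slot();
    if (i < 0) {
        fprintf(stderr, "out of buffer slots\n");
        exit(EXIT_FAILURE);
    }
    return i;
}
```

In this model, a larger table only postpones exhaustion for a given number of simultaneous callers; it does not by itself tell you what each extra slot costs in the real library.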

brada4 commented 5 years ago

It's very bad form to hijack a closed, unrelated thread... Open a new issue if you want to discuss anything that is not clear from the discussion in #1858 (which is unrelated to this one).