Closed timjim333 closed 6 years ago
Is this what you mean? I called run quite a few times but it did not crash. Should I keep calling run? gdb_test3.txt
Can you try with spaces where you put the ampersands, to make sure the THREADS setting ends up in the same environment that sblat1 sees?
OPENBLAS_NUM_THREADS=2 gdb ./sblat1
Something is different between when the test runs normally and when it runs under GDB, but I don't understand what.
Maybe there is a memory problem somewhere in the BIGNUMA code, but at the very least I would expect it to get as far as previous runs, i.e. to the point where the code printed the "cannot open shared memory" message and should be printing a more detailed message including a reason now. So my bet is still on the OPENBLAS_NUM_THREADS setting not getting seen by the code.
That did the trick and triggered the seg fault. gdb_test4.txt
I seem to have had a successful compile - at the suggestion of the admin, I added the NO_AFFINITY flag in the make call: make PREFIX=/home/FIa/FIa164/programs/openblas/OpenBLAS-0.2.20 FC=gfortran BIGNUMA=1 NO_AFFINITY=1. The output can be seen here: make_output.txt
This is still with the replaced init.c that you sent earlier. Has this produced the expected result, and will installing this build produce a working copy of OpenBLAS?
Yes, you have a usable OpenBLAS, and you pinpointed the issue that BIGNUMA=1 with NO_AFFINITY=0 causes a failure initialising thread affinity.
Yes, it looks like this produced a usable build (though performance may be decreased by threads getting rescheduled to a different processor occasionally). I will need to look at the code path leading to the shared memory allocation again; it seems the informational message I added to the latest init.c may be wrong, and the code will just lose cpu affinity but not multithreading when the shmget fails. Unfortunately the gdb backtrace does not show the failing call; OpenBLAS would need to be built with DEBUG=1 to add the necessary symbols for the debugger. (Apologies for not mentioning this earlier.) So far all that can be learned is that it segfaults in the routine that initiates the previously failing calls to shmget/shmat - maybe now that these failures are handled, it staggers on a bit beyond them.
I see, so in the meantime I can link against this build. I'm happy to rebuild with a debug flag if that helps get to the root of the problem.
... and once the permanent code fix is in place, just replace the library file with the new one. I.e. don't close the issue; your help with big-system testing will be handy.
Right, so I will rebuild with make PREFIX=/home/FIa/FIa164/programs/openblas/OpenBLAS-0.2.20 FC=gfortran BIGNUMA=1 DEBUG=1 and attempt to run the gdb test again.
I built a debug build using the following and ran the gdb test. I've attached the results below - I hope it helps.
unset FC F90 F90FLAGS FFLAGS
make PREFIX=/home/FIa/FIa164/programs/openblas/TEST_OpenBLAS-0.2.20 FC=gfortran BIGNUMA=1 DEBUG=1
Not sure yet what to make of this - the node_info array is declared to hold MAX_NODES (128) and MAX_BITMASK_LEN members so I would not expect it to overflow in this initialization loop (unless the calculation of MAX_BITMASK_LEN went wrong much earlier - this is the number of cpus as returned by the CPU_SETSIZE macro divided by 64). Can you do
print j
print node
print MAX_BITMASK_LEN
where you currently have the t a a bt (thread apply all bt), please?
Here is the output. There was no MAX_BITMASK_LEN: debug_out2.txt
Hmm. Seems one would need to build with a higher debug level flag to get access to #defined constants. But checking on my system, CPU_SETSIZE appears to be 4096 (although the manpage claims 1024) on all Linux platforms at the moment, and 8*sizeof(unsigned long) should be either 32 or 64, so the loop should still do fine on element 59 of 64. Guess you could try replacing the single occurrence of CPU_SETSIZE near the top of driver/others/init.c by some number that is only somewhat bigger than your actual 640 cpus, say 896, just to see if this changes anything. On the other hand, I think it should be possible to trick a BIGNUMA build into running into the problematic code on my small system, so maybe I can track this down myself.
Well, replacing CPU_SETSIZE by 896 works for me here, while with CPU_SETSIZE it happens to fail at j=59 (though with a different node number) as well. My current thinking is that the struct just gets too big to fit on the stack.
Turns out there appears to be (or have been) some disagreement between glibc maintainers, in particular from SuSE, about raising the value (as set in /usr/include/bits/sched.h) for CPU_SETSIZE from 1024 to 4096. (As far as I understood the discussion threads, this was to reflect the increased capability of the Linux kernel when built with the "maximum NUMA nodes" option, but the criticism was that it broke the API.) The last discussion appears to have taken place here, which contains some hints for correct usage of sched_getaffinity() on big systems that may be relevant for OpenBLAS: https://sourceware.org/ml/libc-alpha/2016-03/msg00043.html At the very least it is conceivable that wernsaar's (experimental) code for BIGNUMA support was never tested, nor expected to work, beyond the "traditional" value of 1024. In my absolutely unscientific tests, 2048 appeared to still work (although I currently do not own a laptop capable of traversing all affected code paths :-) ), so my suggestion is to replace the current use of CPU_SETSIZE with a constant 2048 or 1024 as a quick fix.
3232 appears to be about the limit that still survives the compile tests for me, 3264 is already crashing.
@martin-frbg in init.c, can I confirm, you mean that I should try replacing the below block:
#if defined(BIGNUMA)
// max number of nodes as defined in numa.h
// max cpus as defined in sched.h
#define MAX_NODES 128
#define MAX_CPUS CPU_SETSIZE
#else
#define MAX_NODES 16
#define MAX_CPUS 256
#endif
with:
#if defined(BIGNUMA)
// max number of nodes as defined in numa.h
// max cpus as defined in sched.h
#define MAX_NODES 128
#define MAX_CPUS 1024
#else
#define MAX_NODES 16
#define MAX_CPUS 256
#endif
then make clean, recompile and run the gdb test. Is that correct?
Yes, exactly. If it works, you should already see the build pass all tests.
@martin-frbg Is there an alternative debug level I should set, or should I go with make PREFIX=/home/FIa/FIa164/programs/openblas/TEST_OpenBLAS-0.2.20 FC=gfortran BIGNUMA=1 DEBUG=1 again?
You can add NO_LAPACK=1 NO_CBLAS=1; it will not give you a complete usable library, just the bare minimum to get to the failing test fastest. Be sure to run 'make clean' between tries.
I'd just do a normal build, assuming that with such a big system build time should not be an issue.
Alright, so sticking with DEBUG=1 then.
In principle, 'ar' is invoked after BLAS, then CBLAS, then LAPACK, then LAPACKE; rebuilding the last three is not necessary for rapid repeated tests.
@timjim333 , did you get around to testing this ?
Yes, sorry for the delay. It appears to build without errors: debug_out3.txt
Great, thanks for testing. I'll prepare the corresponding PR to fix this in the develop branch later.
Hi,
I'm trying to install OpenBLAS 0.2.20 on a node in a local directory over which I have permissions (I don't have root access). I seem to be encountering an error when attempting the build process. Trying make with or without any flags results in a long string of errors that look like: make[1]: vfork: Resource temporarily unavailable
I've posted the whole output in a text file. Could anyone give any suggestions on how to troubleshoot the problem? make_error.txt
Many thanks. Tim
EDIT: In case this is useful, here is also the output of a few server parameters.
uname -or:
2.6.32.54-0.3-default GNU/Linux

lsb_release -a:
LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description: SUSE Linux Enterprise Server 11 (x86_64)
Release: 11
Codename: n/a

cat /etc/*-release:
LSB_VERSION="core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64"
SGI Accelerate 1.3, Build 705r10.sles11-1110192111
SGI Foundation Software 2.5, Build 705r10.sles11-1110192111
SGI MPI 1.3, Build 705r10.sles11-1110192111
SGI Performance Suite 1.3, Build 705r10.sles11-1110192111
SGI UPC 1.3, Build 705r10.sles11-1110192111
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1

lscpu:
Architecture: x86_64
CPU(s): 64
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 46
Stepping: 6
CPU MHz: 2266.424
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K