IntelligentSoftwareSystems / Galois

Galois: C++ library for multi-core and multi-node parallelization
http://iss.ices.utexas.edu/?p=projects/galois

Bind to Specific Node Memory #362

Open JulianToya opened 4 years ago

JulianToya commented 4 years ago

Hey there,

I was wondering if there is a straightforward way to limit the lonestar benchmarks to allocating memory only on specific NUMA nodes on my system. Attempts with numactl --membind=0 ... cause a bus error. Any and all help is much appreciated!
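One quick sanity check before binding is whether node 0 actually has enough free memory for the working set; numactl --membind is strict, so an allocation that cannot be satisfied on the bound node can fail at first touch. A minimal sketch using libnuma (not Galois code; compile with -lnuma; node 0 is just an example):

#include <numa.h>
#include <cstdio>

int main() {
  if (numa_available() < 0) { // libnuma unusable on this system
    std::fprintf(stderr, "NUMA not available\n");
    return 1;
  }
  long long freeBytes  = 0;
  long long totalBytes = numa_node_size64(0, &freeBytes); // query node 0
  std::printf("node 0: total %lld MB, free %lld MB\n",
              totalBytes >> 20, freeBytes >> 20);
  return 0;
}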

l-hoang commented 4 years ago

Hello.

To my knowledge, there is no straightforward way to do this: one would have to go into the memory allocation part of the code and manually change the NUMA allocation calls.
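For reference, a hypothetical sketch of what such a manual change could look like, using libnuma's numa_alloc_onnode to ask the kernel to place a region's pages on one chosen node (illustrative only; allocOnNode is not a Galois function; link with -lnuma):

#include <numa.h>
#include <cstddef>

// Hypothetical helper: allocate `bytes` with its pages placed on `node`
// instead of the usual interleaved/local NUMA policy.
void* allocOnNode(std::size_t bytes, int node) {
  void* p = numa_alloc_onnode(bytes, node);
  return p; // release later with numa_free(p, bytes)
}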

@amberhassaan @ddn0 @insertinterestingnamehere Is there some feature I may not be aware of?

ddn0 commented 4 years ago

I briefly checked to see if I could reproduce this issue on my machine, but I couldn't. If you can send a stack trace or even a command line, that would allow for a more specific diagnosis.

None of the NUMA allocations are explicit, so if you disable thread binding (GALOIS_DO_NOT_BIND_THREADS=1), that should allow numactl to dictate where memory is allocated, AFAIK.
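For example (an illustrative invocation; the flags mirror the ones used elsewhere in this thread):

GALOIS_DO_NOT_BIND_THREADS=1 numactl --physcpubind=0-23 --membind=0 ./bfs -t=24 <input.gr>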

JulianToya commented 4 years ago

I realized that I'm running version 5. With thread binding turned off, this is not an issue for the smaller provided input graphs, only for my large kron graph (73GB). Without any explicit memory node assignment, the lonestar benchmarks run perfectly fine with my kron graph.

This is what I'm running: sudo numactl --physcpubind=0-23 --membind=0 ./bfs -t=24 /data/graphs/wdc12/kron.gr

This is the trace:

Thread 9 "bfs" received signal SIGBUS, Bus error.
[Switching to Thread 0x7fffd3fff700 (LWP 6988)]
0x00005555555d79d3 in <lambda()>::operator() (__closure=0x7fffffffdeb0) at /home/jtoya/projects/Galois/libgalois/src/NumaMem.cpp:48
48            ptr[x] = 0;
(gdb) bt
#0  0x00005555555d79d3 in <lambda()>::operator() (__closure=0x7fffffffdeb0) at /home/jtoya/projects/Galois/libgalois/src/NumaMem.cpp:48
#1  galois::substrate::internal::ExecuteTupleImpl<std::tuple<pageIn(void*, size_t, size_t, unsigned int, bool)::<lambda()> >, 0, 1>::execute (cmds=std::tuple containing = {...})
    at /home/jtoya/projects/Galois/libgalois/include/galois/substrate/ThreadPool.h:42
#2  galois::substrate::ThreadPool::ExecuteTuple::operator() (this=0x7fffffffdeb0) at /home/jtoya/projects/Galois/libgalois/include/galois/substrate/ThreadPool.h:145
#3  std::__invoke_impl<void, galois::substrate::ThreadPool::run(unsigned int, Args&& ...) [with Args = {pageIn(void*, size_t, size_t, unsigned int, bool)::<lambda()>}]::ExecuteTuple&> (__f=...)
    at /usr/include/c++/8/bits/invoke.h:60
#4  std::__invoke<galois::substrate::ThreadPool::run(unsigned int, Args&& ...) [with Args = {pageIn(void*, size_t, size_t, unsigned int, bool)::<lambda()>}]::ExecuteTuple&> (__fn=...)
    at /usr/include/c++/8/bits/invoke.h:95
#5  std::reference_wrapper<galois::substrate::ThreadPool::run(unsigned int, Args&& ...) [with Args = {pageIn(void*, size_t, size_t, unsigned int, bool)::<lambda()>}]::ExecuteTuple>::operator()<> (
    this=<optimized out>) at /usr/include/c++/8/bits/refwrap.h:319
#6  std::_Function_handler<void(), std::reference_wrapper<galois::substrate::ThreadPool::run(unsigned int, Args&& ...) [with Args = {pageIn(void*, size_t, size_t, unsigned int, bool)::<lambda()>}]::ExecuteTuple> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#7  0x00005555555d1cc2 in std::function<void ()>::operator()() const (this=0x7fffffffe2c0) at /usr/include/c++/8/bits/std_function.h:682
#8  galois::substrate::ThreadPool::threadLoop (this=0x7fffffffe270, tid=<optimized out>) at /home/jtoya/projects/Galois/libgalois/src/ThreadPool.cpp:135
#9  0x00007ffff7e80d84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7fa1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007ffff7b70293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Appreciate the help!
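For what it's worth, a SIGBUS on the first write to a freshly mapped page (ptr[x] = 0 in pageIn) typically means the kernel could not back the page at fault time. Under a strict --membind=0 this can happen when node 0 runs out of pages, and it is the classic symptom for huge-page mappings in particular, which would fit a 73GB graph confined to one node. A standalone sketch of that failure mode (assuming a hugetlb mapping; whether this pageIn path actually uses huge pages is an assumption, and the sizes are placeholders):

#include <numaif.h>   // mbind, MPOL_BIND
#include <sys/mman.h> // mmap, MAP_HUGETLB
#include <cstring>    // memset
#include <cstddef>

int main() {
  const std::size_t len = 1UL << 30; // 1 GiB; scale past node 0's free huge pages
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  if (p == MAP_FAILED)
    return 1; // no huge pages configured at all
  unsigned long nodemask = 1UL; // bit 0 set: node 0 only
  mbind(p, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0);
  std::memset(p, 0, len); // first touch: SIGBUS here if node 0 cannot
                          // supply the huge pages
  munmap(p, len);
  return 0;
}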

insertinterestingnamehere commented 4 years ago

I agree that this ought to work. Unfortunately I don't have a lot to add beyond that. At least in theory we ought to respect the affinities that are set before our runtime is initialized, but I doubt we actually handle that case correctly in practice.

ddn0 commented 4 years ago

Could you send the output of numactl --hardware and cat /proc/meminfo (or similar)?