bsc-pm / nanox

Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, a extension to OpenMP developed at BSC.
https://pm.bsc.es/nanox
GNU Lesser General Public License v3.0
38 stars 15 forks source link

Feature Request: Add support for Valgrind #7

Closed devreal closed 4 years ago

devreal commented 4 years ago

Valgrind offers an API [1] that allows marking memory regions as stacks. This helps to reduce the number of false positives when using Valgrind in combination with OmpSs.

Such an instrumentation could be added to a debug build if valgrind has been detected during configure. The change should be minimal (3 lines in smpdd.cpp plus integration into the build system).

[1] http://valgrind.org/docs/manual/manual-core-adv.html

lucjaulmes commented 4 years ago

That would be really nice. Passing ompss programs through valgrind is always a very painful process.

vlopezh commented 4 years ago

While I wouldn't mind to add such support, the Valgrind documentation states this warning about all the three client requests related to stacks:

Warning: Unfortunately, this client request is unreliable and best avoided.

Which makes me think whether this effort is worthwhile.

Have you considered to disable user-level threads when running Valgrind with OmpSs? (export NX_ARGS+=' --disable-ut').

devreal commented 4 years ago

@vlopezh I didn't see these warnings. I have incorporated the Valgrind macros into my own UT runtime and hacked it into OmpSs (because the false positives made using Valgrind impossible).

Have you considered to disable user-level threads when running Valgrind with OmpSs? (export NX_ARGS+=' --disable-ut').

How does that change the behavior of the runtime? Does that mean task execution is serialized? I cannot find documentation for it right of the hand...

xteruel commented 4 years ago

Disabling User Threads (or ULT) does not mean to serialize task execution, but to avoid task migration among threads. It directly means that once a task is started in one thread, it will be tied to this thread and, after suspended (e.g., taskwait) it could not be resumed by another thread.

It corresponds, more or less, to the implementation of a tied task in OpenMP (which, by the way, is the default behaviour in this model).

xteruel commented 4 years ago

I have incorporated the Valgrind macros into my own UT runtime and hacked it into OmpSs (because the false positives made using Valgrind impossible).

Could you circulate your implementation to have a clearer idea of the required changes and the parameters used.

devreal commented 4 years ago

It's by far not complete (misses deregistration) but it did the trick for me:

--- ompss-19.06/nanox-0.15/src/arch/smp/smpdd.cpp   2019-06-27 19:00:34.000000000 +0200
+++ ompss-19.06-valgrind/nanox-0.15/src/arch/smp/smpdd.cpp  2020-02-03 13:57:23.675151305 +0100
@@ -25,6 +25,7 @@
 #include "taskexecutionexception.hpp"
 #include "smpdevice.hpp"
 #include "schedule.hpp"
+#include <valgrind/valgrind.h>
 #include <string>

 using namespace nanos;
@@ -88,6 +89,7 @@
    if (isUserLevelThread) {
       if (previous == NULL) {
          _stack = (void *) NEW char[_stackSize];
+         VALGRIND_STACK_REGISTER(_stack, (void*)(((uintptr_t)_stack)+_stackSize));
          verbose0("   new stack created: " << _stackSize << " bytes");
       } else {
          verbose0("   reusing stacks");

I didn't bother wrapping the valgrind calls in some nanox macros to disable them if needed. It could be sufficient to add another configure flag --enable-valgrind that needs to be explicitly added.

vlopezh commented 4 years ago

Added feature in e4b5dbe0ad5c84a1849353756115b84a1e86d50b. Thank you.