OpenCilk / opencilk-project

Monorepo for the OpenCilk compiler. Forked from llvm/llvm-project and based on Tapir/LLVM.
Other
93 stars 29 forks source link

Issue when using multiple sanitizers #133

Closed wheatman closed 9 months ago

wheatman commented 2 years ago

Describe the bug

I have some code which I am compiling with multiple sanitizers (cilk, undefined, and address) and if I use any one individually, or any pair of them no issues are reported, but if I use all three than races are detected. While I think it is possible that these are true races since I don't know how the sanitizers work that well, if that is the case I feel like some warning should be given that states that is a possible, or known issue.

Another thing is that I rewrote my code in what I believe to be a equivalent way, just without some encapsulation and no issues are reported with the sanitizers.

Expected behavior

Either no races, a warning when compiling with sanitizers that won't work together, or a warning in the race detection that points me to the cause of the issue being inside of another sanitizer.

OpenCilk version

clang version 14.0.6 (https://github.com/OpenCilk/opencilk-project fc90ded2b090672f84c58d12d8d85cd999eb6c1a)

Steps to reproduce (include relevant output)

The following code causes the issue when compiled with clang++ -Og -fopencilk -fsanitize=cilk,address,undefined -o basic test.cpp


#include <cilk/cilk.h>
#include <cilk/cilk_api.h>
#include <cilk/cilksan.h>
#include <vector>

class Reducer_sum {
  std::vector<int> data;
  Cilksan_fake_mutex fake_lock;

public:
  Reducer_sum() { data.resize(__cilkrts_get_nworkers()); }

  Reducer_sum &operator+=(int new_value) {
    Cilksan_fake_lock_guard guad(&fake_lock);
    data[__cilkrts_get_worker_number()] += new_value;
    return *this;
  }

  operator int() const {
    int output = 0;
    for (const auto &d : data) {
      output += d;
    }
    return output;
  }
};

int f(int num) {
  Reducer_sum red;
  cilk_for(int i = 0; i < num; i++) { red += i; }
  return red;
}

int main() { return f(10); }

The following code does basically the same thing, but does not cause the issue

int g(int num) {
  std::vector<int> red;
  red.resize(__cilkrts_get_nworkers());
  Cilksan_fake_mutex fake_lock;
  cilk_for(int i = 0; i < num; i++) {
    Cilksan_fake_lock_guard guad(&fake_lock);
    red[__cilkrts_get_worker_number()] += i;
  }
  int total = 0;
  for (auto e : red) {
    total += e;
  }
  return total;
}

The output is

Running Cilksan race detector.
Race detected on location fe1aaa17d74
*    Write 56298b _ZN11Reducer_sumpLEi test.cpp
|        `-to variable <no variable name> (declared at <no filename>)
+     Call 561899 _Z1fi test.cpp
|*   Write 56298b _ZN11Reducer_sumpLEi test.cpp
||       `-to variable <no variable name> (declared at <no filename>)
|+    Call 561899 _Z1fi test.cpp
\| Common calling context
 +  Parfor 56084d _Z1fi test.cpp
 +    Call 5614d2 main test.cpp

Race detected on location fe1aaa17d78
*     Read 562b7e _ZN11Reducer_sumpLEi test.cpp
|        `-to variable <no variable name> (declared at <no filename>)
+     Call 561899 _Z1fi test.cpp
|*   Write 56298b _ZN11Reducer_sumpLEi test.cpp
||       `-to variable <no variable name> (declared at <no filename>)
|+    Call 561899 _Z1fi test.cpp
\| Common calling context
 +  Parfor 56084d _Z1fi test.cpp
 +    Call 5614d2 main test.cpp

Race detected on location fe1aaa17d78
*    Write 562bc6 _ZN11Reducer_sumpLEi test.cpp
|        `-to variable <no variable name> (declared at <no filename>)
+     Call 561899 _Z1fi test.cpp
|*   Write 5629bc _ZN11Reducer_sumpLEi test.cpp
||       `-to variable <no variable name> (declared at <no filename>)
|+    Call 561899 _Z1fi test.cpp
\| Common calling context
 +  Parfor 56084d _Z1fi test.cpp
 +    Call 5614d2 main test.cpp

Race detected on location fe1aaa17d78
*     Read 562b7e _ZN11Reducer_sumpLEi test.cpp
|        `-to variable <no variable name> (declared at <no filename>)
+     Call 561899 _Z1fi test.cpp
|*   Write 5629bc _ZN11Reducer_sumpLEi test.cpp
||       `-to variable <no variable name> (declared at <no filename>)
|+    Call 561899 _Z1fi test.cpp
\| Common calling context
 +  Parfor 56084d _Z1fi test.cpp
 +    Call 5614d2 main test.cpp

Race detected on location fe1aaa17d78
*    Write 562bc6 _ZN11Reducer_sumpLEi test.cpp
|        `-to variable <no variable name> (declared at <no filename>)
+     Call 561899 _Z1fi test.cpp
|*    Read 5629f1 _ZN11Reducer_sumpLEi test.cpp
||       `-to variable <no variable name> (declared at <no filename>)
|+    Call 561899 _Z1fi test.cpp
\| Common calling context
 +  Parfor 56084d _Z1fi test.cpp
 +    Call 5614d2 main test.cpp

Race detected on location fe1aaa17d78
*    Write 562bc6 _ZN11Reducer_sumpLEi test.cpp
|        `-to variable <no variable name> (declared at <no filename>)
+     Call 561899 _Z1fi test.cpp
|*   Write 562bc6 _ZN11Reducer_sumpLEi test.cpp
||       `-to variable <no variable name> (declared at <no filename>)
|+    Call 561899 _Z1fi test.cpp
\| Common calling context
 +  Parfor 56084d _Z1fi test.cpp
 +    Call 5614d2 main test.cpp

Race detected on location fe1aaa17d78
*     Read 562b7e _ZN11Reducer_sumpLEi test.cpp
|        `-to variable <no variable name> (declared at <no filename>)
+     Call 561899 _Z1fi test.cpp
|*   Write 562bc6 _ZN11Reducer_sumpLEi test.cpp
||       `-to variable <no variable name> (declared at <no filename>)
|+    Call 561899 _Z1fi test.cpp
\| Common calling context
 +  Parfor 56084d _Z1fi test.cpp
 +    Call 5614d2 main test.cpp

Cilksan detected 7 distinct races.
Cilksan suppressed 35 duplicate race reports.
neboat commented 2 years ago

I looked into this issue. Long story short, the reported races are not on memory accesses in the program itself, but on memory accesses from ASan's instrumentation. In this sense, the sanitizers are working, but Cilksan ends up checking ASan (to some extent) in addition to the program itself.

Here are some more details of what I've determined so far.

I'll need to think more about how to get Cilksan to clearly attribute these races to ASan's instrumentation, rather than the user program. Because the relevant memory accesses come from compiler-inserted instrumentation, there are no debug symbols for those memory accesses, nor is there any other attribution information attached to those accesses.

wheatman commented 1 year ago

Was this fixed in the most recent version, it seems to to me. If so this issue can be closed. I'm not doing it myself in case the larger issue you see wasn't fixed and its just a change in optimizations and layout which no longer make the errors show up.

neboat commented 9 months ago

I agree, this issue seems to be fixed in the latest release.