PRUNERS / archer

Archer, a data race detection tool for large OpenMP applications
https://pruners.github.io/archer
Apache License 2.0

Reduction Test Case #32

Closed simoatze closed 7 years ago

simoatze commented 7 years ago

@jprotze: I was taking a look at this reduction test case:

https://github.com/PRUNERS/archer/blob/master/test/reduction/parallel-reduction.c#L13

Line 13 contains only an assignment and does not perform any reduction itself. Shouldn't that be a race?

Thanks!

jprotze commented 7 years ago

I'm not sure what you mean by "not doing any reduction". The reduction is achieved by the reduction(+: var) clause.

The code spawns a team of 5 threads. We should probably add schedule(static,1) to explicitly request the behavior that the LLVM runtime shows by default: each thread gets one of the 5 iterations of the for-loop. Each thread therefore sets its private copy of var to 1. The reduction happens at the end of the for-loop, where all threads add their private value of var to the var of the enclosing block. So at the end, var in main should be 5.
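The behavior described above can be sketched as follows. This is a minimal illustration, not the file from the repository: the function name reduce_ones and the team_size out-parameter are hypothetical, and the code assumes that every thread in the team receives at least one iteration. The reduced value then equals the number of threads in the team, which is 5 only if the runtime actually grants all 5 requested threads.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper, not taken from the Archer test suite.
 * Returns the reduced value and stores the actual team size in *team_size. */
int reduce_ones(int *team_size)
{
  int var = 0, team = 1, i;
  assert(team_size != NULL);

  #pragma omp parallel num_threads(5)
  {
#ifdef _OPENMP
    /* One thread records how many threads the runtime really provided. */
    #pragma omp single
    team = omp_get_num_threads();
#endif
    /* Inside the loop, var names a private copy, initialized to 0 for "+". */
    #pragma omp for schedule(static,1) reduction(+: var)
    for (i = 0; i < 5; i++)
      var = 1;  /* plain assignment to the private copy, so no race */
    /* At the end of the loop the private copies are added to the shared var. */
  }

  *team_size = team;
  return var;
}
```

Calling reduce_ones(&team) from main and comparing the result with team checks the reduction without hard-coding the value 5. Compiled without OpenMP support, the pragmas are ignored and both values are 1.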

simoatze commented 7 years ago

@jprotze Never mind, for some reason I was thinking that the reduction operator must appear in the expression where the reduction variable is used, as in "var += 1".

Anyway, since I am here: I am trying to handle reductions in sword. I keep getting false positives because I have no way to identify when the reduction is performed. I tried using a flag that is set to true when the barrier begins, so that I can ignore writes until the barrier ends and the flag is set back to false, but it does not work. Do you have any suggestions? Thanks!

jprotze commented 7 years ago

Looking at the example again, I just realized that there is not even a loop. But basically the description above still applies.

Do you already handle atomic operations? I think most reductions are implemented either with atomics in the parallel region or in a barrier. You should also consider code like:

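int main(int argc, char* argv[])
{
  int foo = 0, bar = 0, i;
  #pragma omp parallel num_threads(5)
  {
    // No barrier after this loop: the reduction into foo may complete later.
    #pragma omp for schedule(static,1) reduction(+: foo) nowait
    for(i=0; i<5; i++)
      foo = 1;
    // Same for bar: both reductions may still be pending after this loop.
    #pragma omp for schedule(static,1) reduction(+: bar) nowait
    for(i=0; i<5; i++)
      bar = 1;
  } // implicit barrier: both reductions are complete after this point
  return foo + bar != 10; // exit status 0 if each reduction yields 5
}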

With for nowait, the reduction can be performed anywhere between the end of the loop and the next barrier, which here is the closing barrier of the parallel region.

simoatze commented 7 years ago

I tried generating the LLVM IR for the first test, and even if there is nowait it calls __kmpc_reduce_nowait. I can't see any way to know when the reduction is happening; AFAIK OMPT does not give any information about reductions. Even if it uses atomics, I am not instrumenting the runtime, so I can't see them.

jprotze commented 7 years ago

If the reduction happens in runtime code, you are fine: in that case you don't see the memory accesses at all. It might be the case that the runtime calls reduction code in the application, like it does for the parallel region.

The reduction might be implemented by atomic instructions without any call to the runtime. In that case, TSan would identify the atomic instructions and log them with the atomic flag. That's why I asked whether you evaluate the atomic bit in the log.
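As an illustration of the atomic variant, here is a hand-written reduction in which each thread accumulates into a private partial result and then combines it into the shared variable with a single atomic update. This is only a sketch of the pattern, not what any particular compiler or runtime emits, and the function name sum_with_atomic is hypothetical.

```c
#include <assert.h>

/* Hypothetical illustration: sum 1..n with a hand-written reduction. */
long sum_with_atomic(int n)
{
  long total = 0;
  assert(n >= 0);

  #pragma omp parallel
  {
    long local = 0;  /* per-thread partial result, never shared */
    int i;

    /* nowait: threads proceed to the combine step without a barrier. */
    #pragma omp for nowait
    for (i = 1; i <= n; i++)
      local += i;

    /* Combine step: the only access to the shared total inside the region.
     * A race detector has to honor the atomic flag on this access,
     * otherwise it would report a write-write race on total. */
    #pragma omp atomic
    total += local;
  }

  return total;
}
```

For example, sum_with_atomic(100) returns 5050 regardless of the team size. A tool that drops the atomic bit from its log would flag the update of total, even though the program is race-free.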

If you look into the implementation of __kmpc_reduce_nowait, there are various branches of implementation.

For OMPT, we discussed the need for an event signalling a reduction. But up to now, we haven't found a good, portable way to deal with reductions. The problem is finding the right semantics for the event, because there are so many possible implementations. I had a discussion with Alex about this at one of the last OpenMP F4F meetings. So I would be glad if we could come up with a good proposal.

simoatze commented 7 years ago

Looking at the LLVM IR, the reduction is performed by the function ".omp.reduction.reduction_func", which gets instrumented but does not contain any atomics; its accesses are normal reads and writes. Even when I ignore that function during instrumentation, I still get races. I am also taking care of atomics, but there is only one, and it is never executed.
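To make the shape of such a combiner concrete, here is a rough C equivalent for a single int reduced with "+". It is a sketch under assumptions: the name reduction_func_sketch is hypothetical, and the two-argument layout (each argument an array of pointers to the reduction variables) reflects my understanding of Clang's generated code rather than a documented interface.

```c
#include <assert.h>

/* Hypothetical stand-in for the compiler-generated combiner.
 * Assumption: lhs and rhs each point to an array of pointers,
 * one entry per reduction variable (here a single int). */
void reduction_func_sketch(void *lhs, void *rhs)
{
  void **lhs_list = (void **)lhs;
  void **rhs_list = (void **)rhs;
  int *dst = (int *)lhs_list[0];
  int *src = (int *)rhs_list[0];
  assert(dst != 0 && src != 0);

  /* Two plain loads and one plain store, with no atomics involved:
   * an instrumented build logs these as ordinary accesses. */
  *dst = *dst + *src;
}
```

With int a = 3, b = 4 and void *l[1] = { &a }, *r[1] = { &b }, the call reduction_func_sketch(l, r) leaves a == 7 and b unchanged. The accesses are race-free only because the runtime orders the calls, and that ordering is invisible to a tool that does not instrument the runtime.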

Let me know if you have time this week or next. I don't know the details of all the reduction methods, but maybe we can set up a conference call to discuss this.

simoatze commented 7 years ago

We'll wait for new OMPT events that signal reductions; for now I need to find a workaround.