PRUNERS / archer

Archer, a data race detection tool for large OpenMP applications
https://pruners.github.io/archer
Apache License 2.0

Archer 2.0.0 unexpectedly dies with clang 6.0 #89

Closed SteffenSeckler closed 5 years ago

SteffenSeckler commented 5 years ago

Sometimes I get the following kind of error:

15:     #0 pthread_create ??:? (sph-traversals+0x45154b)

15:     #1 __kmp_create_worker /ArcherBuild/openmp/build/../runtime/src/z_Linux_util.cpp:853 (libomp.so+0xc40c9)
// some more stack trace
One of the following ignores was not ended (in order of probability)

15:   Ignore was enabled at:

It appears suddenly and is followed by nothing; only a stack trace is shown. I am running the executable with TSAN_OPTIONS=ignore_noninstrumented_modules=1, as discussed in #87, so the stack traces of the underlying problem may be suppressed by that ignore option. The error seems to stem from line 187 of https://github.com/llvm-mirror/compiler-rt/blob/release_60/lib/tsan/rtl/tsan_rtl_thread.cc

The full log can be found here: http://vmbungartz10.informatik.tu-muenchen.de/mardyn/blue/organizations/jenkins/AutoPas-Multibranch/detail/gpu-extensions/20/pipeline

jprotze commented 5 years ago

Thanks for reporting this! My rough idea of what is going on:

Can you reproduce the reported behavior with ARCHER_OPTIONS="print_ompt_counters=1"? This would allow us to confirm my assumption and isolate the bug in the runtime.

We have a better solution for handling the reduction in the archer_80_reduction branch of https://github.com/PRUNERS/openmp

This is on its way to being integrated into upstream LLVM/openmp (it should be there in LLVM/9). With this OpenMP runtime, Archer is activated automatically if the application is compiled with "-fsanitize=thread". So instead of using clang-archer++, you would compile with clang++ -fopenmp -fsanitize=thread

TSAN_OPTIONS=ignore_noninstrumented_modules=1 is still needed with that solution.

[1] https://github.com/PRUNERS/archer/blob/master/rtl/ompt-tsan.cpp#L603

SteffenSeckler commented 5 years ago

Hi, I actually haven't seen this anymore since enabling the print_ompt_counters option, so it seems to have resolved itself. I will close this for now and report back if it reappears.