Reduce analysis cost after detected race

jprotze commented 7 years ago

Identifying duplicates in data race reports is quite expensive regarding time. It should be possible to reduce this cost by applying only logging, but no analysis for memory accesses, that were identified as racy code.

The idea would be:

implement a variant of the Tsan-rtl entrypoints as tsan_read(), that only logs, but skips race analysis and call it something like tsan_read_log(). Important is a symetric function signature.
when a data race is detected, binary patch the originating tsan_read call to call tsan_read_log instead

This will reduce analysis cost to detect duplicates and reduce output about races between other threads.

A potential issue of this approach is, that TSan potentially will not report races that this memory access has with other code regions. That means, more iterations of fixing code and running TSan might be needed.

dongahn commented 7 years ago

Good thoughts! Yes this would be a faster alternative to the current race reduction angle. Two things concern me a bit, none of which is fundamental to the proposal but important regardless for archer at this stage.

the concept of duplicates really requires additiinal research and I suspect we need to create a user feedback loopor more use cases for this in the end. This is why I wanted to make it multilvel capapbility and this can be more easily done after seeing all races.
I think having no or very minor modifications to the Tsan runtime helps at this point to be able to support Archer on various envirronments. It seems the proposal demands a large patch to Tsan itself? unless i an mostaken.

Maybe we can learn about the well-knwon reduction criteria using the current work and then use that lessons to do high speed reduction like you proposed here (once that problem is understood)

Dong

SumedhArani commented 7 years ago

@jprotze , trying to understand your proposal here.

Is it that you're suggesting that we analyse the log rather than tsan analysing memory accesses for portions of code where it's earmarked as racy region?

I'd once worked on a project(LLVM pass) which generated a trace report of a multi threaded code. One of the issues that I faced at that time were that, because of race conditions, the trace report was distorted only to be handled afterwards. But isn't doing all this and then analysing this a significant cost?

I understand that it'll be faster but how significant will the gains be? (I'm not familiar with runtime overheads associated with tsan. Just asking for my knowledge and understanding.)

Hope I've understood the proposal correctly and I'm asking the right questions.

Thank you, Sumedh.

dongahn commented 7 years ago

@SumedhArani: I think the main distinction is on-the-fly duplicate detection vs. post mortem detection. As I mentioned above, I think this optimization can be done later once we understand what constitutes "duplicates" using your technique.

SumedhArani commented 7 years ago

Ohh! When you mentioned current work, I thought you ( @dongahn ) were referring to the work being done in the research field regarding reduction.

dongahn commented 7 years ago

No, that's your topic now :-)

simoatze commented 7 years ago

@dongahn @jprotze I think I solved the problem of overriding the weak symbol of __tsan_on_report and actually do the reduction without modifying the compiler-rt. I created a new branch called race_aggregation and I am making @SumedhArani as external collaborator so he can work directly on archer repo. I'll try to give an explanation here:

@SumedhArani created a new static library redefining __tsan_on_report but when we tried to compile a program and link it against this new library tsan kept calling the weak __tsan_on_report. So googling I found this article that says that we need to link in the following way:

-Wl,--whole-archive libarcher_static_report.a -Wl,--no-whole-archive

In a nutshell, --whole-archive forces the linker to scan through all its symbols (including those already resolved). If there is a strong symbol that was already resolved by a weak symbol (as in our case), the strong symbol will overrule the weak one.

I already modified the clang-archer wrapper to add this flag automatically.

Compiling in this way tsan calls our __tsan_on_report. To suppress original Tsan report we need to run like this:

TSAN_OPTIONS="log_path=" ./program

So, what @SumedhArani did was correct, we just needed the right flag for compiling, now that we found the solution he can actually integrate the race reduction on archer! :)

Also looking at the compiler-rt I saw that in the following module:

https://github.com/llvm-mirror/compiler-rt/blob/e257e18436359d8a0636752e25e6db0f898b468a/lib/tsan/rtl/tsan_debugging.cc

there are utility functions such as:

__tsan_get_report_data
__tsan_get_report_thread
etc.

Maybe we can use those to extract the information we need for the reduction from the pointer void *rep that we get from __tsan_on_report, and we don't need to include the header files of tsan in our project. @SumedhArani you should take a look and see if we can actually exploit those functions. There is an example in one of the tsan tests here.

@SumedhArani as I said previously I created a new branch called race_aggregation because I wanted to use the newest archer version based on the last OpenMP runtime with OMPT support. Please do the following:

Delete your ARCHER fork
Start working on https://github.com/PRUNERS/archer/tree/race_aggregation
Update your OpenMP runtime to https://github.com/OpenMPToolsInterface/LLVM-openmp - Branch: towards_tr4 (If I am not mistaken, you are currently using the branch 'align-to-tr').

Let me know if all this makes sense, we can talk on Slack if further clarification is needed. I'll make you collaborator on archer in a sec.

dongahn commented 7 years ago

@simoatze: out of curiosity, was __tsan_on_report already a weak symbol? Or does this need to bring in our own locally modified TSan runtime library?

It seems this is are various hooks into "on events"

If this is these hooks, it seems we can indeed have good mileage!

I think the reduction features can be incorporated as an Archer option.

ARCHER_OPTIONS="group_by=pc"

This can be read by Archer init function which can also set and modify relevant TSAN options such as log_path. It seems it is okay to generate both TSan report and Archer summary report together though. Maybe, we first report

==== ARCHER SUMMARY Blah

==== Original TSAN Report

In any case group_by can later be extended to cover other aggregation criteria.

simoatze commented 7 years ago

@dongahn: __tsan_on_report is already weak on tsan, but seems that weak symbols don't work in the same way with static library, that's why we need this workaround.

dongahn commented 7 years ago

Your approach seems great to me! Thank you for looking into this. I know you are super busy.

jprotze commented 7 years ago

Afaik, weak symbol just avoids that you get linker error for duplicate implementation when linking static libraries. The linker uses the first implementation that is detected. So, to overwrite the symbol you would need to link your library before the tsan library. The tsan runtime is linked with the -fsanitize=thread flag.

jprotze commented 7 years ago

But linking libarcher before tsan would break our fallback implementation of tsan annotations in libarcher.

simoatze commented 7 years ago

That's what I know too and I tried to change the order, it didn't work.

jprotze commented 7 years ago

I'm not sure, but I think, the posts of last 24h were on #11 ?

@SumedhArani My intention in this issue was to binary patch callq __tsan_read4 to "nop"s at runtime, after TSan reported a data race in this specific call to __tsan_read4.

  49ce5f:       48 89 c7                mov    %rax,%rdi
  49ce62:       e8 d9 44 fd ff          callq  471340 <__tsan_write4>
  49ce67:       48 8d 45 b0             lea    -0x50(%rbp),%rax

would become

  49ce5f:       48 89 c7                mov    %rax,%rdi
  49ce62:       00 00 00 00 00          
  49ce67:       48 8d 45 b0             lea    -0x50(%rbp),%rax

SumedhArani commented 7 years ago

I know the discussions on this thread have digressed and should have rather been on #13

Your intent for this issue was to not do logging rather than that of on the fly reduction. Right?

If I'm getting you correct, you've demonstrated above that instead of sending it to tsan reporting routine, we should rather have a "nop" at runtime and rather log it. Correct @jprotze ?

Thanks!

simoatze commented 7 years ago

@jprotze I see, how would you do that? With dyninst?

dongahn commented 7 years ago

Well... I don't understand. If there is a hook for "on report", why do we need to do a binary patch? What is the current library linking order and what order is not working and for what case again?

PRUNERS / archer

Reduce analysis cost after detected race #16