Ramki-Ravindran / thread-sanitizer

Automatically exported from code.google.com/p/thread-sanitizer
0 stars 0 forks source link

Bogus data race when compiling with -g #69

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm puzzled by getting a data race for the following trivial code:

$ cat x.cpp
#include <thread>

void my_thread() {}

int main()
{
    std::thread my_t(my_thread);
    my_t.join();
}

I can compile and run as follows. As expected, there is no problem:

$ /usr/bin/clang++ -std=c++11 -fsanitize=thread x.cpp
$ ./a.out
$ 

Now I compile again, this time adding -g, and I get a race (see below). I'm 
seeing this with:

$ clang --version
Ubuntu clang version 3.5-1ubuntu1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix

That's with Ubuntu utopic:
$ uname -a
Linux djembefola 3.16.0-5-generic #10-Ubuntu SMP Mon Jul 21 16:16:35 UTC 2014 
x86_64 x86_64 x86_64 GNU/Linux

Any suggestions? Is it possible that I'm getting this because of a messed-up 
compiler installation? (I also have gcc 4.8 and 4.9 on the machine.)

Here is what I get after compiling with -g:

$ /usr/bin/clang++ -g -std=c++11 -fsanitize=thread x.cpp
$ ./a.out
==================
WARNING: ThreadSanitizer: data race (pid=24666)
  Write of size 8 at 0x7d0c0000efd8 by thread T1:
    #0 operator delete(void*) <null>:0 (a.out+0x000000049fbb)
    #1 deallocate /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/ext/new_allocator.h:110 (a.out+0x0000000a742f)
    #2 deallocate /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/alloc_traits.h:383 (a.out+0x0000000a7300)
    #3 _M_destroy /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:535 (a.out+0x0000000a7de2)
    #4 std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) <null>:0 (libstdc++.so.6+0x0000000bae30)

  Previous atomic write of size 4 at 0x7d0c0000efd8 by main thread:
    #0 __tsan_atomic32_fetch_add <null>:0 (a.out+0x00000008b786)
    #1 __exchange_and_add /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/ext/atomicity.h:49 (a.out+0x0000000a5645)
    #2 __exchange_and_add_dispatch /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/ext/atomicity.h:82 (a.out+0x0000000a550c)
    #3 _M_release /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:146 (a.out+0x0000000a5e11)
    #4 ~__shared_count /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:666 (a.out+0x0000000a5db0)
    #5 ~__shared_ptr /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:914 (a.out+0x0000000a5f29)
    #6 ~shared_ptr /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr.h:93 (a.out+0x0000000a5c83)
    #7 thread<void (&)()> /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/thread:135 (a.out+0x0000000a58e3)
    #8 main /tmp/x.cpp:7 (a.out+0x0000000a5404)

  Location is heap block of size 48 at 0x7d0c0000efd0 allocated by main thread:
    #0 operator new(unsigned long) <null>:0 (a.out+0x000000049a4d)
    #1 allocate /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/ext/new_allocator.h:104 (a.out+0x0000000a9267)
    #2 allocate /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/alloc_traits.h:357 (a.out+0x0000000a710a)
    #3 __shared_count<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> > /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:616 (a.out+0x0000000a6ba1)
    #4 __shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> > /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr_base.h:1091 (a.out+0x0000000a6a16)
    #5 shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> > /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr.h:317 (a.out+0x0000000a6928)
    #6 allocate_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> > /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr.h:587 (a.out+0x0000000a6727)
    #7 make_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::_Bind_simple<void (*())()> > /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/shared_ptr.h:603 (a.out+0x0000000a65cc)
    #8 _M_make_routine<std::_Bind_simple<void (*())()> > /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/thread:193 (a.out+0x0000000a5abb)
    #9 thread<void (&)()> /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/thread:135 (a.out+0x0000000a58ab)
    #10 main /tmp/x.cpp:7 (a.out+0x0000000a5404)

  Thread T1 (tid=24668, running) created by main thread at:
    #0 pthread_create <null>:0 (a.out+0x00000004cf01)
    #1 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) <null>:0 (libstdc++.so.6+0x0000000baef8)
    #2 main /tmp/x.cpp:7 (a.out+0x0000000a5404)

SUMMARY: ThreadSanitizer: data race ??:0 operator delete(void*)
==================
ThreadSanitizer: reported 1 warnings

Original issue reported on code.google.com by michij.h...@gmail.com on 28 Jul 2014 at 9:54

GoogleCodeExporter commented 9 years ago
I afraid you need to rebuild the libstdc++ (or libc++) with tsan. 
https://code.google.com/p/thread-sanitizer/wiki/CppManual#FAQ

Original comment by konstant...@gmail.com on 28 Jul 2014 at 9:56

GoogleCodeExporter commented 9 years ago
Thanks for the very fast reply!

OK, I have to admit that I've never built my own std library. Do you know of 
any instructions/how-to I could turn to? Also, do you have any tips for how to 
integrate this into my build env? I'll have to keep the tsan-instrumented 
library aside somewhere and, when building for tsan, adjust the linker flags 
accordingly. Any best-practice for that?

Original comment by michij.h...@gmail.com on 28 Jul 2014 at 9:32

GoogleCodeExporter commented 9 years ago
+knowledgable people for building of libc++
we already do this in chromium, right?

Original comment by dvyu...@google.com on 29 Jul 2014 at 5:16

GoogleCodeExporter commented 9 years ago
Building libc++ is pretty straightforward. You'll have to check out both libc++ 
and libc++abi. Something like this should work:

clang++ -gline-tables-only -O2 -fPIC -std=c++11 -fstrict-aliasing -nostdinc++ 
-nodefaultlibs -lc -lgcc_s -lpthread -lrt -Ilibc++/trunk/include 
-Ilibc++abi/trunk/include -shared -fsanitize=thread -o libc++.so 
libc++/trunk/src/*.cpp libc++abi/trunk/src/*.cpp

This builds the libc++ DSO. To build your code:

clang++ -Ilibc++/trunk/include -Ilibc++abi/trunk/include -nostdinc++ 
-stdlib=libc++ -std=c++11 -fsanitize=thread x.cpp

You may need to provide the path to your libc++ DSO (since it's not installed 
in the system), either through RPATH or LD_LIBRARY_PATH.

Original comment by earth...@google.com on 30 Jul 2014 at 2:21

GoogleCodeExporter commented 9 years ago
Thanks heaps for that info! I got the instrumented version of libc++ built and, 
once I link against that, I no longer get the noise from thread sanitizer for 
my little test program, which is awesome!

But now I'm running into another issue, trying to use the instrumented library 
with my real project. I'm building a library, plus a bunch of executables. 
Everything compiles fine with the include files from trunk. But, when it comes 
to linking an executable, I get lots of undefined symbol errors. These all 
point to libraries that my own library and executables depend on, such as boost 
and various other system libraries.

My makefiles are generated with cmake, and I've not been able to come up with 
the correct magic incantation for getting the build to pick up my instrumented 
library. I don't have libc++ installed and have been linking with gcc's 
libstdc++ so far. So, in a pinch, I could even drop my instrumented version 
into the file system, rather than trying to set tons of build flags to resolve 
this. (Assuming that it's OK to link something like the installed boost with 
the version of libc++ I built from trunk, which may well not be the case; I 
don't know whether trunk is still binary compatible with boost 1.55…)

I'm sorry to be such a nag here, but I'd much appreciate any other pointers you 
might have. I looked at http://libcxx.llvm.org, but I'm not sure how that 
relates to my situation.

Original comment by michij.h...@gmail.com on 4 Aug 2014 at 6:58

GoogleCodeExporter commented 9 years ago
If your system libraries depend on libstdc++, then unfortunately you won't be 
able to use them with libc++. You'll have to rebuild Boost and everything else 
against libc++. I'm afraid I don't have any experience to share here.

Perhaps an easier option would be to suppress the reports coming from libc++. I 
think glider@ used to do this in Chromium. so maybe he can chime in here.

Original comment by earth...@google.com on 4 Aug 2014 at 12:39

GoogleCodeExporter commented 9 years ago
Turns out glider is OOO. I've checked the Chromium suppressions file pre-libc++ 
transition:

http://src.chromium.org/viewvc/chrome/trunk/src/tools/valgrind/tsan_v2/suppressi
ons.txt?pathrev=266010

but couldn't find anything relevant to your case. I will be OOO until the end 
of August as well, so I won't be able to help you. You could probably try to 
write your own suppressions for those reports, or rebuild Boost as suggested 
above.

Original comment by earth...@google.com on 4 Aug 2014 at 2:36

GoogleCodeExporter commented 9 years ago
We did not use std::thread/shared_ptr, so we did not have suppressions for 
them. But we had suppressions for std::string:

50  # See http://crbug.com/181502
51  race:_M_rep
52  race:_M_is_leaked

Original comment by dvyu...@google.com on 4 Aug 2014 at 3:54

GoogleCodeExporter commented 9 years ago
Thanks for your comments everyone!

Hmmm... Rebuilding all my dependencies is not a particularly attractive option. 
It's not only boost, but about half a dozen other libraries, such as Cap'n 
Proto, zmqpp, and so on.

Adding suppressions is something I considered, and actually tried. But there 
are two concerns for me here.

- It's not always simple to decide that particular race is actually a false 
positive, so it's a lot of work.

- I'm worried that adding suppressions may suppress not only the bogus reports 
I'm getting, but may be hiding a real race elsewhere in my code, especially 
when the offending function is somewhere in the bowels of a run-time library 
that is called from many places (both my own code and code in libraries I link 
against). For example, if I suppress _M_rep, can I be sure that, by doing so, 
I'm not going to hide a real issue in my code elsewhere?

From reading the doc, it seems that I can only specify a single function for a 
suppression. Is that correct?

One thing that might help to mitigate this would be suppressions that work 
along the lines of valgrind, where it is possible to specify several stack 
frames, so I can selectively apply suppressions to a specific call site. Is 
this something you are considering for thread sanitizer?

Original comment by michij.h...@gmail.com on 4 Aug 2014 at 10:23

GoogleCodeExporter commented 9 years ago
> - It's not always simple to decide that particular race is actually a false 
positive, so it's a lot of work.

We know that this one is a false positive. How many do you see after 
suppressing this one?

> I'm worried that adding suppressions may suppress not only the bogus reports 
I'm getting

Yes, that is possible.

> One thing that might help to mitigate this would be suppressions that work 
along the lines of valgrind

We moved exactly in the opposite direction: from valgrid suppressions to 
single-line suppressions. I've looked at existing suppressions, and it turned 
out that 99% of them were single-line. It's difficult to write and maintain 
good multi-line suppressions. What multi-line suppresion would you use for this 
report?

I would do something like:
race:__shared_count
or if there are lots of other false positives, then:
called_from_lib:libstdc++.so

Original comment by dvyu...@google.com on 5 Aug 2014 at 6:47

GoogleCodeExporter commented 9 years ago
As far as the std library is concerned, this is the only false positive I'm 
currently aware of. Unfortunately, some of the libraries I'm linking against 
are racy, and I don't get much of a choice about this.

For example, there is a race between malloc/free in a library. I want to 
suppress that, but I don't get much of stack trace:

==================
WARNING: ThreadSanitizer: data race (pid=14746)
  Write of size 8 at 0x7d200000bb80 by thread T7:
    #0 free <null>:0 (ObjectAdapter_test+0x0000000e2dbb)
    #1 <null> <null>:0 (libzmq.so.3+0x00000001aff2)

  Previous write of size 8 at 0x7d200000bb80 by main thread:
    #0 malloc <null>:0 (ObjectAdapter_test+0x0000000e27bd)
    #1 <null> <null>:0 (libzmq.so.3+0x00000001ae70)
    #2 TestBody /home/michi/src/devel/test/gtest/scopes/internal/zmq_middleware/ObjectAdapter/ObjectAdapter_test.cpp:359 (ObjectAdapter_test+0x000000151846)
    #3 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /usr/src/gtest/src/gtest.cc:2078 (ObjectAdapter_test+0x000000222b80)

What I would like to say here is "suppress the malloc/free race in libzmq, but 
not elsewhere, but I'm not sure that's possible right now?

I see similar races for new/delete, with very similar stacks, as well as races 
with malloc/memcpy, etc.

If I just suppress all races for memcpy, I'll easily end up shooting myself in 
the foot by suppressing a real race involving memcpy in my own code.

Original comment by michij.h...@gmail.com on 5 Aug 2014 at 6:59

GoogleCodeExporter commented 9 years ago
You can use called_from_lib suppression to suppress everything in a particular 
dynamic library:
https://code.google.com/p/thread-sanitizer/wiki/Suppressions?ts=1409666982&updat
ed=Suppressions

Original comment by dvyu...@google.com on 2 Sep 2014 at 2:10

GoogleCodeExporter commented 9 years ago
Is there anything else we can help with? If so please reopen the issue.

Original comment by dvyu...@google.com on 2 Sep 2014 at 2:11

GoogleCodeExporter commented 9 years ago
Thanks for adding the library suppression, that is useful!

Longer term, I suspect that multi-line suppressions along the lines of valgrind 
will turn out to be necessary. That's because it's entirely possible that a 
library contains benign races that are false positives (due to use of lock-free 
techniques) but, at the same time, my code may be using the library incorrectly 
elsewhere, causing a real race in the library. For example, zmq sockets are not 
thread-safe, and it is entirely possible for me to write code that is broken 
and causes a real race somewhere inside libzmq. Suppressing everything that 
goes wrong in a library is too coarse because actions taken by the library may 
result in real races as well as false positives, but I only want to suppress 
the false positives.

Would you consider adding more fine-grained control to the suppressions file?

I didn't re-opened the bug. Should this be submitted separately as a feature 
request?

Original comment by michij.h...@gmail.com on 3 Sep 2014 at 10:14

GoogleCodeExporter commented 9 years ago
Just a drive-by comment: TSan should not produce false positives on lock-free 
algorithms, if they use atomic operations correctly. If they don't then these 
algorithms are broken.

Original comment by samso...@google.com on 3 Sep 2014 at 10:27

GoogleCodeExporter commented 9 years ago
>> Would you consider adding more fine-grained control to the suppressions file?
So far we don't want to do this. 

Original comment by konstant...@gmail.com on 4 Sep 2014 at 1:02

GoogleCodeExporter commented 9 years ago
If libzmq is not instrumented, then tsan won't produce any useful reports in 
it. So you just need to ignore it all (add called_from_lib suppression).
If you want to find incorrect usages of libzmq, then you need to build a 
tsan-instrumented version of the library and remove called_from_lib 
suppression. Then tsan will start producing real informative reports that you 
will be able to selectively suppress (e.g. race:zmq_foo_bar).

Original comment by dvyu...@google.com on 4 Sep 2014 at 3:34

GoogleCodeExporter commented 9 years ago
>>>> Would you consider adding more fine-grained control to the suppressions 
file?
>>So far we don't want to do this. 

Assume this means that TSAN does not match on line number, as in following?
race:src/api/src/MXLock.cpp:97 

Original comment by wtor...@gmail.com on 22 Sep 2014 at 5:09

GoogleCodeExporter commented 9 years ago
No, it doesn't. It would be very fragile.
So far it matches function name, file name and module, separately, w/o line 
numbers.

Original comment by dvyu...@google.com on 22 Sep 2014 at 5:18