Closed GoogleCodeExporter closed 9 years ago
FWIW I never seem to get this using the addr2line symbolizer (though the
addr2line one often hangs, per my bug filed at
http://llvm.org/bugs/show_bug.cgi?id=18379)
Original comment by tlip...@gmail.com
on 25 Mar 2014 at 1:11
Attaching a repro test case. With the external symbolizer this segfaults
reliably on my machine. (requires gtest and boost, but could just as easily do
without)
Original comment by tlip...@gmail.com
on 25 Mar 2014 at 1:29
Attachments:
Seems like this is actually the same root cause as issue #45 -- the mutex bugs
aren't reported using the normal ReportRace() call and those don't take
CommonSanitizerReportMutex.Lock. This means you can't suppress them and also
that multiple bugs may be reported concurrently and screw up the state of the
symbolizer.
Original comment by tlip...@gmail.com
on 25 Mar 2014 at 1:42
Original comment by konstant...@gmail.com
on 25 Mar 2014 at 6:54
Thanks for the report and the suggestion in #3!
It' looks like a likely cause of the crash.
Meanwhile, if you fix the "mutex unlock by another thread", does it fix the
crash?
Original comment by dvyu...@google.com
on 25 Mar 2014 at 1:53
Haven't tried that, since actually fixing the problem is somewhat difficult in
our application. Our rwlock implementation is actually based around CAS
spinning, so we know it is actually safe to be used in this unconventional
manner. So, if what you're asking is whether the crash is caused by TSAN or by
the code itself, I'm certain it's a TSAN issue. (we run the same tests
constantly with ASAN and never had a problem).
I do have a partial fix for our application which reduces the number of errors
reported by TSAN, and I can confirm that with that fix, the TSAN runs fail
significantly less often.
Original comment by tlip...@gmail.com
on 25 Mar 2014 at 7:56
>Our rwlock implementation is actually based around CAS spinning, so we know it
is actually safe to be used in this unconventional manner.
Why does tsan think that this is a mutex? Have you manually annotated it as
mutex using tsan annotations? If so, it's incorrect. Since it's a semaphore. If
you are using atomic operations than tsan must automatically understand the
synchronization and you don't need annotations.
Original comment by dvyu...@google.com
on 25 Mar 2014 at 7:58
> Why does tsan think that this is a mutex? Have you manually annotated it as
mutex using tsan annotations? If so, it's incorrect. Since it's a semaphore. If
you are using atomic operations than tsan must automatically understand the
synchronization and you don't need annotations
That's a fair point. In some parts of the code it's used as a mutex, and in
other parts of the code it's used as a semaphore (since we know it's
implementation is safe to be used that way). Given that, I had annotated it as
a RWLOCK. Perhaps it's worth duplicating the code and only annotating one of
the two implementations, per your point.
Nevertheless, this is still a TSAN bug, right? :)
Original comment by tlip...@gmail.com
on 25 Mar 2014 at 8:01
Yes, there are definitely something to fix in tsan.
I am just trying to find a workaround, so that you can continue using tsan.
Original comment by dvyu...@google.com
on 25 Mar 2014 at 8:05
I attempted to duplicate the implementation and only annotate one as an RWLOCK.
This gets rid of the mutex related warnings, but now I get data races.
Any advice for debugging? Setting force_seq_cst_atomic=1 got rid of the
warning, but that implies I've got something going wrong. After looking at the
barriers for an hour or so and even going so far as adding superfluous
MemoryBarrier() instructions, I've still come up empty.
I've attached my implementation (simplified a bit) and the test which generates
a TSAN error:
WARNING: ThreadSanitizer: data race (pid=1971)
Write of size 4 at 0x7fffe1dada60 by thread T11:
#0 Writer(SharedState*) /home/todd/git/kudu/src/util/rw_semaphore-test.cc:24 (exe+0x0000000a00ca)
#1 void boost::_bi::list1<boost::_bi::value<SharedState*> >::operator()<void (*)(SharedState*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(SharedState*), boost::_bi::list0&, int) /usr/include/boost/bind/bind.hpp:253 (exe+0x0000000a40d7)
#2 boost::_bi::bind_t<void, void (*)(SharedState*), boost::_bi::list1<boost::_bi::value<SharedState*> > >::operator()() /usr/include/boost/bind/bind_template.hpp:20 (exe+0x0000000a4061)
#3 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(SharedState*), boost::_bi::list1<boost::_bi::value<SharedState*> > > >::run() /usr/include/boost/thread/detail/thread.hpp:117 (exe+0x0000000a3290)
#4 boost::this_thread::interruption_requested() ??:0 (libboost_thread.so.1.54.0+0x00000000ba49)
Previous read of size 4 at 0x7fffe1dada60 by thread T8:
#0 Reader(SharedState*) /home/todd/git/kudu/src/util/rw_semaphore-test.cc:34 (exe+0x0000000a0144)
#1 void boost::_bi::list1<boost::_bi::value<SharedState*> >::operator()<void (*)(SharedState*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(SharedState*), boost::_bi::list0&, int) /usr/include/boost/bind/bind.hpp:253 (exe+0x0000000a40d7)
#2 boost::_bi::bind_t<void, void (*)(SharedState*), boost::_bi::list1<boost::_bi::value<SharedState*> > >::operator()() /usr/include/boost/bind/bind_template.hpp:20 (exe+0x0000000a4061)
#3 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(SharedState*), boost::_bi::list1<boost::_bi::value<SharedState*> > > >::run() /usr/include/boost/thread/detail/thread.hpp:117 (exe+0x0000000a3290)
#4 boost::this_thread::interruption_requested() ??:0 (libboost_thread.so.1.54.0+0x00000000ba49)
I don't necessarily expect you to debug my code ;-) But this might make a good
example for where manual annotations are needed beyond just using atomic ops,
if indeed I'm not making a mistake here.
Original comment by tlip...@gmail.com
on 26 Mar 2014 at 2:18
Attachments:
This must be Acquire_Load:
void WaitPendingReaders() {
int loop_count = 0;
while ((base::subtle::NoBarrier_Load(&state_) & kNumReadersMask) > 0) {
boost::detail::yield(loop_count++);
}
}
because this function effectively denotes enter into the critical section, and
it must synchronize with the previous readers.
With the current code, writers synchronize only with past readers, but not with
readers that are currently in critical sections.
Consequently, lock() can use NoBarrier_CompareAndSwap.
Original comment by dvyu...@google.com
on 26 Mar 2014 at 7:42
Aha, that was it. Thanks, Dmitry! Appreciate the help.
Original comment by tlip...@gmail.com
on 26 Mar 2014 at 5:08
http://llvm.org/viewvc/llvm-project?view=revision&revision=207204
fixes "double lock of a mutex" report.
Original comment by dvyu...@google.com
on 25 Apr 2014 at 7:50
http://llvm.org/viewvc/llvm-project?view=revision&revision=207206
fixes "mutex unlock by another thread" report.
Original comment by dvyu...@google.com
on 25 Apr 2014 at 8:02
Other relevant commits: 207207, 207208, 207209, 207211.
I think I've fixed them all.
Original comment by dvyu...@google.com
on 25 Apr 2014 at 9:23
Original issue reported on code.google.com by
tlip...@gmail.com
on 25 Mar 2014 at 1:10