Open chenhao94 opened 2 years ago
this does not totally surprise me: the dequeue operation from the michael&scott queue copies payload from a node before checking if this payload is valid: but if it reads garbage, the payload is discarded. it would be quite reasonable for tsan to consider this as data race: https://github.com/boostorg/lockfree/blob/develop/include/boost/lockfree/queue.hpp#L438-L443
@timblechmann : I can see how your argument would explain the following TSAN error
Write of size 4 at 0x7b100000ce48 by thread T1:
#0 boost::lockfree::queue<int>::node::node(int const&, boost::lockfree::queue<int>::node*) <null> (lockfree_race+0x4be835)
#1 boost::lockfree::queue<int>::node* boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int>::node, std::allocator<boost::lockfree::queue<int>::node> >::construct<true, false, int, boost::lockfree::queue<int>::node*>(int const&, boost::lockfree::queue<int>::node* const&) <null> (lockfree_race+0x4be5a0)
#2 bool boost::lockfree::queue<int>::do_push<false>(int const&) <null> (lockfree_race+0x4be209)
#3 boost::lockfree::queue<int>::push(int const&) <null> (lockfree_race+0x4be135)
#4 main::$_0::operator()() const <null> (lockfree_race+0x4ba403)
#5 void std::__invoke_impl<void, main::$_0>(std::__invoke_other, main::$_0&&) <null> (lockfree_race+0x4ba36d)
#6 std::__invoke_result<main::$_0>::type std::__invoke<main::$_0>(main::$_0&&) <null> (lockfree_race+0x4ba2bd)
#7 void std::thread::_Invoker<std::tuple<main::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (lockfree_race+0x4ba265)
#8 std::thread::_Invoker<std::tuple<main::$_0> >::operator()() <null> (lockfree_race+0x4ba205)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0> > >::_M_run() <null> (lockfree_race+0x4ba0f9)
#10 <null> <null> (libstdc++.so.6+0xda6b3)
Previous read of size 4 at 0x7b100000ce48 by thread T3:
#0 void boost::lockfree::detail::copy_convertible::copy<int, int>(int&, int&) <null> (lockfree_race+0x4bd9c5)
#1 void boost::lockfree::detail::copy_payload<int, int>(int&, int&) <null> (lockfree_race+0x4bd845)
#2 bool boost::lockfree::queue<int>::pop<int>(int&) <null> (lockfree_race+0x4c0a18)
#3 boost::lockfree::queue<int>::pop(int&) <null> (lockfree_race+0x4c07c5)
#4 main::$_1::operator()() const <null> (lockfree_race+0x4bb021)
#5 void std::__invoke_impl<void, main::$_1>(std::__invoke_other, main::$_1&&) <null> (lockfree_race+0x4baf9d)
#6 std::__invoke_result<main::$_1>::type std::__invoke<main::$_1>(main::$_1&&) <null> (lockfree_race+0x4baeed)
#7 void std::thread::_Invoker<std::tuple<main::$_1> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (lockfree_race+0x4bae95)
#8 std::thread::_Invoker<std::tuple<main::$_1> >::operator()() <null> (lockfree_race+0x4bae35)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_1> > >::_M_run() <null> (lockfree_race+0x4bad29)
#10 <null> <null> (libstdc++.so.6+0xda6b3)
But somehow it looks like there are multiple issues in those TSAN report. Can you elaborate on how that would relate to the following stacktrace (it's the very first initially reported):
Write of size 8 at 0x7b10000087c0 by thread T2:
#0 boost::lockfree::detail::tagged_ptr<boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int>::node, std::allocator<boost::lockfree::queue<int>::node> >::freelist_node>::set_ptr(boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int>::node, std::allocator<boost::lockfree::queue<int>::node> >::freelist_node*) <n
ull> (lockfree_race+0x4bc1b3)
#1 boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int>::node, std::allocator<boost::lockfree::queue<int>::node> >::deallocate_impl(boost::lockfree::queue<int>::node*) <null> (lockfree_race+0x4c0cc6)
#2 void boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int>::node, std::allocator<boost::lockfree::queue<int>::node> >::deallocate<true>(boost::lockfree::queue<int>::node*) <null> (lockfree_race+0x4c0bd5)
#3 void boost::lockfree::detail::freelist_stack<boost::lockfree::queue<int>::node, std::allocator<boost::lockfree::queue<int>::node> >::destruct<true>(boost::lockfree::detail::tagged_ptr<boost::lockfree::queue<int>::node> const&) <null> (lockfree_race+0x4c0b7a)
#4 bool boost::lockfree::queue<int>::pop<int>(int&) <null> (lockfree_race+0x4c0aba)
#5 boost::lockfree::queue<int>::pop(int&) <null> (lockfree_race+0x4c07c5)
#6 main::$_1::operator()() const <null> (lockfree_race+0x4bb021)
#7 void std::__invoke_impl<void, main::$_1>(std::__invoke_other, main::$_1&&) <null> (lockfree_race+0x4baf9d)
#8 std::__invoke_result<main::$_1>::type std::__invoke<main::$_1>(main::$_1&&) <null> (lockfree_race+0x4baeed)
#9 void std::thread::_Invoker<std::tuple<main::$_1> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (lockfree_race+0x4bae95)
#10 std::thread::_Invoker<std::tuple<main::$_1> >::operator()() <null> (lockfree_race+0x4bae35)
#11 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_1> > >::_M_run() <null> (lockfree_race+0x4bad29)
#12 <null> <null> (libstdc++.so.6+0xda6b3)
Previous atomic read of size 8 at 0x7b10000087c0 by thread T3:
#0 __tsan_atomic64_load <null> (lockfree_race+0x4717ce)
#1 std::atomic<boost::lockfree::detail::tagged_ptr<boost::lockfree::queue<int>::node> >::load(std::memory_order) const <null> (lockfree_race+0x4bd37d)
#2 bool boost::lockfree::queue<int>::pop<int>(int&) <null> (lockfree_race+0x4c089b)
#3 boost::lockfree::queue<int>::pop(int&) <null> (lockfree_race+0x4c07c5)
#4 main::$_1::operator()() const <null> (lockfree_race+0x4bb021)
#5 void std::__invoke_impl<void, main::$_1>(std::__invoke_other, main::$_1&&) <null> (lockfree_race+0x4baf9d)
#6 std::__invoke_result<main::$_1>::type std::__invoke<main::$_1>(main::$_1&&) <null> (lockfree_race+0x4baeed)
#7 void std::thread::_Invoker<std::tuple<main::$_1> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null> (lockfree_race+0x4bae95)
#8 std::thread::_Invoker<std::tuple<main::$_1> >::operator()() <null> (lockfree_race+0x4bae35)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_1> > >::_M_run() <null> (lockfree_race+0x4bad29)
#10 <null> <null> (libstdc++.so.6+0xda6b3)
To my untrained eye it looks like an atomic and non-atomic operation colliding on the same address. Moreover it looks like we are not dealing with payload here but with the free-list pointers? I came across those issues myself and I definitely had some random data corruption (most likely the same item popped twice). It would be great to really rule out that there is no issue here.
Cheers,
Eike
EDIT: Does your argument imply that queue::pop(U& x)
might modify x
even if the operation fails (i.e. the function returns false
). If so, how about stating that in the documentation?
yes: queue::pop
might modify the value if the dequeue operation fails. good point to add a note to the docs. it could be done without, but would add more requirements to the type (trivial ctor) and it would result in an additional memcpy
as for tsan, there are two points:
struct { tagged_ptr next; T payload;};
). however the tagged_ptr next
is also used to enqueue the node onto the freelist.queue::pop
performs multiple reads of the "head" pointer to detect this situation and externally visible effects to the data structures are only committed via compare-and-swapJust opened a PR to add a statement to the docs: #81
If there are multiple consumers, there will be tsan data race warnings. Here is the code to reproduce:
Environment: