thread sanitizer fires in disk_io_job fence logic

arvidn commented 6 years ago

Please provide the following information

libtorrent version (or branch): RC_1_1 (with https://github.com/arvidn/libtorrent/pull/3081 https://github.com/arvidn/libtorrent/pull/3082 https://github.com/arvidn/libtorrent/pull/3083 applied)

platform/architecture: MacOS/x86_64

compiler and compiler version: clang++ (Apple LLVM version 9.1.0 (clang-902.0.39.2))

When running the test_checking.cpp.discrete_checking test under thread sanitizer (built with sanitize=thread b2 option) the following race condition is detected.

WARNING: ThreadSanitizer: data race (pid=43533)
  Write of size 1 at 0x7b8400024590 by thread T1 (mutexes: write M2185):
    #0 libtorrent::disk_job_fence::job_complete(libtorrent::disk_io_job*, libtorrent::tailqueue<libtorrent::disk_io_job>&) storage.cpp:1930
    #1 libtorrent::disk_io_thread::add_completed_jobs_impl(libtorrent::tailqueue<libtorrent::disk_io_job>&, libtorrent::tailqueue<libtorrent::disk_io_job>&) disk_io_thread.cpp:3424
    #2 libtorrent::disk_io_thread::add_completed_jobs(libtorrent::tailqueue<libtorrent::disk_io_job>&) disk_io_thread.cpp:3397
    #3 libtorrent::disk_io_thread::execute_job(libtorrent::disk_io_job*) disk_io_thread.cpp:3233
    #4 libtorrent::disk_io_thread::thread_fun(int, libtorrent::disk_io_thread::thread_type_t, boost::shared_ptr<boost::asio::io_context::work>) disk_io_thread.cpp:3283 

  Previous read of size 1 at 0x7b8400024590 by thread T15:
    #0 libtorrent::disk_io_thread::add_fence_job(libtorrent::piece_manager*, libtorrent::disk_io_job*, bool) disk_io_thread.cpp:3107
    #1 libtorrent::disk_io_thread::async_set_file_priority(libtorrent::piece_manager*, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > const&, boost::function<void (libtorrent::disk_io_job const*)> const&) disk_io_thread.cpp:2040 
    #2 libtorrent::torrent::prioritize_files(std::__1::vector<int, std::__1::allocator<int> > const&) torrent.cpp:5714
    #3 void boost::_mfi::mf1<void, libtorrent::torrent, std::__1::vector<int, std::__1::allocator<int> > const&>::call<boost::shared_ptr<libtorrent::torrent>, std::__1::vector<int, std::__1::allocator<int> > const>(boost::shared_ptr<libtorrent::torrent>&, void const*, std::__1::vector<int, std::__1::allocator<int> > const&) const mem_fn_template.hpp:156
    #4 void boost::_mfi::mf1<void, libtorrent::torrent, std::__1::vector<int, std::__1::allocator<int> > const&>::operator()<boost::shared_ptr<libtorrent::torrent> >(boost::shared_ptr<libtorrent::torrent>&, std::__1::vector<int, std::__1::allocator<int> > const&) const mem_fn_template.hpp:171
    #5 void boost::_bi::list2<boost::_bi::value<boost::shared_ptr<libtorrent::torrent> >, boost::_bi::value<std::__1::vector<int, std::__1::allocator<int> > > >::operator()<boost::_mfi::mf1<void, libtorrent::torrent, std::__1::vector<int, std::__1::allocator<int> > const&>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, libtorrent::torrent, std::__1::vector<int, std::__1::allocator<int> > const&>&, boost::_bi::list0&, int) bind.hpp:319

  Location is heap block of size 4368 at 0x7b8400024400 allocated by thread T15:
    #0 operator new[](unsigned long, std::nothrow_t const&) <null>:2131792
    #1 boost::default_user_allocator_new_delete::malloc(unsigned long) pool.hpp:97
    #2 boost::pool<boost::default_user_allocator_new_delete>::malloc_need_resize() pool.hpp:693
    #3 boost::pool<boost::default_user_allocator_new_delete>::malloc() pool.hpp:431
    #4 libtorrent::disk_job_pool::allocate_job(int) disk_job_pool.cpp:54
    #5 libtorrent::disk_io_thread::async_check_fastresume(libtorrent::piece_manager*, libtorrent::bdecode_node const*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&, boost::function<void (libtorrent::disk_io_job const*)> const&) disk_io_thread.cpp:1884
    #6 libtorrent::torrent::init() torrent.cpp:2172 (test_checking:x86_64+0x1007331ea)
    #7 libtorrent::torrent::start(libtorrent::add_torrent_params const&) torrent.cpp:889
    #8 libtorrent::aux::session_impl::add_torrent(libtorrent::add_torrent_params const&, boost::system::error_code&) session_impl.cpp:4798

  Mutex M2185 (0x7b3c000080c8) created at:
    #0 pthread_mutex_init <null>:2131648 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x29383)
    #1 boost::asio::detail::posix_mutex::posix_mutex() posix_mutex.ipp:34
    #2 boost::asio::detail::posix_mutex::posix_mutex() posix_mutex.ipp:33
    #3 libtorrent::disk_job_fence::disk_job_fence() storage.cpp:1873
    #4 libtorrent::piece_manager::piece_manager(libtorrent::storage_interface*, boost::shared_ptr<void> const&, libtorrent::file_storage*) storage.cpp:1773
    #5 libtorrent::piece_manager::piece_manager(libtorrent::storage_interface*, boost::shared_ptr<void> const&, libtorrent::file_storage*) storage.cpp:1780
    #6 boost::detail::sp_if_not_array<libtorrent::piece_manager>::type boost::make_shared<libtorrent::piece_manager, libtorrent::storage_interface*&, boost::shared_ptr<libtorrent::torrent>, libtorrent::file_storage*>(libtorrent::storage_interface*&&&, boost::shared_ptr<libtorrent::torrent>&&, libtorrent::file_storage*&&) make_shared_object.hpp:256
    #7 libtorrent::torrent::construct_storage() torrent.cpp:1864 (test_checking:x86_64+0x100756169)
    #8 libtorrent::torrent::init() torrent.cpp:1969
    #9 libtorrent::torrent::start(libtorrent::add_torrent_params const&) torrent.cpp:889
    #10 libtorrent::aux::session_impl::add_torrent(libtorrent::add_torrent_params const&, boost::system::error_code&) session_impl.cpp:4798
    #11 boost::_mfi::mf2<libtorrent::torrent_handle, libtorrent::aux::session_impl, libtorrent::add_torrent_params const&, boost::system::error_code&>::operator()(libtorrent::aux::session_impl*, libtorrent::add_torrent_params const&, boost::system::error_code&) const mem_fn_template.hpp:280

  Thread T1 (tid=11834964, running) created by thread T15 at:
    #0 pthread_create <null>:2131840 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x283ed)
    #1 boost::asio::detail::posix_thread::start_thread(boost::asio::detail::posix_thread::func_base*)
[...]
    #4 libtorrent::disk_io_thread::set_num_threads(int, bool) disk_io_thread.cpp:227
    #5 libtorrent::aux::session_impl::update_disk_threads() session_impl.cpp:6349
    #6 libtorrent::run_all_updates(libtorrent::aux::session_impl&) settings_pack.cpp:486
    #7 libtorrent::aux::session_impl::init() session_impl.cpp:603

SUMMARY: ThreadSanitizer: data race storage.cpp:1930 in libtorrent::disk_job_fence::job_complete(libtorrent::disk_io_job*, libtorrent::tailqueue<libtorrent::disk_io_job>&)

As far as I can tell, it's referring to the write to disk_io_job::flags here, which was previously written to from another thread here.

I'm having a hard time understanding how this could be a race. The ownership of the job object is transferred from the main thread into the disk thread (pool). This transfer is synchronised with a mutex around the job queue. This issue seems to specifically call out the fence jobs though. The fence jobs are queued up as a blocked job instead of on the normal job queue, inside the disk_job_fence object. Once all normal jobs have been drained, it's issued. But it should still belong to thread pool at that point.

@ssiloti or @aldenml do you have any suggestions or hints I can go look at?

arvidn commented 6 years ago

What I'm not 100% convinced of is that the model thread sanitizer uses supports:

create an object in thread 1
pass a pointer to it via a thread safe queue to thread 2
have thread 2 mutate it

Sometimes I get the impression it expects all (shared) accesses to memory be made under a mutex lock.

ssiloti commented 6 years ago

It makes sense that thread sanitizer can't see the synchronization between the two threads. It looks like the sanitizer supports annotations specifically to inform it about these sorts of "hidden" barriers.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

arvidn / libtorrent

thread sanitizer fires in disk_io_job fence logic #3086