RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Data race in double_pendulum_demo_test #6634

Closed: m-chaturvedi closed this issue 6 years ago

m-chaturvedi commented 7 years ago

The failure: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/linux-xenial-clang-bazel-nightly-memcheck-tsan/15/consoleText

We have temporarily removed the test: https://github.com/RobotLocomotion/drake/pull/6632

m-chaturvedi commented 6 years ago

Output: the link is obsolete.

jwnimmer-tri commented 6 years ago

Obviously the cdash text is long gone by now. Here's a recent example:

==================== Test output for //examples/double_pendulum:double_pendulum_demo_test:
==================
WARNING: ThreadSanitizer: data race (pid=19)
  Read of size 4 at 0x7b4400000114 by main thread:
    #0 lcm_udpm_publish /proc/self/cwd/external/lcm/lcm/lcm_udpm.c:601 (libdrake_lcm.so+0x10791)
    #1 lcm_publish /proc/self/cwd/external/lcm/lcm/lcm.c:249 (libdrake_lcm.so+0x67ea)
    #2 lcm::LCM::publish(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*, unsigned int) /proc/self/cwd/external/lcm/lcm/lcm-cpp-impl.hpp:156 (double_pendulum_demo_test+0x964968)
    #3 drake::lcm::DrakeLcm::Publish(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*, int, stx::optional<double>) /proc/self/cwd/lcm/drake_lcm.cc:53 (double_pendulum_demo_test+0x96416f)
    #4 void drake::lcm::Publish<drake::lcmt_viewer_load_robot>(drake::lcm::DrakeLcmInterface*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, drake::lcmt_viewer_load_robot const&, stx::optional<double>) /proc/self/cwd/bazel-out/k8-dbg/bin/lcm/_virtual_includes/interface/drake/lcm/drake_lcm_interface.h:97 (double_pendulum_demo_test+0x94f788)
    #5 drake::systems::DrakeVisualizer::PublishLoadRobot() const /proc/self/cwd/attic/multibody/rigid_body_plant/drake_visualizer.cc:153 (double_pendulum_demo_test+0x94e55c)
    #6 drake::systems::DrakeVisualizer::DoPublish(drake::systems::Context<double> const&, std::vector<drake::systems::PublishEvent<double> const*, std::allocator<drake::systems::PublishEvent<double> const*> > const&) const /proc/self/cwd/attic/multibody/rigid_body_plant/drake_visualizer.cc:128 (double_pendulum_demo_test+0x94e258)
    #7 drake::systems::LeafSystem<double>::DispatchPublishHandler(drake::systems::Context<double> const&, drake::systems::EventCollection<drake::systems::PublishEvent<double> > const&) const /proc/self/cwd/bazel-out/k8-dbg/bin/systems/framework/_virtual_includes/leaf_system/drake/systems/framework/leaf_system.h:1667 (double_pendulum_demo_test+0x72166c)
    #8 drake::systems::System<double>::Publish(drake::systems::Context<double> const&, drake::systems::EventCollection<drake::systems::PublishEvent<double> > const&) const /proc/self/cwd/bazel-out/k8-dbg/bin/systems/framework/_virtual_includes/system/drake/systems/framework/system.h:352 (double_pendulum_demo_test+0x653090)
    #9 drake::systems::Diagram<double>::DispatchPublishHandler(drake::systems::Context<double> const&, drake::systems::EventCollection<drake::systems::PublishEvent<double> > const&) const /proc/self/cwd/bazel-out/k8-dbg/bin/systems/framework/_virtual_includes/diagram/drake/systems/framework/diagram.h:1106 (double_pendulum_demo_test+0x51249f)
    #10 drake::systems::System<double>::Publish(drake::systems::Context<double> const&, drake::systems::EventCollection<drake::systems::PublishEvent<double> > const&) const /proc/self/cwd/bazel-out/k8-dbg/bin/systems/framework/_virtual_includes/system/drake/systems/framework/system.h:352 (double_pendulum_demo_test+0x653090)
    #11 drake::systems::Simulator<double>::HandlePublish(drake::systems::EventCollection<drake::systems::PublishEvent<double> > const&) /proc/self/cwd/bazel-out/k8-dbg/bin/systems/analysis/_virtual_includes/simulator/drake/systems/analysis/simulator.h:558 (double_pendulum_demo_test+0x672481)
    #12 drake::systems::Simulator<double>::Initialize() /proc/self/cwd/bazel-out/k8-dbg/bin/systems/analysis/_virtual_includes/simulator/drake/systems/analysis/simulator.h:498 (double_pendulum_demo_test+0x4e0458)
    #13 drake::examples::double_pendulum::(anonymous namespace)::main(int, char**) /proc/self/cwd/examples/double_pendulum/double_pendulum_demo.cc:63 (double_pendulum_demo_test+0x4ddfb0)
    #14 main /proc/self/cwd/examples/double_pendulum/double_pendulum_demo.cc:76 (double_pendulum_demo_test+0x4ddd3e)

  Previous write of size 4 at 0x7b4400000114 by thread T1:
    #0 lcm_udpm_publish /proc/self/cwd/external/lcm/lcm/lcm_udpm.c:627 (libdrake_lcm.so+0x108d6)
    #1 udpm_self_test /proc/self/cwd/external/lcm/lcm/lcm_udpm.c:794 (libdrake_lcm.so+0x11a97)
    #2 _setup_recv_parts /proc/self/cwd/external/lcm/lcm/lcm_udpm.c:1017 (libdrake_lcm.so+0x116b5)
    #3 lcm_udpm_get_fileno /proc/self/cwd/external/lcm/lcm/lcm_udpm.c:573 (libdrake_lcm.so+0x10f09)
    #4 lcm_get_fileno /proc/self/cwd/external/lcm/lcm/lcm.c:241 (libdrake_lcm.so+0x674d)
    #5 lcm::LCM::getFileno() /proc/self/cwd/external/lcm/lcm/lcm-cpp-impl.hpp:195 (double_pendulum_demo_test+0x9692a6)
    #6 drake::lcm::(anonymous namespace)::WaitForLcm(lcm::LCM*, double) /proc/self/cwd/lcm/lcm_receive_thread.cc:29 (double_pendulum_demo_test+0x968b75)
    #7 drake::lcm::LcmReceiveThread::LoopWithSelect() /proc/self/cwd/lcm/lcm_receive_thread.cc:54 (double_pendulum_demo_test+0x968a2f)
    #8 void std::_Mem_fn_base<void (drake::lcm::LcmReceiveThread::*)(), true>::operator()<, void>(drake::lcm::LcmReceiveThread*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/functional:600 (double_pendulum_demo_test+0x96aa09)
    #9 void std::_Bind_simple<std::_Mem_fn<void (drake::lcm::LcmReceiveThread::*)()> (drake::lcm::LcmReceiveThread*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/functional:1530 (double_pendulum_demo_test+0x96a95b)
    #10 std::_Bind_simple<std::_Mem_fn<void (drake::lcm::LcmReceiveThread::*)()> (drake::lcm::LcmReceiveThread*)>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/functional:1520 (double_pendulum_demo_test+0x96a8e9)
    #11 std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (drake::lcm::LcmReceiveThread::*)()> (drake::lcm::LcmReceiveThread*)> >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/thread:115 (double_pendulum_demo_test+0x96a52d)
    #12 std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) ??:? (libstdc++.so.6+0xb8c7f)

  Location is heap block of size 280 at 0x7b4400000000 allocated by main thread:
    #0 calloc ??:? (double_pendulum_demo_test+0x473dbc)
    #1 lcm_udpm_create /proc/self/cwd/external/lcm/lcm/lcm_udpm.c:1056 (libdrake_lcm.so+0xfbdc)
    #2 lcm_create /proc/self/cwd/external/lcm/lcm/lcm.c:122 (libdrake_lcm.so+0x5ca9)
    #3 LCM /proc/self/cwd/external/lcm/lcm/lcm-cpp-impl.hpp:127 (double_pendulum_demo_test+0x964531)
    #4 DrakeLcm /proc/self/cwd/lcm/drake_lcm.cc:28 (double_pendulum_demo_test+0x963e4e)
    #5 DrakeLcm /proc/self/cwd/lcm/drake_lcm.cc:24 (double_pendulum_demo_test+0x963d8a)
    #6 std::_MakeUniq<drake::lcm::DrakeLcm>::__single_object std::make_unique<drake::lcm::DrakeLcm>() /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:765 (double_pendulum_demo_test+0x4dfbb7)
    #7 drake::examples::double_pendulum::(anonymous namespace)::main(int, char**) /proc/self/cwd/examples/double_pendulum/double_pendulum_demo.cc:41 (double_pendulum_demo_test+0x4dde1d)
    #8 main /proc/self/cwd/examples/double_pendulum/double_pendulum_demo.cc:76 (double_pendulum_demo_test+0x4ddd3e)

  Thread T1 (tid=21, running) created by main thread at:
    #0 pthread_create ??:? (double_pendulum_demo_test+0x44b0fb)
    #1 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) ??:? (libstdc++.so.6+0xb8dc2)
    #2 LcmReceiveThread /proc/self/cwd/lcm/lcm_receive_thread.cc:15 (double_pendulum_demo_test+0x96898d)
    #3 std::_MakeUniq<drake::lcm::LcmReceiveThread>::__single_object std::make_unique<drake::lcm::LcmReceiveThread, lcm::LCM*>(lcm::LCM*&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:765 (double_pendulum_demo_test+0x964834)
    #4 drake::lcm::DrakeLcm::StartReceiveThread() /proc/self/cwd/lcm/drake_lcm.cc:34 (double_pendulum_demo_test+0x963fd7)
    #5 drake::examples::double_pendulum::(anonymous namespace)::main(int, char**) /proc/self/cwd/examples/double_pendulum/double_pendulum_demo.cc:42 (double_pendulum_demo_test+0x4dde31)
    #6 main /proc/self/cwd/examples/double_pendulum/double_pendulum_demo.cc:76 (double_pendulum_demo_test+0x4ddd3e)

SUMMARY: ThreadSanitizer: data race /proc/self/cwd/external/lcm/lcm/lcm_udpm.c:601 in lcm_udpm_publish
==================
ThreadSanitizer: reported 1 warnings
================================================================================

jwnimmer-tri commented 6 years ago

The problem is that lcm's transmit sequence number is incremented without atomics or locking. When we call StartReceiveThread, the receive thread runs the LCM self-test (which transmits); then, in Simulator::Initialize, the main thread publishes LoadRobot, which also transmits (and increments the transmit sequence number). Since two threads increment the same number without any locking, it's a potential data race.
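
For illustration only, here is a minimal sketch of the pattern that would avoid this kind of hazard, assuming the sequence number really were unprotected. It is not LCM's code: `Transmitter`, `msg_seqno`, `Publish`, and `PublishFromTwoThreads` are hypothetical names, and `std::atomic` stands in for whatever synchronization the library might choose.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a transmitter that stamps each message with a
// sequence number; this is not LCM's actual data structure.
struct Transmitter {
  std::atomic<std::uint32_t> msg_seqno{0};

  // Returns the sequence number assigned to this message.
  std::uint32_t Publish() {
    // fetch_add is a single atomic read-modify-write, so two threads can
    // never be handed the same sequence number. With a plain uint32_t the
    // concurrent increments would be a data race.
    return msg_seqno.fetch_add(1, std::memory_order_relaxed);
  }
};

// Publishes `count` messages from each of two threads (standing in for the
// receive thread's self-test and the main thread's publish) and returns the
// final sequence number.
std::uint32_t PublishFromTwoThreads(std::uint32_t count) {
  Transmitter tx;
  auto worker = [&tx, count] {
    for (std::uint32_t i = 0; i < count; ++i) tx.Publish();
  };
  std::thread receive_thread(worker);
  worker();  // The calling thread plays the role of the main thread.
  receive_thread.join();
  return tx.msg_seqno.load();
}
```

Calling `PublishFromTwoThreads(1000)` from a test should leave the counter at exactly twice the per-thread count, since no increment can be lost.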

jwnimmer-tri commented 6 years ago

Actually, both of those lines are guarded by the same g_static_mutex_lock, so it looks like tsan is just confused about how glib locks work, maybe?
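
To make that concrete, a rough sketch of the guarded pattern follows, with `std::mutex` standing in for the glib lock. Everything here (`GuardedTransmitter`, `transmit_lock`, `msg_seqno`, `GuardedPublishFromTwoThreads`) is a hypothetical name for illustration, not LCM's actual code. When the read and the write of the counter both happen while the same mutex is held, there is no real race, so a report on those two lines would only make sense if the sanitizer could not see the lock.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in: a sequence number protected by a transmit lock.
struct GuardedTransmitter {
  std::mutex transmit_lock;
  std::uint32_t msg_seqno = 0;

  // Returns the sequence number assigned to this message.
  std::uint32_t Publish() {
    std::lock_guard<std::mutex> guard(transmit_lock);
    const std::uint32_t seqno = msg_seqno;  // Read while holding the lock.
    msg_seqno = seqno + 1;                  // Write under the same lock.
    return seqno;
  }
};

// Publishes `count` messages from each of two threads and returns the final
// sequence number.
std::uint32_t GuardedPublishFromTwoThreads(std::uint32_t count) {
  GuardedTransmitter tx;
  auto worker = [&tx, count] {
    for (std::uint32_t i = 0; i < count; ++i) tx.Publish();
  };
  std::thread receive_thread(worker);
  worker();  // The calling thread plays the role of the main thread.
  receive_thread.join();
  // join() orders the worker's writes before this read, so no lock is needed.
  return tx.msg_seqno;
}
```

Because `std::mutex` is visible to the sanitizer, this version would not be expected to trigger a report; whether the same holds for the glib lock is exactly the open question above.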