brycelelbach closed this issue 10 years ago
Occasionally, if I use a performance counter with an interval, I get a similar hang at application shutdown.
This actually affects all applications as the race condition is occasionally hit for the bootstrap barriers:
Thread 9 (Thread 0x2aaab12f4700 (LWP 4720)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaabee9563 in hpx::util::io_service_pool::thread_run(unsigned long) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaaacc40fa in thread_proxy () from /opt/boost/1.53.0-release/stage/lib/libboost_thread.so.1.53.0
#3 0x00002aaaadf88e0e in start_thread (arg=0x2aaab12f4700) at pthread_create.c:311
#4 0x00002aaaae28593d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 8 (Thread 0x2aaab14f5700 (LWP 4721)):
#0 0x00002aaaae285f23 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00002aaaabd1e64b in boost::asio::detail::epoll_reactor::run(bool, boost::asio::detail::op_queue<boost::asio::detail::task_io_service_operation>&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaabee9324 in hpx::util::io_service_pool::thread_run(unsigned long) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#3 0x00002aaaaacc40fa in thread_proxy () from /opt/boost/1.53.0-release/stage/lib/libboost_thread.so.1.53.0
#4 0x00002aaaadf88e0e in start_thread (arg=0x2aaab14f5700) at pthread_create.c:311
#5 0x00002aaaae28593d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 7 (Thread 0x2aaab16f6700 (LWP 4722)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaabee9563 in hpx::util::io_service_pool::thread_run(unsigned long) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaaacc40fa in thread_proxy () from /opt/boost/1.53.0-release/stage/lib/libboost_thread.so.1.53.0
#3 0x00002aaaadf88e0e in start_thread (arg=0x2aaab16f6700) at pthread_create.c:311
#4 0x00002aaaae28593d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 6 (Thread 0x2aaab18f7700 (LWP 4723)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaabee9563 in hpx::util::io_service_pool::thread_run(unsigned long) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaaacc40fa in thread_proxy () from /opt/boost/1.53.0-release/stage/lib/libboost_thread.so.1.53.0
#3 0x00002aaaadf88e0e in start_thread (arg=0x2aaab18f7700) at pthread_create.c:311
#4 0x00002aaaae28593d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 5 (Thread 0x2aaab1af8700 (LWP 4724)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaabee9563 in hpx::util::io_service_pool::thread_run(unsigned long) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaaacc40fa in thread_proxy () from /opt/boost/1.53.0-release/stage/lib/libboost_thread.so.1.53.0
#3 0x00002aaaadf88e0e in start_thread (arg=0x2aaab1af8700) at pthread_create.c:311
#4 0x00002aaaae28593d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 4 (Thread 0x2aaab1cf9700 (LWP 4725)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaabee9563 in hpx::util::io_service_pool::thread_run(unsigned long) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaaacc40fa in thread_proxy () from /opt/boost/1.53.0-release/stage/lib/libboost_thread.so.1.53.0
#3 0x00002aaaadf88e0e in start_thread (arg=0x2aaab1cf9700) at pthread_create.c:311
#4 0x00002aaaae28593d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 3 (Thread 0x2aaab1efa700 (LWP 4726)):
#0 0x00002aaaadf8facd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00002aaaabbc14e3 in hpx::util::spinlock_pool<hpx::naming::gid_type::tag>::scoped_lock::lock() () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaabbefe2d in hpx::lcos::future<bool> hpx::agas::stubs::symbol_namespace::service_async<bool>(hpx::naming::id_type const&, hpx::agas::request const&, hpx::threads::thread_priority) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#3 0x00002aaaabc23047 in hpx::agas::addressing_service::register_name_async(std::string const&, hpx::naming::id_type const&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#4 0x00002aaaabc09e77 in hpx::agas::register_name_sync(std::string const&, hpx::naming::id_type const&, hpx::error_code&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#5 0x00002aaaabb2e94b in hpx::create_barrier(unsigned long, char const*) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#6 0x00002aaaabb21068 in hpx::pre_main(hpx::runtime_mode) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#7 0x00002aaaabacef22 in hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>::run_helper(hpx::util::function_nonser<int ()>, int&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#8 0x00002aaaabaa3343 in hpx::util::detail::vtable<false>::type<boost::_bi::bind_t<hpx::threads::detail::tagged_thread_state<hpx::threads::thread_state_enum>, boost::_mfi::mf2<hpx::threads::detail::tagged_thread_state<hpx::threads::thread_state_enum>, hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>, hpx::util::function_nonser<int ()>, int&>, boost::_bi::list3<boost::_bi::value<hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>*>, boost::_bi::value<hpx::util::function_nonser<int ()> >, boost::reference_wrapper<int> > >, hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), void, void>::invoke(void**, hpx::threads::thread_state_ex_enum&&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#9 0x00002aaaabaec469 in hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator>::operator()() () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#10 0x00002aaaabaa2589 in void hpx::util::coroutines::detail::lx::trampoline<hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator> >(hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator>*) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#11 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x2aaab22fc700 (LWP 4727)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaabca6185 in void boost::condition_variable_any::wait<boost::unique_lock<boost::mutex> >(boost::unique_lock<boost::mutex>&) [clone .constprop.3950] () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaabcaa1ea in hpx::components::server::runtime_support::wait() () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#3 0x00002aaaabab075f in hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>::wait_helper(boost::mutex&, boost::condition_variable_any&, bool&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#4 0x00002aaaaacc40fa in thread_proxy () from /opt/boost/1.53.0-release/stage/lib/libboost_thread.so.1.53.0
#5 0x00002aaaadf88e0e in start_thread (arg=0x2aaab22fc700) at pthread_create.c:311
#6 0x00002aaaae28593d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 1 (Thread 0x2aaaaacf0b40 (LWP 4715)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaabee9563 in hpx::util::io_service_pool::thread_run(unsigned long) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#2 0x00002aaaabacdc20 in hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>::wait() () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#3 0x00002aaaabaa7755 in hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>::run(hpx::util::function_nonser<int ()> const&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#4 0x00002aaaabafacec in hpx::detail::run(hpx::runtime&, hpx::util::function_nonser<int (boost::program_options::variables_map&)> const&, boost::program_options::variables_map&, hpx::runtime_mode, hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#5 0x00002aaaabafbd70 in hpx::detail::run_priority_local(hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::util::command_line_handling&, bool) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#6 0x00002aaaabafde6b in hpx::run_or_start(hpx::util::function_nonser<int (boost::program_options::variables_map&)> const&, boost::program_options::options_description const&, int, char**, std::vector<std::string, std::allocator<std::string> > const&, hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::runtime_mode, bool) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#7 0x00002aaaabafe3a2 in hpx::init(hpx::util::function_nonser<int (boost::program_options::variables_map&)> const&, boost::program_options::options_description const&, int, char**, std::vector<std::string, std::allocator<std::string> > const&, hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::runtime_mode) () from /home/wash/install/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
#8 0x00002aaaaaad5d9f in main ()
Please enable lock tracking and report back whether it flags any lock that is still held during suspension.
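For anyone unfamiliar with the idea behind lock tracking, here is a minimal sketch of the general technique, assuming a per-thread count of held locks that is checked at every suspension point. It is not the HPX implementation: `held_locks`, `tracked_lock` and `lock_held_at_suspension` are hypothetical names made up for this illustration.

```cpp
#include <mutex>

// Hypothetical illustration only; none of these names are HPX APIs.
// Number of tracked locks currently held by the calling OS thread.
thread_local int held_locks = 0;

// RAII wrapper that acquires the mutex and records the acquisition.
class tracked_lock {
public:
    explicit tracked_lock(std::mutex& m) : guard_(m) { ++held_locks; }
    ~tracked_lock() { --held_locks; }  // guard_ releases the mutex afterwards

    tracked_lock(tracked_lock const&) = delete;
    tracked_lock& operator=(tracked_lock const&) = delete;

private:
    std::unique_lock<std::mutex> guard_;
};

// Meant to be called where a task is about to suspend. Returns true if the
// calling thread still holds a tracked lock, which is the condition a
// tracking facility would report.
bool lock_held_at_suspension() { return held_locks > 0; }
```

If a check like this fires right before a task suspends, another thread spinning on the same lock can wait indefinitely, which would match the symptom in the backtrace above.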
Please try again with the latest commit. I am still seeing occasional hangs on shutdown. Investigating.
What's the current status of this?
I was not able to reproduce this specific hang. I recently ran a job on 256 cores with around 1024 performance counters enabled. Please reopen if the problem is still there.
Running Fibonacci on ariel00 with the below options and the below git hash leads to a hang:
A stack trace from all threads shows that every scheduler thread is looking for work except one, which sits in addressing_service::register_name_async trying to acquire a spinlock. It consistently remains stuck in the spinlock acquisition.
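To make the symptom concrete, here is a rough sketch of a spinlock that sleeps between attempts, similar in spirit to the nanosleep frame under scoped_lock::lock() in the earlier backtrace. It is an assumption-laden illustration and not the HPX spinlock_pool code: `backoff_spinlock` and `try_lock_for_attempts` are hypothetical names, and the bounded variant exists only to show the stall without hanging.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical spinlock for illustration; not the HPX spinlock_pool.
class backoff_spinlock {
public:
    // Spin until the flag is acquired, sleeping briefly between attempts.
    // If the holder never calls unlock(), this loop never exits.
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire))
            std::this_thread::sleep_for(std::chrono::microseconds(1));
    }

    // Bounded variant for diagnostics: give up after max_attempts tries
    // and return false instead of spinning forever.
    bool try_lock_for_attempts(unsigned max_attempts) {
        for (unsigned i = 0; i < max_attempts; ++i) {
            if (!flag_.test_and_set(std::memory_order_acquire))
                return true;
            std::this_thread::sleep_for(std::chrono::microseconds(1));
        }
        return false;
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

With a lock like this, a thread calling lock() while the owner is suspended or never scheduled again looks exactly like the stuck thread described above: it keeps cycling between the test and the short sleep.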
The problem does not show up without the performance counter.