hkaiser opened 1 year ago
Here's the full stacktrace for that thread (now with hpx-1.9.0-rc1):
{stack-trace}: 13 frames:
0x7f06aeeb12bb : /usr/lib/libhpx.so.1(+0x4b12bb) [0x7f06aeeb12bb] in /usr/lib/libhpx.so.1
0x7f06ae7387ec : std::__exception_ptr::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [0xac] in /usr/lib/libhpx_core.so
0x7f06ae738906 : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0x76] in /usr/lib/libhpx_core.so
0x7f06ae73e3c1 : hpx::detail::throw_exception(hpx::error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0xd1] in /usr/lib/libhpx_core.so
0x7f06ae86a70b : /usr/lib/libhpx_core.so(+0x26a70b) [0x7f06ae86a70b] in /usr/lib/libhpx_core.so
0x7f06ae804c54 : hpx::threads::detail::create_background_thread(hpx::threads::policies::scheduler_base&, unsigned long, hpx::threads::detail::scheduling_callbacks&, std::shared_ptr<bool>&, long&) [0x1a4] in /usr/lib/libhpx_core.so
0x7f06ae86bb0e : /usr/lib/libhpx_core.so(+0x26bb0e) [0x7f06ae86bb0e] in /usr/lib/libhpx_core.so
0x7f06ae86c795 : hpx::threads::detail::scheduled_thread_pool<hpx::threads::policies::shared_priority_queue_scheduler<std::mutex, hpx::threads::policies::concurrentqueue_fifo, hpx::threads::policies::lockfree_lifo> >::thread_func(unsigned long, unsigned long, std::shared_ptr<hpx::util::barrier>) [0x4f5] in /usr/lib/libhpx_core.so
0x7f06ae816695 : std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (hpx::threads::detail::scheduled_thread_pool<hpx::threads::policies::shared_priority_queue_scheduler<std::mutex, hpx::threads::policies::concurrentqueue_fifo, hpx::threads::policies::lockfree_lifo> >::*)(unsigned long, unsigned long, std::shared_ptr<hpx::util::barrier>), hpx::threads::detail::scheduled_thread_pool<hpx::threads::policies::shared_priority_queue_scheduler<std::mutex, hpx::threads::policies::concurrentqueue_fifo, hpx::threads::policies::lockfree_lifo> >*, unsigned long, unsigned long, std::shared_ptr<hpx::util::barrier> > > >::_M_run() [0x55] in /usr/lib/libhpx_core.so
0x7f067cad72c3 : /usr/lib/libstdc++.so.6(+0xd72c3) [0x7f067cad72c3] in /usr/lib/libstdc++.so.6
0x7f067c89ebb5 : /usr/lib/libc.so.6(+0x85bb5) [0x7f067c89ebb5] in /usr/lib/libc.so.6
0x7f067c920d90 : /usr/lib/libc.so.6(+0x107d90) [0x7f067c920d90] in /usr/lib/libc.so.6
{locality-id}: 1
{hostname}: [ (mpi:1) (tcp:127.0.0.1:7911) ]
{process-id}: 68100
{os-thread}: locality#1/worker-thread#5
{thread-description}: <unknown>
{state}: state::pre_main
{auxinfo}:
{file}: /home/beojan/Development/src/hpx/src/hpx-1.9.0-rc1/libs/core/schedulers/include/hpx/schedulers/thread_queue_mc.hpp
{line}: 249
{function}: thread_queue_mc::create_thread
{what}: staged tasks must have 'pending' as their initial state: HPX(bad_parameter)
@beojan I'm not able to reproduce this issue locally. What application did you run?
My demo app is at https://github.com/beojan/HPXDemo.
If I use the Intel mpirun executable (with the demo linked to OpenMPI), it doesn't crash, but this is clearly a faulty setup because of the mismatch between the mpirun version and the libmpi version.
My Gaudi port understandably crashes during MPI initialization with such a setup.
@beojan would you have more information on how we could reproduce this issue? Are you using any specific environment?
With the demo, I'm running on my laptop (Arch Linux) with HPX 1.9.0-rc1 and OpenMPI 4.1.
You can comment out the TBB and CUDA demos in the CMake file, though you do need oneMKL available to build it.
From IRC: