Open Bidski opened 3 weeks ago
Hi,
The discovery mechanism should not take longer than 1 second. When it hangs, please try to attach gdb to the process and get a backtrace:
thread apply all bt full
in gdb.

The threads have been sitting in this state for approximately 15 hours:
#0 0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3 0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4 0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5 0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6 0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7 0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8 0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9 0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce66b60) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
__t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.
Thread 9 (Thread 0x7f9c819f6640 (LWP 3941) "data_recording"):
#0 0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3 0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4 0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5 0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6 0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7 0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8 0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9 0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce662a0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
__t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.
Thread 8 (Thread 0x7f9c821f7640 (LWP 3940) "data_recording"):
#0 0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3 0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4 0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5 0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6 0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7 0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8 0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9 0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce906f0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
__t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.
Thread 7 (Thread 0x7f9c841fb640 (LWP 3936) "data_recording"):
#0 0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3 0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4 0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5 0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6 0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7 0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8 0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9 0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce3da00) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
__t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.
Thread 6 (Thread 0x7f9c849fc640 (LWP 3935) "data_recording"):
#0 0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3 0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4 0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5 0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6 0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7 0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8 0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9 0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce3d7c0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
__t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.
Thread 4 (Thread 0x7f9c859fe640 (LWP 3933) "data_recording"):
#0 0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f9cbe0dca30 in arv_shutdown () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3 0x00007f9cbf27c46e in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Shutdown>, module::input::Camera::Camera(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda()#4}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#4 0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#5 0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#6 0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#7 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fc980e10) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
__t = <optimized out>
#9 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#10 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.
It looks like all the threads are waiting to lock arv_system_mutex, but I'm not sure. Could you try to capture the backtrace with debug symbols enabled? The backtrace should then report the source line calling each function.
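To make the suspected failure mode concrete, here is a minimal sketch of how one stuck lock holder can starve every other caller. It does not use Aravis at all: `system_mutex` is only a hypothetical stand-in for something like arv_system_mutex, and `count_starved_callers` is a name invented for this illustration. Whether the real hang works this way is exactly what the symbolized backtrace would confirm.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Starts one "holder" thread that takes a lock and stays stuck for `stuck_for`
// (standing in for a discovery call blocked on the network), then starts
// `callers` threads that each wait at most `patience` for the same lock.
// Returns how many callers gave up without acquiring it.
int count_starved_callers(int callers,
                          std::chrono::milliseconds stuck_for,
                          std::chrono::milliseconds patience) {
    std::timed_mutex system_mutex;  // hypothetical stand-in for a library-wide lock
    std::promise<void> holding;     // signals that the holder owns the lock
    std::atomic<int> starved{0};

    std::thread holder([&] {
        std::lock_guard<std::timed_mutex> lock(system_mutex);
        holding.set_value();
        std::this_thread::sleep_for(stuck_for);  // "stuck" while holding the lock
    });
    holding.get_future().wait();  // make sure the holder got there first

    std::vector<std::thread> pool;
    for (int i = 0; i < callers; ++i) {
        pool.emplace_back([&] {
            // Waits up to `patience` for the lock instead of blocking forever.
            std::unique_lock<std::timed_mutex> lock(system_mutex, patience);
            if (!lock.owns_lock()) {
                ++starved;
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    holder.join();
    return starved.load();
}
```

With an untimed lock instead of the timed one, the callers would simply sit in the wait, which would look much like the identical frames in the backtrace above.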
Under certain networking conditions a call to either arv_get_n_devices or arv_get_device_serial_nbr may hang indefinitely. I am not sure which one is actually hanging, but I suspect it is arv_get_n_devices.
I have 5 cameras in my system, all of them are GigEVision cameras. My rough setup is
The exact conditions and timing that cause this are unclear to me, as I am still debugging it. The root cause may be a hardware issue with either the switch or the cables; however, we feel that our software should be resilient to these sorts of hardware faults.
Is there any sort of timeout mechanism currently built into either of these functions?
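Until that is answered, one application-side workaround is to run the blocking call on a helper thread and stop waiting after a deadline. The sketch below is only an illustration: `call_with_timeout` and `fake_discovery` are hypothetical names, and `fake_discovery` stands in for the real discovery call so that the example does not depend on Aravis.

```cpp
#include <chrono>
#include <future>
#include <optional>
#include <thread>
#include <utility>

// Runs `fn` on a detached helper thread and waits at most `timeout` for it.
// Returns the result, or std::nullopt if the deadline passes first.
// The helper thread is NOT cancelled: if `fn` never returns, that thread
// stays blocked, but the caller can carry on. `fn` must return a value.
template <typename Fn>
auto call_with_timeout(Fn fn, std::chrono::milliseconds timeout)
    -> std::optional<decltype(fn())> {
    using Result = decltype(fn());
    std::packaged_task<Result()> task(std::move(fn));
    std::future<Result> result = task.get_future();
    std::thread(std::move(task)).detach();

    if (result.wait_for(timeout) == std::future_status::ready) {
        return result.get();  // rethrows if `fn` threw
    }
    return std::nullopt;  // deadline passed, give up waiting
}

// Hypothetical stand-in for a discovery call: sleeps for `delay`, then
// reports 5 devices. In the real module the lambda passed to
// call_with_timeout would wrap arv_get_n_devices() instead (assumption).
unsigned int fake_discovery(std::chrono::milliseconds delay) {
    std::this_thread::sleep_for(delay);
    return 5;
}
```

A caller would write something like `call_with_timeout([] { return fake_discovery(std::chrono::milliseconds(10)); }, std::chrono::seconds(1))` and treat an empty result as a failed discovery. This only keeps the calling reaction responsive. If the stuck call is holding a library-wide lock, later calls into the library would presumably block as well, so a timeout inside the library itself would still be the better fix.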
Platform description: