AravisProject / aravis

A vision library for genicam based cameras
GNU Lesser General Public License v2.1

Device enumeration hangs #905

Open Bidski opened 3 weeks ago

Bidski commented 3 weeks ago

Under certain networking conditions a call to either arv_get_n_devices or arv_get_device_serial_nbr may hang indefinitely. I am not sure which one is actually hanging, but I suspect it is arv_get_n_devices.

I have 5 cameras in my system, all of them GigEVision cameras. My rough setup is:

                                     /-> 1Gbps link -> GigEVision camera
                                    /-> 1Gbps link -> GigEVision camera
PC -> 10Gbps SFP link -> PoE Switch -> 1Gbps link -> GigEVision camera
                                    \-> 1Gbps link -> GigEVision camera
                                     \-> 1Gbps link -> GigEVision camera

The exact conditions and timing that trigger this are still unclear to me, as I am still debugging it. The root cause may be a hardware issue with either the switch or the cables, but we feel that our software should be resilient to these sorts of hardware faults.

Is there any sort of timeout mechanism currently built into either of these functions?
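For illustration, a caller-side guard could look roughly like the sketch below. It is not part of Aravis: `call_with_timeout` is a hypothetical helper that runs any blocking call on a detached worker thread and gives up waiting after a deadline. It only keeps the caller responsive. The worker is not cancelled, so if the underlying call never returns, that thread stays blocked, and later calls that need the same internal lock would presumably block as well.

```cpp
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical helper (not an Aravis API): runs `blocking_call` on a detached
// worker thread and waits at most `timeout` for its result.
// Returns std::nullopt if the deadline passes first. The worker thread is NOT
// cancelled; if the call never returns, that thread remains blocked.
template <typename T>
std::optional<T> call_with_timeout(std::function<T()> blocking_call,
                                   std::chrono::milliseconds timeout) {
    // The promise is shared so it outlives this function if we time out.
    auto promise = std::make_shared<std::promise<T>>();
    std::future<T> result = promise->get_future();

    std::thread([promise, call = std::move(blocking_call)]() {
        try {
            promise->set_value(call());
        } catch (...) {
            promise->set_exception(std::current_exception());
        }
    }).detach();

    if (result.wait_for(timeout) == std::future_status::ready) {
        return result.get();  // rethrows if the call threw
    }
    return std::nullopt;  // timed out; the worker may still be running
}

// Possible use with Aravis (assumes <arv.h> is included and linked):
//   auto n = call_with_timeout<unsigned int>(
//       [] { arv_update_device_list(); return arv_get_n_devices(); },
//       std::chrono::milliseconds(2000));
//   if (!n) { /* enumeration did not finish in time */ }
```

A future obtained from a `std::promise` is used instead of `std::async` because the destructor of a future returned by `std::async` waits for the task to finish, which would bring the hang back.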

Platform description:

EmmanuelP commented 3 weeks ago

Hi,

The discovery mechanism should not take longer than 1 second. When it hangs, please try to attach gdb to the process and get a backtrace:

Bidski commented 3 weeks ago

The threads have been sitting in this state for approximately 15 hours.

#0  0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2  0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3  0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4  0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5  0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6  0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7  0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8  0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9  0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce66b60) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
        __t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.

Thread 9 (Thread 0x7f9c819f6640 (LWP 3941) "data_recording"):
#0  0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2  0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3  0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4  0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5  0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6  0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7  0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8  0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9  0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce662a0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
        __t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.

Thread 8 (Thread 0x7f9c821f7640 (LWP 3940) "data_recording"):
#0  0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2  0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3  0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4  0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5  0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6  0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7  0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8  0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9  0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce906f0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
        __t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.

Thread 7 (Thread 0x7f9c841fb640 (LWP 3936) "data_recording"):
#0  0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2  0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3  0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4  0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5  0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6  0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7  0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8  0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9  0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce3da00) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
        __t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.

Thread 6 (Thread 0x7f9c849fc640 (LWP 3935) "data_recording"):
#0  0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2  0x00007f9cbe0dc386 in arv_get_n_devices () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3  0x00007f9cbf27c5df in module::input::configure_camera(extension::Configuration const&, module::input::Camera&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const [clone .isra.0] () from lib/libinputCamera.so
No symbol table info available.
#4  0x00007f9cbf27cdc9 in module::input::configure_camera(extension::Configuration const&, module::input::Camera&) () from lib/libinputCamera.so
No symbol table info available.
#5  0x00007f9cbf28006f in module::input::reset_camera(module::input::CameraContext&) () from lib/libinputCamera.so
No symbol table info available.
#6  0x00007f9cbf28053a in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Watchdog<module::input::Camera, 1, std::chrono::duration<long, std::ratio<1l, 1l> > >, NUClear::dsl::word::Single>, module::input::Camera::Parse(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda(extension::Configuration const&)#2}::operator()(extension::Configuration const&) const::{lambda()#1}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#7  0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8  0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#9  0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#10 0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#11 0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fce3d7c0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
        __t = <optimized out>
#12 0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x7f9c859fe640 (LWP 3933) "data_recording"):
#0  0x00007f9cb7b0b30d in syscall () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f9cbde42c4c in ?? () from /usr/local/lib/libglib-2.0.so.0
No symbol table info available.
#2  0x00007f9cbe0dca30 in arv_shutdown () from /usr/local/lib/libaravis-0.8.so.0
No symbol table info available.
#3  0x00007f9cbf27c46e in std::_Function_handler<void (NUClear::threading::Task<NUClear::threading::Reaction>&), NUClear::util::CallbackGenerator<NUClear::dsl::Parse<NUClear::dsl::word::Shutdown>, module::input::Camera::Camera(std::unique_ptr<NUClear::Environment, std::default_delete<NUClear::Environment> >)::{lambda()#4}>::operator()(NUClear::threading::Reaction&)::{lambda(NUClear::threading::Task<NUClear::threading::Reaction>&)#1}>::_M_invoke(std::_Any_data const&, NUClear::threading::Task<NUClear::threading::Reaction>&) () from lib/libinputCamera.so
No symbol table info available.
#4  0x00007f9cbfb4758f in std::_Function_handler<void (), NUClear::PowerPlant::submit(std::unique_ptr<NUClear::threading::Task<NUClear::threading::Reaction>, std::default_delete<NUClear::threading::Task<NUClear::threading::Reaction> > >&&, bool const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#5  0x00007f9cbfb4c69c in NUClear::threading::TaskScheduler::run_task(NUClear::threading::TaskScheduler::Task&&) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#6  0x00007f9cbfb4eefb in NUClear::threading::TaskScheduler::pool_func(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>) () from lib/libsupportSignalCatcher.so
No symbol table info available.
#7  0x00007f9cbfb4f4f5 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (NUClear::threading::TaskScheduler::*)(std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue>), NUClear::threading::TaskScheduler*, std::shared_ptr<NUClear::threading::TaskScheduler::PoolQueue> > > >::_M_run() () from lib/libsupportSignalCatcher.so
No symbol table info available.
#8  0x00007f9cb7ed6183 in std::execute_native_thread_routine (__p=0x55e1fc980e10) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
        __t = <optimized out>
#9  0x00007f9cb7a8c54d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#10 0x00007f9cb7b11874 in clone () from /usr/lib/libc.so.6
No symbol table info available.

EmmanuelP commented 2 weeks ago

It looks like all the threads are waiting to lock arv_system_mutex, but I'm not sure. Could you try to capture the backtrace with debug symbols enabled? The backtrace should then report the source line of each call.
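To make the suspected pattern concrete, here is a minimal standalone sketch, not Aravis code. It assumes a single library-wide lock (`system_mutex` below is a hypothetical stand-in for arv_system_mutex) that is held by one call which never returns. Every other caller, whether it wants to enumerate devices or shut down, then queues up behind it. A `std::timed_mutex` is used only so that the demonstration terminates; with a plain mutex the callers would wait forever, which would match backtraces that all end in the same lock wait.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library-wide lock such as arv_system_mutex.
std::timed_mutex system_mutex;

// Tries to enter the guarded section. Returns false if the lock could not be
// obtained within `patience`.
bool try_guarded_call(std::chrono::milliseconds patience) {
    std::unique_lock<std::timed_mutex> lock(system_mutex, patience);
    return lock.owns_lock();
}

// Holds the lock on the calling thread (simulating a call that is stuck
// inside the guarded section), starts `n_callers` threads that try to enter,
// and returns how many of them gave up waiting.
int count_blocked_callers(int n_callers, std::chrono::milliseconds patience) {
    std::unique_lock<std::timed_mutex> stuck_holder(system_mutex);
    std::atomic<int> blocked{0};
    std::vector<std::thread> callers;
    for (int i = 0; i < n_callers; ++i) {
        callers.emplace_back([&blocked, patience] {
            if (!try_guarded_call(patience)) {
                ++blocked;
            }
        });
    }
    for (auto& caller : callers) {
        caller.join();
    }
    return blocked.load();  // the lock is released when stuck_holder goes out of scope
}
```

If this is what is happening, a backtrace with symbols should show which thread, if any, actually holds the lock, since the waiting threads alone do not reveal that.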