SICKAG / sick_safetyscanners2

ROS2 driver for SICK safety laser scanners
https://www.sick.com/de/en/opto-electronic-protective-devices/safety-laser-scanners/c/g187225
Apache License 2.0

Race condition in `Cola2Session` on `m_pending_commands_map` #22

Closed kou-kikutake closed 1 year ago

kou-kikutake commented 1 year ago

sick_safetyscanners ros package version: 1.0.8

When the field_data service is called while communication with the LiDAR sensor is established, we observed the ROS node crashing with a segmentation fault.

Here are the stack traces.

Thread 1 (handling field_data service call)

Thread 1 (Thread 0x7effce5eeec0 (LWP 297)):
#0  0x00007effd0e304aa in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007effd12a4cbc in std::_Rb_tree<unsigned short, std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> >, std::_Select1st<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > >, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > > >::_M_insert_node (__z=0x556ff7f42180, __p=<optimized out>, __x=0x0, this=0x556ff7f3d690) at /usr/include/c++/9/bits/stl_tree.h:2359
#2  std::_Rb_tree<unsigned short, std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> >, std::_Select1st<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > >, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<unsigned short const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > >, std::piecewise_construct_t const&, std::tuple<unsigned short const&>&&, std::tuple<>&&) (this=0x556ff7f3d690, __pos=..., __pos@entry=Python Exception <class 'AttributeError'> 'NoneType' object has no attribute 'pointer': 
...) at /usr/include/c++/9/bits/stl_tree.h:2467
#3  0x00007effd12a3ea5 in std::map<unsigned short, std::shared_ptr<sick::cola2::Command>, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > > >::operator[] (__k=<optimized out>, this=<optimized out>) at /usr/include/c++/9/bits/stl_tree.h:348
#4  sick::cola2::Cola2Session::addCommand (command=..., request_id=<optimized out>, this=<optimized out>) at ./src/cola2/Cola2Session.cpp:147
#5  sick::cola2::Cola2Session::addCommand (this=<optimized out>, request_id=<optimized out>, command=std::shared_ptr<class sick::cola2::Command> (use count 1, weak count 0) = {...}) at ./src/cola2/Cola2Session.cpp:141
#6  0x00007effd12a3f4e in sick::cola2::Cola2Session::executeCommand (this=0x556ff7f3d650, command=std::shared_ptr<class sick::cola2::Command> (use count 1, weak count 0) = {...}) at /usr/include/c++/9/bits/shared_ptr_base.h:1020
#7  0x00007effd1295299 in sick::SickSafetyscanners::requestFieldDataInColaSession (this=0x556ff7f3d0b0, fields=std::vector of length 72, capacity 128 = {...}) at /usr/include/c++/9/bits/shared_ptr_base.h:1020
#8  0x00007effd1295551 in sick::SickSafetyscanners::requestFieldData (this=0x556ff7f3d0b0, settings=..., field_data=std::vector of length 72, capacity 128 = {...}) at ./src/SickSafetyscanners.cpp:124

Thread 4 (Handling response from LiDAR)

Thread 4 (Thread 0x7effbffff700 (LWP 425)):
#0  0x00007effd0e30a28 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007effd12a3870 in std::_Rb_tree<unsigned short, std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> >, std::_Select1st<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > >, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > > >::_M_erase_aux (__position=..., this=0x556ff7f3d690) at /usr/include/c++/9/bits/stl_tree.h:2509
#2  std::_Rb_tree<unsigned short, std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> >, std::_Select1st<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > >, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > >) (__position=..., this=0x556ff7f3d690) at /usr/include/c++/9/bits/stl_tree.h:1225
#3  std::map<unsigned short, std::shared_ptr<sick::cola2::Command>, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<unsigned short const, std::shared_ptr<sick::cola2::Command> > >) (__position=..., this=0x556ff7f3d690) at /usr/include/c++/9/bits/stl_map.h:1037
#4  sick::cola2::Cola2Session::removeCommand (this=this@entry=0x556ff7f3d650, request_id=@0x7effbfffea2e: 147) at ./src/cola2/Cola2Session.cpp:168
#5  0x00007effd12a4503 in sick::cola2::Cola2Session::startProcessingAndRemovePendingCommandAfterwards (this=0x556ff7f3d650, packet=...) at ./src/cola2/Cola2Session.cpp:136
#6  0x00007effd12a45ca in sick::cola2::Cola2Session::processPacket (this=0x556ff7f3d650, packet=...) at ./src/cola2/Cola2Session.cpp:104
#7  0x00007effd12b12e3 in boost::function1<void, sick::datastructure::PacketBuffer const&>::operator() (a0=..., this=0x556ff7f49b18) at /usr/include/boost/function/function_template.hpp:677
#8  sick::communication::AsyncTCPClient::handleReceive (this=0x556ff7f47400, error=..., bytes_transferred=<optimized out>) at ./src/communication/AsyncTCPClient.cpp:163
#9  0x00007effd12b242a in sick::communication::AsyncTCPClient::<lambda(boost::system::error_code, std::size_t)>::operator() (__closure=0x7effbfffebb0, bytes_recvd=<optimized out>, ec=...) at ./src/communication/AsyncTCPClient.cpp:131

In this case, sick::cola2::Cola2Session::addCommand and sick::cola2::Cola2Session::removeCommand were called concurrently from two threads. Both member functions modify m_pending_commands_map without a lock or any other critical section, so one thread can rebalance the red-black tree inside the std::map while the other is inserting into or erasing from it.

I haven't tried to reproduce the issue with the latest master, but based on the source code, I cannot find any fix.
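One possible way to close this race is to guard every access to the map with a single mutex. The following is only a minimal sketch under stated assumptions, not the driver's actual code: `Command` is an empty stand-in for sick::cola2::Command, and `PendingCommands`, `add`, `remove`, `size` and `run_concurrent_demo` are hypothetical names chosen for illustration. In the driver itself the mutex would presumably be a member of Cola2Session and be taken inside addCommand and removeCommand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for sick::cola2::Command (the real class is not reproduced here).
struct Command {};

// Hypothetical wrapper around the pending-commands map; every access takes the same mutex.
class PendingCommands
{
public:
  // Would be called from the thread handling the service call.
  void add(std::uint16_t request_id, std::shared_ptr<Command> command)
  {
    assert(command != nullptr);
    std::lock_guard<std::mutex> lock(m_mutex);
    m_map[request_id] = std::move(command);
  }

  // Would be called from the receive thread; returns true if an entry was erased.
  bool remove(std::uint16_t request_id)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_map.erase(request_id) > 0;
  }

  std::size_t size() const
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_map.size();
  }

private:
  mutable std::mutex m_mutex;
  std::map<std::uint16_t, std::shared_ptr<Command>> m_map;
};

// Inserts and erases the same ids from two threads; returns the number of entries left.
std::size_t run_concurrent_demo()
{
  PendingCommands pending;
  constexpr std::uint16_t count = 10000;

  std::thread adder([&pending] {
    for (std::uint16_t id = 0; id < count; ++id)
    {
      pending.add(id, std::make_shared<Command>());
    }
  });

  std::thread remover([&pending] {
    for (std::uint16_t id = 0; id < count; ++id)
    {
      // Retry until the adder thread has inserted this id.
      while (!pending.remove(id))
      {
        std::this_thread::yield();
      }
    }
  });

  adder.join();
  remover.join();
  return pending.size();
}
```

Calling `run_concurrent_demo()` inserts and erases from two threads at once, which is the same access pattern as in the two stack traces. Without the `std::lock_guard` lines this would be a data race on the `std::map`. A single coarse mutex is the simplest choice here because the critical sections are very short.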

kou-kikutake commented 1 year ago

I posted this to the wrong repo. Moved to https://github.com/SICKAG/sick_safetyscanners/issues/114.