SteveMacenski / slam_toolbox

Slam Toolbox for lifelong mapping and localization in potentially massive maps with ROS
GNU Lesser General Public License v2.1
1.5k stars 497 forks source link

Sync/Async Node Crashes With Lifecycle Transition #671

Closed lekoook closed 4 months ago

lekoook commented 4 months ago

Required Info:

Steps to reproduce issue

  1. Launch sync/async node with online_sync_launch.py or online_async_launch.py launch files. Make sure the launch argument autostart is set to false.
  2. Execute ros2 lifecycle set /slam_toolbox 1 to configure the node.
  3. Execute ros2 lifecycle set /slam_toolbox 3 to activate the node.
  4. Execute ros2 lifecycle set /slam_toolbox 4 to deactivate the node.
  5. Execute ros2 lifecycle set /slam_toolbox 3 to activate the node again.
  6. Node crashes.

Expected behavior

The node should transit to the Active state successfully.

Actual behavior

The node crashes instead of a successful transition.

Additional information

Update: Just tested with localization_launch.py and offline_launch.py launch files, the same occurs with the same reproducible steps.

SteveMacenski commented 4 months ago

This was changed recently in https://github.com/SteveMacenski/slam_toolbox/pull/561. @kei1107 can you please take a look at these errors? It looks like you may have missed some things in your refactor.

Note that you're not on the humble branch, because the lifecycle node aren't in the humble branch, they're only on main. What part of that cycle crashes, precisely? A back trace is a tool you can use to find that, we have a tutorial: https://navigation.ros.org/tutorials/docs/get_backtrace.html

kei1107 commented 4 months ago

@SteveMacenski I checked immediately and it crashed as well ( Ubuntu 22.04 Docker and turtlebot3_gazebo ). However, it does not occur on commit [97796700adbd7d931f15ca25ef41d60068037c51] , so #669 seems to be affecting it.

Commit : [5507c53d6272db0aba266e7af0262414b6a6d733]

$ ros2 launch slam_toolbox localization_launch.py autostart:=false
[INFO] [launch]: All log files can be found below /home/ubuntu/.ros/log/2024-02-20-22-59-09-290288-252a2b63da0f-5171
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [localization_slam_toolbox_node-1]: process started with pid [5183]
[localization_slam_toolbox_node-1] [INFO] [1708469949.464146326] [slam_toolbox]: Node using stack size 40000000
[localization_slam_toolbox_node-1] [INFO] [1708469956.733293403] [slam_toolbox]: Configuring
[localization_slam_toolbox_node-1] [INFO] [1708469956.739235619] [slam_toolbox]: Using solver plugin solver_plugins::CeresSolver
[localization_slam_toolbox_node-1] [INFO] [1708469956.739441216] [slam_toolbox]: CeresSolver: Using SCHUR_JACOBI preconditioner.
[localization_slam_toolbox_node-1] [INFO] [1708469960.034249802] [slam_toolbox]: Activating
[localization_slam_toolbox_node-1] Info: clipped range threshold to be within minimum and maximum range!
[localization_slam_toolbox_node-1] [WARN] [1708469960.074409935] [slam_toolbox]: maximum laser range setting (20.0 m) exceeds the capabilities of the used Lidar (3.5 m)
[localization_slam_toolbox_node-1] Registering sensor: [Custom Described Lidar]
[localization_slam_toolbox_node-1] [INFO] [1708469963.330636983] [slam_toolbox]: Deactivating
[localization_slam_toolbox_node-1] [INFO] [1708469974.867029959] [slam_toolbox]: Activating
[ERROR] [localization_slam_toolbox_node-1]: process has died [pid 5183, exit code -11, cmd '/home/ubuntu/dev_ws/install/slam_toolbox/lib/slam_toolbox/localization_slam_toolbox_node --ros-args -r __node:=slam_toolbox -r __ns:=/ --params-file /home/ubuntu/dev_ws/install/slam_toolbox/share/slam_toolbox/config/mapper_params_localization.yaml --params-file /tmp/launch_params_097onfp_'].

Commit : [97796700adbd7d931f15ca25ef41d60068037c51]

$ ros2 launch slam_toolbox localization_launch.py autostart:=false
[INFO] [launch]: All log files can be found below /home/ubuntu/.ros/log/2024-02-20-23-02-20-452521-252a2b63da0f-6799
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [localization_slam_toolbox_node-1]: process started with pid [6811]
[localization_slam_toolbox_node-1] [INFO] [1708470140.729797303] [slam_toolbox]: Node using stack size 40000000
[localization_slam_toolbox_node-1] [INFO] [1708470147.906198440] [slam_toolbox]: Configuring
[localization_slam_toolbox_node-1] [INFO] [1708470147.911437911] [slam_toolbox]: Using solver plugin solver_plugins::CeresSolver
[localization_slam_toolbox_node-1] [INFO] [1708470147.911636902] [slam_toolbox]: CeresSolver: Using SCHUR_JACOBI preconditioner.
[localization_slam_toolbox_node-1] [INFO] [1708470151.966335127] [slam_toolbox]: Activating
[localization_slam_toolbox_node-1] Info: clipped range threshold to be within minimum and maximum range!
[localization_slam_toolbox_node-1] [WARN] [1708470151.999937407] [slam_toolbox]: maximum laser range setting (20.0 m) exceeds the capabilities of the used Lidar (3.5 m)
[localization_slam_toolbox_node-1] Registering sensor: [Custom Described Lidar]
[localization_slam_toolbox_node-1] [INFO] [1708470157.960871264] [slam_toolbox]: Deactivating
[localization_slam_toolbox_node-1] [INFO] [1708470167.071551708] [slam_toolbox]: Activating
# Transitioning successful

Thanks for the info about the backtrace. I'll look into it after this.

kei1107 commented 4 months ago
(gdb) backtrace
#0  0x00007ffff7dfd70c in loop_closure_assistant::LoopClosureAssistant::publishGraph (this=0x0)
    at /home/ubuntu/dev_ws/src/slam_toolbox/src/loop_closure_assistant.cpp:160
#1  0x00007ffff7d4a571 in slam_toolbox::SlamToolbox::publishVisualizations (
    this=0x555555651bd0) at /usr/include/c++/11/bits/unique_ptr.h:173
#2  0x00007ffff6da90cb in ?? ()
   from /lib/x86_64-linux-gnu/libboost_thread.so.1.74.0
#3  0x00007ffff72b9ac3 in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:442
#4  0x00007ffff734b850 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
kei1107 commented 4 months ago

In #669, closure_assistant is generated in on_configure, but the reset has not been changed.
(on_deactivate -> on_cleanup)

https://github.com/SteveMacenski/slam_toolbox/blob/5507c53d6272db0aba266e7af0262414b6a6d733/src/slam_toolbox_common.cpp#L174-L188


fixed

$ ros2 launch slam_toolbox localization_launch.py autostart:=false
[INFO] [launch]: All log files can be found below /home/ubuntu/.ros/log/2024-02-20-23-58-53-386970-252a2b63da0f-25486
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [localization_slam_toolbox_node-1]: process started with pid [25499]
[localization_slam_toolbox_node-1] [INFO] [1708473533.540587986] [slam_toolbox]: Node using stack size 40000000
[localization_slam_toolbox_node-1] [INFO] [1708473538.809562227] [slam_toolbox]: Configuring
[localization_slam_toolbox_node-1] [INFO] [1708473538.818868587] [slam_toolbox]: Using solver plugin solver_plugins::CeresSolver
[localization_slam_toolbox_node-1] [INFO] [1708473538.820987513] [slam_toolbox]: CeresSolver: Using SCHUR_JACOBI preconditioner.
[localization_slam_toolbox_node-1] [INFO] [1708473544.363418862] [slam_toolbox]: Activating
[localization_slam_toolbox_node-1] Info: clipped range threshold to be within minimum and maximum range!
[localization_slam_toolbox_node-1] [WARN] [1708473544.465040902] [slam_toolbox]: maximum laser range setting (20.0 m) exceeds the capabilities of the used Lidar (3.5 m)
[localization_slam_toolbox_node-1] Registering sensor: [Custom Described Lidar]
[localization_slam_toolbox_node-1] [INFO] [1708473548.333439825] [slam_toolbox]: Deactivating
[localization_slam_toolbox_node-1] [INFO] [1708473551.620347644] [slam_toolbox]: Activating
# Transitioning successful
$ git diff
diff --git a/src/slam_toolbox_common.cpp b/src/slam_toolbox_common.cpp
index 78f6643..44c7178 100644
--- a/src/slam_toolbox_common.cpp
+++ b/src/slam_toolbox_common.cpp
@@ -181,7 +181,6 @@ CallbackReturn SlamToolbox::on_deactivate(const rclcpp_lifecycle::State &)
     threads_[i].reset();
   }
   threads_.clear();
-  closure_assistant_.reset();
   sst_->on_deactivate();
   sstm_->on_deactivate();
   pose_pub_->on_deactivate();
@@ -210,6 +209,7 @@ CallbackReturn SlamToolbox::on_cleanup(const rclcpp_lifecycle::State &)
 /*****************************************************************************/
 {
   RCLCPP_INFO(get_logger(), "Cleaning up");
+  closure_assistant_.reset();
   smapper_.reset();
   dataset_.reset();
   map_saver_.reset();
SteveMacenski commented 4 months ago

Ah, thank you! Closing this and merging your PR. Thanks!

There's always a couple of things after big refactors. Thanks for the fix.