cyberbotics / webots_ros2

Webots ROS 2 packages
Apache License 2.0
395 stars 147 forks source link

SIGINT handling #476

Open PhilippPolterauer opened 1 year ago

PhilippPolterauer commented 1 year ago

Bug Description when trying to stop a running launch file via CTRL+C the ros driver does not terminate cleanly. We encountered this issue when writing a C++ plugin which utilizes RAII and found out that objects which are created inside the C++ plugin are not properly destructed.

Steps to Reproduce

  1. Install recent version
    1. mkdir -p ~/dev_ws/src/; cd ~/dev_ws/src; git clone --recursive https://github.com/cyberbotics/webots_ros2.git; cd ..
    2. colcon build --symlink-install
  2. Launch a webots simulation (e.g. tesla) source install/local_setup.bash; ros2 launch webots_ros2_tesla robot_launch.py
  3. press Ctrl-C
  4. observe error below
    philpolt@vanlinnb12001:~/ros/webots_shutdown_issue$ ros2 launch webots_ros2_tesla robot_launch.py 
    [INFO] [launch]: All log files can be found below /home/philpolt/.ros/log/2022-09-26-10-19-37-228324-vanlinnb12001-413857
    [INFO] [launch]: Default logging verbosity is set to INFO
    [INFO] [webots-1]: process started with pid [413860]
    [INFO] [ros2_supervisor.py-2]: process started with pid [413862]
    [INFO] [lane_follower-3]: process started with pid [413864]
    [INFO] [driver-4]: process started with pid [413868]
    [driver-4] Cannot connect to Webots instance on socket "/tmp/webots-255926/ipc/vehicle/extern", retrying in 0 second...
    [ros2_supervisor.py-2] Cannot connect to Webots instance on socket "/tmp/webots-1234/ipc/Ros2Supervisor/extern", retrying in 0 second...
    [driver-4] Cannot connect to Webots instance on socket "/tmp/webots-1234/ipc/vehicle/extern", retrying in 1 second...
    [ros2_supervisor.py-2] Cannot connect to Webots instance on socket "/tmp/webots-1234/ipc/Ros2Supervisor/extern", retrying in 1 second...
    [driver-4] Cannot connect to Webots instance on socket "/tmp/webots-1234/ipc/vehicle/extern", retrying in 2 seconds...
    [ros2_supervisor.py-2] Cannot connect to Webots instance on socket "/tmp/webots-1234/ipc/Ros2Supervisor/extern", retrying in 2 seconds...
    [driver-4] Cannot connect to Webots instance on socket "/tmp/webots-1234/ipc/vehicle/extern", retrying in 3 seconds...
    [ros2_supervisor.py-2] Cannot connect to Webots instance on socket "/tmp/webots-1234/ipc/Ros2Supervisor/extern", retrying in 3 seconds...
    ^C[WARNING] [launch]: user interrupted with ctrl-c (SIGINT)
    [lane_follower-3] Traceback (most recent call last):
    [lane_follower-3]   File "/home/philpolt/ros/webots_shutdown_issue/install/webots_ros2_tesla/lib/webots_ros2_tesla/lane_follower", line 11, in <module>
    [lane_follower-3]     load_entry_point('webots-ros2-tesla', 'console_scripts', 'lane_follower')()
    [lane_follower-3]   File "/home/philpolt/ros/webots_shutdown_issue/build/webots_ros2_tesla/webots_ros2_tesla/lane_follower.py", line 76, in main
    [lane_follower-3]     rclpy.spin(follower)
    [lane_follower-3]   File "/opt/ros/galactic/lib/python3.8/site-packages/rclpy/__init__.py", line 196, in spin
    [lane_follower-3]     executor.spin_once()
    [lane_follower-3]   File "/opt/ros/galactic/lib/python3.8/site-packages/rclpy/executors.py", line 704, in spin_once
    [lane_follower-3]     handler, entity, node = self.wait_for_ready_callbacks(timeout_sec=timeout_sec)
    [lane_follower-3]   File "/opt/ros/galactic/lib/python3.8/site-packages/rclpy/executors.py", line 690, in wait_for_ready_callbacks
    [lane_follower-3]     return next(self._cb_iter)
    [lane_follower-3]   File "/opt/ros/galactic/lib/python3.8/site-packages/rclpy/executors.py", line 588, in _wait_for_ready_callbacks
    [lane_follower-3]     wait_set.wait(timeout_nsec)
    [lane_follower-3] KeyboardInterrupt
    [ERROR] [ros2_supervisor.py-2]: process has died [pid 413862, exit code -2, cmd '/home/philpolt/ros/webots_shutdown_issue/install/webots_ros2_driver/lib/webots_ros2_driver/ros2_supervisor.py --ros-args'].
    [INFO] [driver-4]: process has finished cleanly [pid 413868]
    [INFO] [webots-1]: process has finished cleanly [pid 413860]
    [ERROR] [lane_follower-3]: process has died [pid 413864, exit code -2, cmd '/home/philpolt/ros/webots_shutdown_issue/install/webots_ros2_tesla/lib/webots_ros2_tesla/lane_follower --ros-args'].

    Error: [ERROR] [ros2_supervisor.py-2]: process has died [pid 413862, exit code -2, cmd '/home/philpolt/ros/webots_shutdown_issue/install/webots_ros2_driver/lib/webots_ros2_driver/ros2_supervisor.py --ros-args'].

Expected behavior The signal SIGINT should be properly handled inside the driver node. The launch file process ros2_supervisor.py-2 should finish cleanly!

System

angel-ayala commented 1 year ago

This seems to be an issue with Ros2Supervisor main loop. Apparently, python plugins are forced to be stopped without proper exit handling. Similarly in #655 , when the Ros2Supervisor.__supervisor_step_callback enter the condition of wb_robot_step() == -1 here, the only return triggered to end the execution happens on WebotsNode C++ side only, leaving the Python side without proper stop and cleaning procedure. In the side of ROS2 client, at first sight, can be a resource leak when the simulation is resetted because Nodes associated results are never removed and always reinstanciated at the Python side.

Any thoughts, @ygoumaz ? I was tempting a workaround with a topic publish from Ros2Supervisor to notify the wb_robot_step() == -1 condition without success. Then I add the wb_robot_step() == -1 verification in the Python plugins, calling sys.exit(0) and only works inside step() finishing the driver process cleanly.

Other thing to try maybe can be add a shutdown() method to the plugins. At the ROS2 side also have, lifecycle which can also be a good option to create a management for node states.

angel-ayala commented 1 year ago

Adding the shutdown() methods seems feasible.

In PythonPlugin.cpp

void PythonPlugin::stop() {
    PyObject_CallMethod(mPyPlugin, "stop", "");
    std::cout << "PythonPlugin::stop" << std::endl;
//     PyErr_Print();
    std::cout << "PythonPlugin::stop::print" << std::endl;
    Py_Finalize();
    std::cout << "PythonPlugin::stop::finalize" << std::endl;
  }

the Python method

def stop(self):
    self.__node.get_logger().info('stopping')
    self.__node.destroy_node()
    sys.exit(0)

In WebotsNode.cpp

int WebotsNode::step() {
    if (gShutdownSignalReceived && !mWaitingForUrdfRobotToBeRemoved) {
      mRemoveUrdfRobotPublisher->publish(mRemoveUrdfRobotMessage);
      mWaitingForUrdfRobotToBeRemoved = true;
    }

    const int result = wb_robot_step(mStep);
    if (result == -1) {    
      // Check if the plugin is actually an instance of PythonPlugin
      for (std::shared_ptr<PluginInterface> plugin : mPlugins) {
        std::shared_ptr<webots_ros2_driver::PythonPlugin> pythonPlugin = std::dynamic_pointer_cast<webots_ros2_driver::PythonPlugin>(plugin);
        if(pythonPlugin) {
          std::cout << "stopssss" << std::endl;
          pythonPlugin->stop();  // stop only for python plugins
        }
      }
      return result;

    }
    for (std::shared_ptr<PluginInterface> plugin : mPlugins)
      plugin->step();

    return result;
  }

Using this code in Driver.cpp

std::shared_ptr<webots_ros2_driver::WebotsNode> node = std::make_shared<webots_ros2_driver::WebotsNode>(robotName);
  node->init();
  RCLCPP_INFO(node->get_logger(), "Controller successfully connected to robot in Webots simulation.");
  while (true) {
    if (node->step() == -1)
      break;
    rclcpp::spin_some(node);
  }
  std::cout << "outlook" << std::endl;
//   rclcpp::spin_some(node);
//   std::cout << "last spin" << std::endl;
  RCLCPP_INFO(node->get_logger(), "Controller successfully disconnected from robot in Webots simulation.");
  std::cout << "shutting down" << std::endl;
  rclcpp::shutdown();
  std::cout << "clean" << std::endl;
  wb_robot_cleanup();
  std::cout << "return" << std::endl;
  return 0;

However, the print trace reach until std::cout << "clean" << std::endl; before the process exit with code 1, with both Python plugins. Apparently there is an unhandled event when wb_robot_cleanup(); is called.

UPDATED Apparently, seems to be a twice call of the wb_robot_cleanup() function, called once at Robot.__del__ Python method, and at the end of Driver.cpp

ygoumaz commented 1 year ago

Regarding issue #655, I will implement a shutdown handler in the launch file that will clean all other started nodes, excluding Webots, and then respawn them. However, before proceeding with this implementation in Iron, I am awaiting the merge of #760.

Regarding this issue, I believe your approach is correct. While I haven't examined it in depth yet, you could potentially invoke wb_robot_cleanup in the driver only if the plugin is not a Python plugin. Otherwise, we can adjust the Python API similarly to what was done for the constructor and wb_robot_init, thus avoiding a redundant call in Robot.del.

angel-ayala commented 1 year ago

I will test the wb_robot_cleanup condition, initial idea is itering again the WebotsNode.plugin elements to correctly choose a shutdown method.

In this regard, the Python API seems to require a more in depth examination. I developed a Gym environment using webots_ros2 and encountered that the info gathered at the environment side is delayed. Initial findings show the lack of spin_some() method at the side of rlcpy (the mantainers already know this), since the C++ ROS nodes appears to be faster than Python ROS nodes in using the DDS layer due at each step the driver call spin_some() in the C++ side and only spin_once(timeout_sec=0) in the Python side, this last requiring more time to finish all the ROS callbacks in one step.

ygoumaz commented 1 year ago

Still related to #788.