AutonomyLab / create_robot

ROS driver for iRobot Create 1 and 2.
http://www.autonomylab.org/create_autonomy
BSD 3-Clause "New" or "Revised" License
198 stars 165 forks source link

Node shutdown error #87

Closed cbcoutu closed 3 years ago

cbcoutu commented 3 years ago

Hello,

I've noticed that the create_driver node isn't being destructed properly. Unfortunately I don't know a better name for the issue, I'm sorry if this has been brought up elsewhere and I didn't notice.

OS: Ubuntu 20.04 ROS: ROS2-Foxy create_robot branch: foxy libcreate branch: main

Re-create issue: Follow installation steps from the start, and launch create_2.launch.

Everything starts-up great, allowing full control of my Create2. However, upon sending the SIGINT command (ie Ctrl-c) the robot_state_publisher is shut down correctly, but the create_driver node is flagged with the error copied below (I've truncated my directory path):

[ERROR] [create_driver-1]: process has died [pid 3738, exit code 2, cmd '../create_ws/install/create_driver/lib/create_driver/create_driver --ros-args -r __node:=create_driver --params-file ../create_ws/install/create_bringup/share/create_bringup/config/default.yaml --params-file /tmp/launch_params_jm4axiif'].

I initially ignored this, but I've noticed ROS2 starts to behave strangely when relaunching the nodes several times over, (primarily ros2 node list no longer displays running nodes even though the rqt_graph shows everything correctly -- solved only by rebooting my computer.)

Doing some homework, I've made a couple preliminary observations that may help re-iterate the problem: 1) rclcpp::shutdown() in main() is never called. Confirm this by printing some statements around rclcpp::spin. When ctrl-c is sent, any-statement following the spin are never printed to the terminal. 2) When ctrl-c is sent, the Serial::signalHandler from libcreate is executed (confirm by throwing some print statements before the exit() call). If I understand correctly, this signalhandler is over-riding the signalhandler provided by ROS and is executed before the nodes have a chance to close. The exit call is what generates the printed error code included above.

I've been reading up on how the signal handler works, and found some interesting discussions/resources, however I've yet to solve (or fully isolate) the problem: 1) 317 --unfortunately this thread ends without a clear solution-- 2) roscpp --Based on the roscpp documentation, I think the signalhandler needs to be declared after the nodes have already been initialized, which I plan to try out.--

(Note, out of curiosity I tried to include in Serial::signalHandler a call to 'rclcpp::shutdown()' before the exit() statement, but this changes nothing for me.)

Is someone able to please confirm this problem exists? I admit there may be something trivial I'm missing.

Thanks for your time and offering such a library! Aside from this hiccup it's been fun to use.

cbcoutu commented 3 years ago

Apparently, I just needed to voice my problem to have the solution work, (guess I should invest in a rubber duck.) I will leave this issue open for a little longer, as the resulting solution is incredibly hacky and may have underlying problems.

Simply creating yet another signalhanlder over-ride (as mentioned in roscpp and 317 linked above) closes everything correctly. I'm not very content on this solution, because I tried this the other day and it didn't change anything. Not sure why it worked this time around, but the edits I made are:

Starting around line 655 in create_driver.cpp

void mySigintHandler(int sig){  //ADDITION
  rclcpp::shutdown();                //ADDITION
}                                               //ADDITION

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);
  auto create_driver = std::make_shared<CreateDriver>();
  signal(SIGINT, mySigintHandler); //ADDITION
  try
  {
    rclcpp::spin(create_driver);
    std::cout <<"PRINT" <<std::endl; //NOW THIS CORRECTLY PRINTS!
  }
  catch (std::runtime_error& ex)
  {
    RCLCPP_FATAL_STREAM(create_driver->get_logger(), "[CREATE] Runtime error: " << ex.what());
    return 1;
  }

  rclcpp::shutdown();
  return 0;
}

Again, a second opinion is greatly appreciated on this so I'll leave the issue for at least a couple days.

Thanks!

jacobperron commented 3 years ago

Thanks for opening the issue! I suspect that the signal handler registered by libcreate is interferring with the rclcpp signal handler. I'll see if I can reproduce later this evening.

jacobperron commented 3 years ago

@cbcoutu I was able to reproduce the issue (thanks for the detailed investigation!). I think #89 should fix this issue, which depends on a new option in libcreate for disabling it's signal handler (https://github.com/AutonomyLab/libcreate/pull/65). If you are able to try it out and report back that would be awesome :slightly_smiling_face: