RobotWebTools / rclnodejs

Node.js version of ROS 2.0 client
https://docs.ros.org/en/humble/Concepts/Basic/About-Client-Libraries.html?highlight=rclnodejs#community-maintained
Apache License 2.0
319 stars 70 forks source link

Improve the docs & shutdown behavior when the node process is terminated #721

Closed felixdivo closed 3 years ago

felixdivo commented 3 years ago

This PR:

Some other stuff that I noticed and wanted to discuss. In rclnodejs/lib/context.js:

EDIT:

felixdivo commented 3 years ago

cc @minggangw

coveralls commented 3 years ago

Coverage Status

Coverage increased (+0.02%) to 91.207% when pulling 591d6b93ccd2444a3c8222a0da661d051f643641 on felixdivo:supress-error-at-shutdown into d41f7e91510cd55ff672dad8f7f0be095ed8c00c on RobotWebTools:develop.

minggangw commented 3 years ago

Sorry for the delay, some comments and will review the PR later, thanks!

Some other stuff that I noticed and wanted to discuss. In rclnodejs/lib/context.js:

1. Should we remove `Context.shutdownDefaultContext()`? I'd argue that we want to have a `Context.tryShutdownDefaultContext()` too, but that would start to weirdly duplicate stuff. Instead, I'd suggest the following:

   * Delete `Context.shutdownDefaultContext()`, which would now become the barely longer `Context.defaultContext().shutdown()`.

+1 Context.defaultContext().shutdown() is more consistent

   * By that, we get the "try" version for free, e.g. with `Context.defaultContext().tryShutdown()`.
   * Finally, we should add a check like `if (this === defaultContext) { defaultContext = undefined; }` to `Context.shutdown()`/`Context.tryShutdown()` anyways, as we would miss the cleanup if a user already explicitly chose to shut down the default context. This would, however, make `Context.shutdownDefaultContext()` even more redundant.

Would you please write a code snippet to show your idea? and we could discuss based on the PR.

2. Shouldn't `defaultContext` be set to `null` instead of `undefined` to explicitly signal that it is absent and not just a typo somewhere?

Yes, 'null' is reasonable.

3. I would expect the code `rclnodejs.shutdown()` to shut down _all_ existing contexts instead of only the default one. Is that intentional? Is there a will to change that?

We may create multi context and only shutdown one of them at a time, so we cannot close all contexts. Please see the corresponding test case: https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/test/test-init-shutdown.js#L164-L169

4. Shouldn't we call `NodeOptions.defaultOptions` in [`index.js`](https://github.com/RobotWebTools/rclnodejs/blob/develop/index.js#L187) instead of reference that method? I.e. append `()`.

Would you please explain more? which one do you prefer?

koonpeng commented 3 years ago

sorry, my mistake. Removing the installed handler does indeed cause node to not exit. I think node should exit when it runs out of work in the event loop, I tested this by setting a timeout and installing a signal handler to clear it (does not call process.exit) and node exited normally. But when i try it with rclnodejs, I shutdown the context and destroy all nodes I created in the handler but node doesn't stop. I don't know if my understanding of node is wrong or if rclnodejs is leaking some things in the event loop, I will have to research into it more.

minggangw commented 3 years ago

I don't know if my understanding of node is wrong or if rclnodejs is leaking some things in the event loop,

I suspect the sub-thread hasn't stopped (the default event loop is still running), so the main thread cannot quit. https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L97-L98

koonpeng commented 3 years ago

Here is what I found in my investigation.

rcl.init creates a guard condition https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/rcl_bindings.cpp#L116

The only place I see this guard conditional used is here https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L201-L217 There is a weird check of the wait set if there is only 1 guard condition and it is truthy, if so it sets running_ to be false, presumbly this is the g_sigint_gc guard conditional.

At the executor cleanup, it only runs if running_is true. https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L94

So my guess is that when SIGINT is received, the guard condition gets triggered, causing the if check to pass and set running_ to false, this happens in the C++ signal handler. Then node's SIGINT handler gets called, which does a rclnodejs.shutdown, but since running_ is false, no shutdown actually occurs.

I tried commenting out the line and that seems to fix it, ctrl+c without process.exit in the handler properly exits node.

I don't quite understand the intention of this check, there are also several other things that seems weird to me.

https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L74-L75 here the wait set is initialized with capacity for 2 guard conditions for some reason, I guess 1 is for g_sigint_gc but I don't find where is the 2nd slot for.

https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L174-L187 here the guard condition is resized, I don't know which handlers handle_manager is tracking but it returns 0 in my testing, so the number of guard conditions get resized to 1.

https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L201-L202 Also back here again, a new guard condition is added, this happens each time in the "run loop", this guard condition is never removed, I suppose it gets discarded from the resize. Is the goal to have the g_sigint_gc at the end of the array? But I don't know if it makes any difference?

koonpeng commented 3 years ago

Also do we really need a guard condition for SIGINT? I tried removing it and things still seems to work.

felixdivo commented 3 years ago

I addressed the comments above and collected the state of this PR in the description at the top. I did change the interface of Context a bit, so that's something to look at maybe.

I'm somewhat overwhelmed by the discussion on the exit, as I'm not a node.js native binding guru. ;) Great discussion, but guess I'll just listen to your conclusion and implement the result if you want.

minggangw commented 3 years ago

@koonpeng thanks for your comprehensive investigation, indeed, the guard condition related codes seem unreasonable, I think one possible reason is that the guard condition was not implemented at the beginning and some value was hard code (as the code snippet you show), but the original code was not cleaned up when the new feature was added. So, it's a good opportunity to do it right, would you please update #720 to fix what you have found, thanks a lot!

felixdivo commented 3 years ago

So it seems like this PR is ready to be merged. Is there some deprecation policy (I couldn't find one) or should we just add an entry to the release notes (I'd prefer that since it should be in a major release anyhow)?

minggangw commented 3 years ago

All changes info will be added to the release note when a new version is being published.

felixdivo commented 3 years ago

@minggangw The CI checks now pass. I improved one test to be less duplicated and such that it also checks some properties of the Context class.

minggangw commented 3 years ago

@koonpeng any comments on this PR? If not, I am going to merge it :)