Closed felixdivo closed 3 years ago
cc @minggangw
Sorry for the delay, some comments and will review the PR later, thanks!
Some other stuff that I noticed and wanted to discuss. In
rclnodejs/lib/context.js
:1. Should we remove `Context.shutdownDefaultContext()`? I'd argue that we want to have a `Context.tryShutdownDefaultContext()` too, but that would start to weirdly duplicate stuff. Instead, I'd suggest the following: * Delete `Context.shutdownDefaultContext()`, which would now become the barely longer `Context.defaultContext().shutdown()`.
+1 Context.defaultContext().shutdown()
is more consistent
* By that, we get the "try" version for free, e.g. with `Context.defaultContext().tryShutdown()`. * Finally, we should add a check like `if (this === defaultContext) { defaultContext = undefined; }` to `Context.shutdown()`/`Context.tryShutdown()` anyways, as we would miss the cleanup if a user already explicitly chose to shut down the default context. This would, however, make `Context.shutdownDefaultContext()` even more redundant.
Would you please write a code snippet to show your idea? and we could discuss based on the PR.
2. Shouldn't `defaultContext` be set to `null` instead of `undefined` to explicitly signal that it is absent and not just a typo somewhere?
Yes, 'null' is reasonable.
3. I would expect the code `rclnodejs.shutdown()` to shut down _all_ existing contexts instead of only the default one. Is that intentional? Is there a will to change that?
We may create multi context and only shutdown one of them at a time, so we cannot close all contexts. Please see the corresponding test case: https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/test/test-init-shutdown.js#L164-L169
4. Shouldn't we call `NodeOptions.defaultOptions` in [`index.js`](https://github.com/RobotWebTools/rclnodejs/blob/develop/index.js#L187) instead of reference that method? I.e. append `()`.
Would you please explain more? which one do you prefer?
sorry, my mistake. Removing the installed handler does indeed cause node to not exit. I think node should exit when it runs out of work in the event loop, I tested this by setting a timeout and installing a signal handler to clear it (does not call process.exit) and node exited normally. But when i try it with rclnodejs, I shutdown the context and destroy all nodes I created in the handler but node doesn't stop. I don't know if my understanding of node is wrong or if rclnodejs is leaking some things in the event loop, I will have to research into it more.
I don't know if my understanding of node is wrong or if rclnodejs is leaking some things in the event loop,
I suspect the sub-thread hasn't stopped (the default event loop is still running), so the main thread cannot quit. https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L97-L98
Here is what I found in my investigation.
rcl.init
creates a guard condition
https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/rcl_bindings.cpp#L116
The only place I see this guard conditional used is here
https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L201-L217
There is a weird check of the wait set if there is only 1 guard condition and it is truthy, if so it sets running_
to be false, presumbly this is the g_sigint_gc
guard conditional.
At the executor cleanup, it only runs if running_
is true.
https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L94
So my guess is that when SIGINT is received, the guard condition gets triggered, causing the if check to pass and set running_
to false, this happens in the C++ signal handler. Then node's SIGINT handler gets called, which does a rclnodejs.shutdown
, but since running_
is false, no shutdown actually occurs.
I tried commenting out the line and that seems to fix it, ctrl+c without process.exit
in the handler properly exits node.
I don't quite understand the intention of this check, there are also several other things that seems weird to me.
https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L74-L75
here the wait set is initialized with capacity for 2 guard conditions for some reason, I guess 1 is for g_sigint_gc
but I don't find where is the 2nd slot for.
https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L174-L187
here the guard condition is resized, I don't know which handlers handle_manager
is tracking but it returns 0 in my testing, so the number of guard conditions get resized to 1.
https://github.com/RobotWebTools/rclnodejs/blob/d41f7e91510cd55ff672dad8f7f0be095ed8c00c/src/executor.cpp#L201-L202
Also back here again, a new guard condition is added, this happens each time in the "run loop", this guard condition is never removed, I suppose it gets discarded from the resize. Is the goal to have the g_sigint_gc
at the end of the array? But I don't know if it makes any difference?
Also do we really need a guard condition for SIGINT? I tried removing it and things still seems to work.
I addressed the comments above and collected the state of this PR in the description at the top. I did change the interface of Context
a bit, so that's something to look at maybe.
I'm somewhat overwhelmed by the discussion on the exit, as I'm not a node.js native binding guru. ;) Great discussion, but guess I'll just listen to your conclusion and implement the result if you want.
@koonpeng thanks for your comprehensive investigation, indeed, the guard condition related codes seem unreasonable, I think one possible reason is that the guard condition was not implemented at the beginning and some value was hard code (as the code snippet you show), but the original code was not cleaned up when the new feature was added. So, it's a good opportunity to do it right, would you please update #720 to fix what you have found, thanks a lot!
So it seems like this PR is ready to be merged. Is there some deprecation policy (I couldn't find one) or should we just add an entry to the release notes (I'd prefer that since it should be in a major release anyhow)?
All changes info will be added to the release note when a new version is being published.
@minggangw The CI checks now pass. I improved one test to be less duplicated and such that it also checks some properties of the Context
class.
@koonpeng any comments on this PR? If not, I am going to merge it :)
This PR:
rclnodejs/index.js
.Some other stuff that I noticed and wanted to discuss. In
rclnodejs/lib/context.js
:Context.shutdownDefaultContext()
? I'd argue that we want to have aContext.tryShutdownDefaultContext()
too, but that would start to weirdly duplicate stuff. Instead, I'd suggest the following:Context.shutdownDefaultContext()
, which would now become the barely longerContext.defaultContext().shutdown()
.Context.defaultContext().tryShutdown()
.if (this === defaultContext) { defaultContext = undefined; }
toContext.shutdown()
/Context.tryShutdown()
anyways, as we would miss the cleanup if a user already explicitly chose to shut down the default context. This would, however, makeContext.shutdownDefaultContext()
even more redundant._shutdown()
inrclnodejs/index.js
.defaultContext
be set tonull
instead ofundefined
to explicitly signal that it is absent and not just a typo somewhere? ==> Yes, done.rclnodejs.shutdown()
to shut down all existing contexts instead of only the default one. Is that intentional? Is there a will to change that? ==> Okay, I'm convinced to not change things here. We follow rclpy's semantics.NodeOptions.defaultOptions
inindex.js
instead of reference that method? I.e. append()
. ==> Nevermind, didn't know the getter syntax.EDIT:
Context
class to better align with that of rclpy. Is that OK? ==> Apparentlyexit()
be done? ==> Will be improved in #720