ami-iit / yarp-openmct

Repo for YARP and OpenMCT integration.
BSD 3-Clause "New" or "Revised" License
6 stars 1 forks source link

Closing some Yarp ports from Node.js explicitly fails #59

Open nunoguedelha opened 2 years ago

nunoguedelha commented 2 years ago

We try to close all the YARP ports opened by the iCubTelemetryServer process, by running the following commands from Node.js (for instance via REPL):

> this.yarp.portHandler.close('/yarpjs/inertial:i')
> this.yarp.portHandler.close('/yarpjs/left_leg/stateExt:o')
> this.yarp.portHandler.close('/yarpjs/camLeftEye:i')
> this.yarp.portHandler.close('/yarpjs/camRightEye:i')
> this.yarp.portHandler.close('/yarpjs/left_leg/cartesianEndEffectorWrench:i')
> this.yarp.portHandler.close('/yarpjs/right_leg/cartesianEndEffectorWrench:i')

The last command hangs. The port is actually closed but the Javascript binding call never returns the hand to the terminal).

nunoguedelha commented 2 years ago

The problem is specific to the port /yarpjs/right_leg/cartesianEndEffectorWrench:i as the failure occurs if it is the first we try to close.

It even crashes the terminal stdout.

nunoguedelha commented 2 years ago

Checks:

$ yarp exists /wholeBodyDynamics/right_leg/cartesianEndEffectorWrench:o /yarpjs/right_leg/cartesianEndEffectorWrench:i
[INFO] |yarp.os.Network| No connection from /wholeBodyDynamics/right_leg/cartesianEndEffectorWrench:o to /yarpjs/right_leg/cartesianEndEffectorWrench:i found

$ yarp name list | grep /yarpjs/right_leg/cartesianEndEffectorWrench
registration name /yarpjs/right_leg/cartesianEndEffectorWrench:i ip 192.168.1.18 port 10122 type tcp

For some unknown reason, calling directly port._close() works.

This is probably an old issue already happening when we were stopping the process with process.exit().

Analyzing...

nunoguedelha commented 2 years ago

I could not reproduce this issue anymore and didn't find anything relevant happening in https://github.com/robotology/yarp.js/blob/ceddda861b1b29478d3bc389d83765fed7297196/yarp.js#L231-L246.

So for now I'm closing the issue, until it happens again.

nunoguedelha commented 2 years ago

Reopened because it's frequently happening again when we are closing the telemetry server through a <CTRL+C> SIGINT.

nunoguedelha commented 2 years ago

On C++ bindings and C++ addons

yarp.js implementation uses a C++ addon (https://github.com/nodejs/worker/blob/master/doc/api/addons.md), YarpJS.node, a dynamically-linked shared object, written in C++, that can be found in the generated folder build/Release, and can be loaded into Node.js using the require() function: https://github.com/robotology/yarp.js/blob/b15c8d4a7668eb0cac06a6e3954bb975fcd23b12/yarp.js#L2

Node.js Addons are dynamically-linked shared objects, written in C++, that can be loaded into Node.js using the require() function, and used just as if they were an ordinary Node.js module. They are used primarily to provide an interface between JavaScript running in Node.js and C/C++ libraries.

The filename extension of the compiled Addon binary is .node (as opposed to .dll or .so). The require() function is written to look for files with the .node file extension and initialize those as dynamically-linked libraries.

The YarpJS.node addon is implemented using the Native Abstractions for Node.js (or nan) API.

Non-blocking operations requirement

(https://github.com/nodejs/worker/blob/master/doc/api/addons.md))

Addon authors are encouraged to think about how to avoid blocking the Node.js event loop with I/O or other time-intensive tasks by off-loading work via libuv to non-blocking system operations, worker threads or a custom use of libuv's threads.

The idea is to delegate the ports closure to one of those worker threads, within a limited time, i.e. stopping that thread after a reasonable timeout and running a "yarp clean" afterwards if necessary.

nunoguedelha commented 2 years ago

The problem is specific to the port /yarpjs/right_leg/cartesianEndEffectorWrench:i as the failure occurs if it is the first we try to close.

Actually, the issue occurs also with /yarpjs/left_leg/cartesianEndEffectorWrench:i.

All connections are properly closed, during the server closure breakdown tasks, in https://github.com/ami-iit/yarp-openmct/blob/5dfc365c2aa25eb660ac0ae3a0974162eae074e1/iCubTelemVizServer/terminationHandler.js#L83.

The issue occurs when the Node.js Event Loop thread calls the port closure through the following call tree:

_closeOnExit() --> _port.close() --> _port._close();

https://github.com/robotology/yarp.js/blob/b15c8d4a7668eb0cac06a6e3954bb975fcd23b12/yarp.js#L204-L208

https://github.com/robotology/yarp.js/blob/b15c8d4a7668eb0cac06a6e3954bb975fcd23b12/yarp.js#L233-L249

and the failure occurs when calling _port._close(): https://github.com/robotology/yarp.js/blob/b15c8d4a7668eb0cac06a6e3954bb975fcd23b12/yarp.js#L238:

The issue occurs in the YarpJS.node C++ addon and doesn't seem at all to be related to the code execution in yarp-openmct before that call.

Moving the issue to repository https://github.com/robotology/yarp.js.

nunoguedelha commented 2 years ago

Couldn't move the issue. There is some problem with the conditions required to do it. for moving an issue from repo A to repo B:

@traversaro, I guess that "owner" means maintainer/admin? fact: I can move issues between this repo and robotology-superbuild. Are you admin/maintainer of https://github.com/robotology/yarp.js?

nunoguedelha commented 2 years ago

Proceeding the analysis here...

The simplest and fastest way to avoid this issue while looking for a proper solution is to remove all-together the exit listener which closes the ports. As the Node.js process using the ports is fully stopped with the process.exit() command, the ports are left behind unused and unattached to any process. they can be cleared out by a manual yarp clean from a terminal, but they can also be left alone with no harm to other processes. The telemetry server application can actually be run again and open the same ports without any "address conflict" issue.

Workaround implemented in https://github.com/robotology/yarp.js/pull/27.

nunoguedelha commented 2 years ago

Will further analyze the issue and provide a fix later.