CESNET / netopeer2

NETCONF toolset
BSD 3-Clause "New" or "Revised" License
299 stars 188 forks source link

cli_send_recv() encounters timeout during multiple thread accessing. #1398

Closed rexlim820220 closed 1 year ago

rexlim820220 commented 1 year ago

1.[Title fromat]

static int cli_send_recv() in commands.c encounters timeout for sending the RPC expired error during multiple thread accessing.

2.[Issue description]

Our program forks two processes to independently call commands API that triggers cli_send_recv() such as cmd_editconfig or cmd_getconfig to interact with the same NETCONF server session. There were sometimes(not always) possibilities for the caller to show the following error message:

3.[issue log]

APP_1       | cli_send_recv: Timeout for receiving a reply expired.
APP_1       | [error][APP] fail to editconfig test.xml to NETCONF SERVER
APP_1       | nc ERROR: Session 2: received a <rpc-reply> with an unexpected message-id "96".
APP_1       | cli_send_recv: Unexpected reply received - ignoring and waiting for the correct reply.

4.[our validation]

In terms of consequentialism, NETCONF server had received correct configuration. However, the log shows cli_send_recv: Timeout for receiving a reply expired. which is a bit ambiguous to us.

In the first timeout case, cli_send_recv() return -1 since the message type created from session is NC_MSG_WOULDBLOCK. In Miscellaneous macros types provided by libnetconf2, it represents a timeout return value. While in the second case(Unexpected reply received), the session replies NC_MSG_REPLY_ERR_MSGID, which means message with missing or wrong message-id attribute value.

After tracing the code, one assumption ran into us that the session argument in nc_send_rpc() is a global variable, which is defined in line 73. We wish to contribute a pull request which add pthread_mutex_lock in commands.c to prevent session from being referred too rapidly during multi-threading and thus cause the issue.

michalvasko commented 1 year ago

netopeer2-cli is a (mostly, ignoring notifications) single-threaded application so I am not sure why calling an internal function in multiple threads is supposed to work. Similarly, you can make whatever changes you want but do not create a pull request, there is nothing to be fixed in the CLI. It is generally meant to be a usable CLI so that no 3rd party clients are needed to perform basic tasks. However, if you want a full-fledged NETCONF client, you should either use some other implementation (there are lots) or write your own using libnetconf2. Our CLI is limited in some ways, especially as far as threads are considered.