fieldrndservices / libssh2-labview

A LabVIEW library for SSH client support via libssh2
Apache License 2.0

Error -8146 when 1 minute without sending command to SSH channel #44

Closed rafaelfalcaro closed 2 years ago

rafaelfalcaro commented 2 years ago

Hi,

I'm working on a project where I do SSH communication between a computer (Windows 10) and a motion controller (Debian 8).

I'm having problems sending commands after going 1 minute without sending any. There seems to be a timeout configured, because 60 seconds is exactly the maximum time I can go without sending a command before the error occurs.

This is the VI (snippet) I developed to simulate the problem:

[image: SSH]

Note that I forced a delay greater than 60 seconds to simulate the problem. With any value less than that, the error does not occur and I am able to send both commands.

The returned error is: -8146 : Field_RnD_Services_LIBSSH2_Toolkit.lvlib:Channel.lvclass:Write.vi[Socket Receive Error]

I've already tried setting a timeout with Write Timeout.vi, but I understand that this timeout refers to sending or receiving commands, not the connection itself:

[image]

@volks73 can you help me please?

volks73 commented 2 years ago

@rafaelfalcaro Sorry for the delay in responding. There is an optional Timeout (60000 ms) terminal on Toolkit.lvlib:Session.lvclass:Connect.vi, in the lower left of the icon. In your (excellent) snippets, this is unwired, and its value matches the timeout you are observing. I thought this timeout only governed how long LabVIEW waits to create the TCP connection, so it would only apply to Connect.vi, but the "raw" form of the LabVIEW connection is extracted and passed as a socket handle to the libssh2lv library and ultimately to the libssh2 library for use. It is possible that this timeout information is carried into the libssh2 library and used as the timeout for its various lower-level SSH communication commands.

While I have not looked too deeply into the libssh2 library with respect to socket handle usage and timeouts, I would try changing this value from 60000 to some other value and see if you still receive the -8146 error at the 60 s interval. In LabVIEW, the default is -1, which never times out. I don't know what this would do to the libssh2 library if it uses the value for its own timeout purposes, so I would try 30000 or even 10000 ms.

> I've already tried setting a timeout with Write Timeout.vi, but I understand that this timeout refers to sending or receiving commands, not the connection itself:

This is my understanding as well, but this provides more evidence that the Connect.vi timeout is related to the connection.

rafaelfalcaro commented 2 years ago

@volks73 Thanks for the feedback!

I ran some tests, and the timeout parameter of the Connect VI is related only to the connection opening time, as we had imagined (the problem still happens, even with -1 or 70000 ms as the parameter).

This time I tried an example VI from the library, "Execute Multiple Commands with a Single Channel". It works fine by default, but when I put a 65-second delay between two write and read commands, the error occurs in the same way.

[image]

It is clearly a timeout problem, since if I run a loop that sends the commands in the same way at 10-second intervals, I can operate for hours without any errors.

volks73 commented 2 years ago

@rafaelfalcaro The original error code, -8146, is an ERROR_SOCKET_RECV. The LabVIEW error handler (Check Status.vi) adds -8100 to errors. This is not an ERROR_TIMEOUT. I was thinking that there might be a bug with the Write Timeout.vi for the Session, as there is no timeout for individual "read" or "write" SSH channel functions, but the original error code suggests it is not related to the timeout.
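
The offset arithmetic described here can be checked in a few lines. This is only a sketch: the names `TOOLKIT_ERROR_OFFSET` and `original_error_code` are made up for illustration and are not part of the toolkit. The only inputs taken from this thread are the -8100 offset and the reported -8146.

```python
# Offset that Check Status.vi adds to the underlying error code,
# as stated in this thread (hypothetical constant name).
TOOLKIT_ERROR_OFFSET = -8100


def original_error_code(labview_code: int) -> int:
    """Undo the offset to recover the code reported by the underlying library."""
    return labview_code - TOOLKIT_ERROR_OFFSET


print(original_error_code(-8146))  # prints -46
```

Subtracting the offset from any error the toolkit reports gives the value to look up in the underlying library's own error list.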

Since the original error is not a timeout error, can you use a timeout of 0 for the session? According to the libssh2 documentation, this will disable any timeouts and possibly isolate the issue. I am beginning to think your issue is related to some configuration on your SSH server. The libssh2 library has a libssh2_keepalive_config function. By default, the keep alive is disabled (0). There is currently no support for this function in this LabVIEW toolkit. However, your SSH server may have a keep alive enabled. This is typically a configuration directive in the /etc/ssh/sshd_config file on the server (assuming a Linux/Unix remote SSH server). If possible, see if the keepalive is enabled on the server and disable it.

The original error code seems to suggest the remote SSH server is closing the connection, not the client/LabVIEW toolkit.

rafaelfalcaro commented 2 years ago

Hi, @volks73. Thanks for the explanations!

I tried a timeout of 0, and opening the connection then threw error 56, which confirms that this timeout parameter has to do with opening the connection and not with the maximum interval between commands on the SSH channel.

About the "keep alive" configuration, I checked the file on the controller:

[image]

I don't think the problem is on the controller side, as I can keep an SSH session active for hours when using the Windows terminal.

volks73 commented 2 years ago

@rafaelfalcaro Thanks for the information and confirmation. I think it is still related to something with the keep alive. Doing some research and reading, there is the "How does tcp-keepalive work in SSH" StackOverflow Q&A that seems to indicate there are different keep alive options, configurations, and defaults.

Reading and writing on a channel at regular intervals, as you have implemented and experimentally determined, is essentially a "keep alive" mechanism. Combined with some of the information from the Q&A, enabling the keep alive is actually the appropriate action and would eliminate your manual implementation.
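
For reference, the manual "keep alive" described here amounts to tracking idle time and sending something harmless before the 60-second limit is reached. Below is a minimal Python sketch of that idea. It is not the toolkit's API: `IdleKeepAlive`, `mark_activity`, `poll`, and the `send_noop` callback are hypothetical names, and the 10-second interval is simply the one reported to work in this thread.

```python
import time
from typing import Callable


class IdleKeepAlive:
    """Tracks channel activity and reports when a keep-alive write is due."""

    def __init__(self, interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self._clock = clock
        self._last_activity = clock()

    def mark_activity(self) -> None:
        """Call after every real read or write on the channel."""
        self._last_activity = self._clock()

    def poll(self, send_noop: Callable[[], None]) -> bool:
        """Send a no-op if the channel has been idle for interval_s seconds.

        Returns True if a no-op was sent, False otherwise.
        """
        if self._clock() - self._last_activity >= self.interval_s:
            send_noop()
            self.mark_activity()
            return True
        return False
```

In an application loop, one would create `IdleKeepAlive(10.0)`, call `mark_activity()` after each real command, and call `poll(...)` periodically with whatever harmless write the remote shell tolerates (that write is an assumption and depends on the controller). A server-side or client-side SSH keep alive makes this bookkeeping unnecessary.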

It should be possible to look at the Windows Terminal SSH client configuration, too. I believe it is OpenSSH, so there should be an ssh_config file somewhere on your system with client-side "keep alive" configuration directives. Running ssh -vvvv with the Windows Terminal SSH client might also print debugging information that indicates a possible difference between the Windows Terminal SSH client and this libssh2-based toolkit.

volks73 commented 2 years ago

@rafaelfalcaro Have you been able to make any progress on this?

rafaelfalcaro commented 2 years ago

Hi @volks73, sorry for not posting anything else here, I've been busy with other parts of this project given the interim solution of implementing a keepalive.

Thank you so much for following up with me!!!

I did the KeepAlive configuration on my SSH server (/etc/ssh/sshd_config) and the problem was solved!!!

Here's the configuration:

TCPKeepAlive no
ClientAliveCountMax 240
ClientAliveInterval 30
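
As I understand the sshd_config manual, `ClientAliveInterval 30` makes the server send a message through the encrypted channel after 30 seconds without data from the client, and `ClientAliveCountMax 240` is how many of those messages may go unanswered before the server disconnects. `TCPKeepAlive no` turns off the separate TCP-level keepalive. The short Python sketch below parses the three lines above and computes how long an unresponsive client would be tolerated; the helper name `parse_sshd_options` is made up for illustration.

```python
def parse_sshd_options(text: str) -> dict:
    """Parse 'Keyword value' lines, skipping blank lines and comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # sshd_config keywords are case-insensitive, so normalize them.
            options[parts[0].lower()] = parts[1].strip()
    return options


CONFIG = """\
TCPKeepAlive no
ClientAliveCountMax 240
ClientAliveInterval 30
"""

options = parse_sshd_options(CONFIG)
interval_s = int(options["clientaliveinterval"])
count_max = int(options["clientalivecountmax"])

# The server probes after every interval_s seconds of silence and gives up
# after count_max consecutive unanswered probes.
unresponsive_limit_s = interval_s * count_max
print(unresponsive_limit_s)  # prints 7200
```

So with these settings the server generates traffic on an idle session every 30 seconds, and would only drop a client that stayed unresponsive for roughly 7200 seconds (2 hours).
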

volks73 commented 2 years ago

@rafaelfalcaro, no worries. It is great to hear that you were able to resolve the issue. I am going to close this issue, as it appears no modifications to the toolkit or further actions are needed. Thank you.