fieldrndservices / libssh2-labview

A LabVIEW library for SSH client support via libssh2
Apache License 2.0

Create from SCP Receive.vi: "Memory Allocation Error - Unable to allocate memory, most likely the system is out of memory." #36

Closed ngblume closed 3 years ago

ngblume commented 3 years ago

When using the "Create from SCP Receive.vi" (for example within the SCP Download example), I get a "Memory Allocation Error" (-8102) at the Call Library Function Node (function: lv_libssh2_scp_receive). The problem occurs only with some servers, and not every time. Working locally with an RPi succeeds 100% of the time. With two other servers, I get this error roughly 8 out of 10 times.

  1. How can I fix this issue?
  2. Is this a problem with the memory on my client machine or on the server? My understanding would be server-side, since it occurs with some machines but not others. But the help for "Create from SCP Receive.vi" sounds as if this happens locally on the client.

Thanks !

Cheers Niels

volks73 commented 3 years ago

The Create from SCP Receive.vi is a "wrapper" around the lv_libssh2_scp_receive function from the libssh2lv project. Looking at the source code, lv_libssh2_scp_receive is a "thin wrapper" around the libssh2_scp_recv2 function, and lines 91 to 95 read:

LIBSSH2_CHANNEL* inner = libssh2_scp_recv2(session->inner, path, file_info->inner);
if (inner == NULL) {
    return LV_LIBSSH2_STATUS_ERROR_MALLOC;
}

However, according to the documentation for the libssh2_scp_recv2 function:

Pointer to a newly allocated LIBSSH2_CHANNEL instance, or NULL on errors.

and there are three possible errors: LIBSSH2_ERROR_ALLOC, LIBSSH2_ERROR_SCP_PROTOCOL, and LIBSSH2_ERROR_EAGAIN.

The C wrapper code above in lv_libssh2_scp_receive "masks" every error from libssh2_scp_recv2 as a memory allocation error whenever it returns NULL. The NULL could instead mean a memory allocation error, an SCP protocol error from the server, or, with non-blocking IO, a signal that the call would block (LIBSSH2_ERROR_EAGAIN is not really an error; it is used with non-blocking IO). This is a bug in the error handling of the libssh2lv C code: when libssh2_scp_recv2 returns NULL, the wrapper should check the error code and return the appropriate error.
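A minimal sketch of that fix, assuming the wrapper reads the session's last error with libssh2_session_last_errno() after libssh2_scp_recv2() returns NULL. The LIBSSH2_ERROR_* values are local copies of the constants in libssh2.h (check them against the header you build with), and every LV_LIBSSH2_STATUS_* name except LV_LIBSSH2_STATUS_ERROR_MALLOC is hypothetical, as is the lv_status_from_libssh2_error() helper. The mapping is kept as a pure function so it compiles without libssh2; the intended call site is shown in a comment.

```c
#include <assert.h>

/* Local copies of the libssh2.h error codes used here (verify against your libssh2.h). */
#define LIBSSH2_ERROR_ALLOC        (-6)
#define LIBSSH2_ERROR_SCP_PROTOCOL (-28)
#define LIBSSH2_ERROR_EAGAIN       (-37)

/* Hypothetical wrapper status codes; the numeric values are illustrative only. */
typedef enum {
    LV_LIBSSH2_STATUS_ERROR_MALLOC = 1,
    LV_LIBSSH2_STATUS_ERROR_SCP_PROTOCOL,
    LV_LIBSSH2_STATUS_WARNING_EAGAIN,
    LV_LIBSSH2_STATUS_ERROR_OTHER
} lv_libssh2_status_t;

/*
 * Translate the libssh2 error recorded for the session into a distinct
 * wrapper status instead of reporting every NULL channel as "out of memory".
 * Intended use inside lv_libssh2_scp_receive:
 *
 *   LIBSSH2_CHANNEL* inner = libssh2_scp_recv2(session->inner, path, file_info->inner);
 *   if (inner == NULL) {
 *       return lv_status_from_libssh2_error(libssh2_session_last_errno(session->inner));
 *   }
 */
lv_libssh2_status_t lv_status_from_libssh2_error(int libssh2_error)
{
    /* Only meaningful after a failed call, when libssh2 has recorded a negative code. */
    assert(libssh2_error < 0);
    switch (libssh2_error) {
    case LIBSSH2_ERROR_ALLOC:
        return LV_LIBSSH2_STATUS_ERROR_MALLOC;
    case LIBSSH2_ERROR_SCP_PROTOCOL:
        return LV_LIBSSH2_STATUS_ERROR_SCP_PROTOCOL;
    case LIBSSH2_ERROR_EAGAIN:
        /* Non-blocking mode: the call would block, so the caller should retry. */
        return LV_LIBSSH2_STATUS_WARNING_EAGAIN;
    default:
        return LV_LIBSSH2_STATUS_ERROR_OTHER;
    }
}
```

With this mapping, a server-side SCP failure surfaces as its own status on the LabVIEW side rather than as error -8102.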

I will add an issue to the libssh2lv project to update the error handling. However, this is not the cause of your (@ngblume) issue but a complicating factor: it reports a memory allocation error when, based on your description of the conditions with the two other servers, it is most likely an SCP protocol error. Unfortunately, the HTML documentation for the libssh2_scp_recv2() function leaves the LIBSSH2_ERROR_SCP_PROTOCOL entry empty and does not provide any details on what could cause this error.

If it is in fact a LIBSSH2_ERROR_SCP_PROTOCOL error, then the cause is most likely on the servers, not the client. I have had problems with newer SSH servers, but I believe those were related to supported cryptography algorithms, not the SCP protocol. The SCP protocol has been around for a long time and rarely changes.

It is also possible that the error is a LIBSSH2_ERROR_EAGAIN warning/error. Are you using non-blocking IO (Write Mode VI in the Channel API)?
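For context, in non-blocking mode LIBSSH2_ERROR_EAGAIN is a "try again" signal rather than a failure. The sketch below shows the usual retry shape with the libssh2 call hidden behind a hypothetical callback (try_open_fn and open_with_eagain_retry are illustrative names, not libssh2lv or libssh2 API) so it compiles on its own. In real code the callback would wrap libssh2_scp_recv2() and libssh2_session_last_errno(), and the caller would normally wait for the socket to become ready (for example with select()) between attempts instead of retrying immediately.

```c
#include <assert.h>
#include <stddef.h>

#define LIBSSH2_ERROR_EAGAIN (-37) /* local copy of the libssh2.h value */

/* Hypothetical callback: returns a non-NULL handle on success, or NULL and
 * stores the libssh2 error code in *error_out. */
typedef void *(*try_open_fn)(void *ctx, int *error_out);

/* Retry while the call reports EAGAIN, up to max_attempts.
 * Returns the handle, or NULL with the last error code left in *error_out. */
void *open_with_eagain_retry(try_open_fn try_open, void *ctx,
                             int max_attempts, int *error_out)
{
    assert(try_open != NULL && error_out != NULL && max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        *error_out = 0;
        void *handle = try_open(ctx, error_out);
        if (handle != NULL) {
            return handle;
        }
        if (*error_out != LIBSSH2_ERROR_EAGAIN) {
            return NULL; /* a real error (e.g. SCP protocol): stop retrying */
        }
        /* Real code would wait for socket readiness here before retrying. */
    }
    return NULL; /* still EAGAIN after max_attempts */
}

/* Mock for illustration: reports EAGAIN twice, then succeeds. */
static int mock_calls = 0;
static int mock_handle = 42;
static void *mock_try_open(void *ctx, int *error_out)
{
    (void)ctx;
    if (++mock_calls < 3) {
        *error_out = LIBSSH2_ERROR_EAGAIN;
        return NULL;
    }
    return &mock_handle;
}
```

In blocking mode (the default used by the examples), libssh2 does not return EAGAIN, so this loop only matters when the session has been switched to non-blocking IO.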

Some additional debugging:

  1. Are you able to use the scp command from the command line, external to LabVIEW and this package?
  2. You mentioned a Raspberry Pi. Can you try the Read-Execute-Print-Loop with a Raspberry Pi example?
  3. Can you try the Read-Execute-Print-Loop with a Raspberry Pi example but connect to the two other servers? It should work, there is nothing really special about the LabVIEW code that makes it only work with a Raspberry Pi. I just get a lot of requests to interface with a Raspberry Pi via SSH with LabVIEW using this package.
  4. What type (OpenSSH?) and version of SSH server is running on the 2 problematic servers? Is this the same as the Raspberry Pi? If not, then there is something different between the SSH servers that is causing the SCP protocol error.
  5. Can you install or use the same SSH server software and version as on the Raspberry Pi on the other servers?
  6. Can you try the scp -vvv command from the command line with the three different servers and observe the output? The output might indicate where the three servers (Raspberry Pi and the other two) might differ.
ngblume commented 3 years ago

The C wrapper code above in lv_libssh2_scp_receive "masks" every error from libssh2_scp_recv2 as a memory allocation error whenever it returns NULL. The NULL could instead mean a memory allocation error, an SCP protocol error from the server, or, with non-blocking IO, a signal that the call would block (LIBSSH2_ERROR_EAGAIN is not really an error; it is used with non-blocking IO). This is a bug in the error handling of the libssh2lv C code: when libssh2_scp_recv2 returns NULL, the wrapper should check the error code and return the appropriate error.

If I understand you correctly, I'm not necessarily chasing a memory allocation issue; it might be any of three things, right?

  1. memory allocation error,
  2. a SCP Protocol error from the server
  3. a warning for non-blocking IO

Would a memory error nevertheless be on the server or the client side?

Assuming, and I think your assumption here is justified, that my error is actually a "LIBSSH2_ERROR_SCP_PROTOCOL" error, I will look into the SSH logs on the servers in more detail. So far, I have only found the connection, authentication, and disconnect log entries. (Disconnecting even if the transfer fails might be good to add to the examples as well, because otherwise the server has to clean up the sessions left open; reported as new issue #39 for improvement.)
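The suggestion to always disconnect, even when the transfer fails, corresponds to the common "clean up on every exit path" pattern. This is a sketch with hypothetical callbacks (struct transfer_ops and run_transfer are not part of libssh2 or libssh2lv); when using libssh2 directly, the disconnect step would be libssh2_session_disconnect() followed by libssh2_session_free().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations; connect and transfer return 0 on success. */
struct transfer_ops {
    int (*connect)(void *ctx);
    int (*transfer)(void *ctx);
    void (*disconnect)(void *ctx);
};

/* Run one transfer and always disconnect once the connection succeeded,
 * so the server is not left with a half-open session after a failure. */
int run_transfer(const struct transfer_ops *ops, void *ctx)
{
    assert(ops != NULL);
    int rc = ops->connect(ctx);
    if (rc != 0) {
        return rc; /* never connected: nothing to clean up */
    }
    rc = ops->transfer(ctx);
    ops->disconnect(ctx); /* runs on success and on failure */
    return rc;            /* report the transfer result to the caller */
}

/* Mocks for illustration: the transfer fails, yet disconnect is still called. */
static int disconnects = 0;
static int mock_connect(void *ctx) { (void)ctx; return 0; }
static int mock_failing_transfer(void *ctx) { (void)ctx; return -1; }
static void mock_disconnect(void *ctx) { (void)ctx; ++disconnects; }
```

In a LabVIEW block diagram the equivalent is wiring Disconnect/Close so it executes regardless of the error cluster coming out of the transfer VIs.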

Regarding the "LIBSSH2_ERROR_EAGAIN" error: I just used the example as-is, so no, I am not using non-blocking IO.

Regarding your questions:

  1. yes, for example with WinSCP
  2. The RPi is the one working perfectly every time. (I'll try the example as soon as possible.)
  3. Not sure what you mean here
  4. Server Versions
    • RPi: "Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Raspbian-10+deb9u7 // debug1: match: OpenSSH_7.4p1 Raspbian-10+deb9u7 pat OpenSSH* compat 0x04000000" (based on "ssh -v localhost")
    • server 1: "Remote protocol version 2.0, remote software version OpenSSH_8.2p1 Ubuntu-4ubuntu0.2 // debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.2 pat OpenSSH* compat 0x04000000" (based on "ssh -v localhost")
    • server 2: "Remote protocol version 2.0, remote software version OpenSSH_7.4 // debug1: match: OpenSSH_7.4 pat OpenSSH* compat 0x04000000" (based on "ssh -v localhost")
  5. Need to check with IT
  6. The command is not recognized/accepted on the RPi; I will try the other servers next.

Thanks for the help! Will report with more debugging soon...

Cheers Niels Göran

ngblume commented 3 years ago

Debugging-Addition:

volks73 commented 3 years ago
  3. Can you try the Read-Execute-Print-Loop with a Raspberry Pi example but connect to the two other servers? It should work; there is nothing really special about the LabVIEW code that makes it only work with a Raspberry Pi. I just get a lot of requests to interface with a Raspberry Pi via SSH with LabVIEW using this package.
     Reply: Not sure what you mean here

There is the Read-Execute-Print-Loop with a Raspberry Pi example. It may not be available in the current release on VIPM.io, but you should be able to clone this repository's main branch, i.e. git clone https://github.com/fieldrndservices/libssh2-labview.git, to get the example. While the title/name of the example is "...with a Raspberry Pi", the REPL can be used with any remote SSH server. This would test and determine if the issue is specific to the SCP-related functions/VIs.

Thank you for the version output. To summarize and confirm:

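| Remote Host | SSH Server Version | Connectivity | Works? (%) |
|-------------|--------------------|--------------|------------|
| RPi         | OpenSSH v7.4       | LAN          | 100        |
| Server 1    | OpenSSH v8.2       | VPN          | 20         |
| Server 2    | OpenSSH v7.4       | VPN          | 20         |
| Server 3    | OpenSSH v5.9       | VPN          | 0          |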

What is the client host, a desktop/laptop, RPi, PXI controller, or cRIO?

  • Only the local server has worked so far. The others were remote or reached via VPN. Could this have something to do with the issue?

By "local" do you mean a Local Area Network (LAN), i.e. sitting next to the client host connected either by a (managed) switch or cross-over cable, and by "remote" you mean a Wide Area Network (WAN), possibly even in a different office, building, city, country, etc., and must go through your organization's IT-managed network to the Internet?

I am trying to narrow down whether your issue is specific to the SCP command, the LabVIEW code, the libssh2 C library, or a network issue with your servers. While this package is far from perfect and I greatly appreciate your bug reports and feature requests, I am leaning towards a network/connectivity issue with your servers, and the VPN might have something to do with it. The difficulty with remote/VPN servers and the delays all point to a network configuration issue rather than something amiss with the code. Are you using IP addresses, hostnames, or Fully-Qualified Domain Names (FQDNs) for the host addresses? For example, if connecting from the client with the command line tools:

# IP addresses
ssh -vvv niels@192.168.1.5

# Hostnames/mDNS
ssh -vvv niels@raspberrypi.local
# or
ssh -vvv niels@raspberrypi

# FQDNs
ssh -vvv niels@your.organization.com

Is IPv6 enabled and being used instead of IPv4? Sometimes I have had issues if IPv6 is enabled with some embedded hardware and network connectivity.
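One way to check from the client whether a host name resolves to an IPv6 address first (and so would usually be tried over IPv6) is to look at what getaddrinfo() returns. A minimal POSIX sketch; first_address_family() is a hypothetical helper name, and on the Windows client the same call is available through Winsock (ws2tcpip.h) after WSAStartup().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return the address family (AF_INET or AF_INET6) of the first result
 * getaddrinfo() yields for host, or -1 if resolution fails.
 * numeric_only = 1 skips DNS and only parses literal addresses. */
int first_address_family(const char *host, int numeric_only)
{
    struct addrinfo hints;
    struct addrinfo *results = NULL;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* SSH runs over TCP */
    if (numeric_only) {
        hints.ai_flags = AI_NUMERICHOST;
    }

    int rc = getaddrinfo(host, "22", &hints, &results);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }
    int family = results->ai_family; /* first entry in the resolver's preferred order */
    freeaddrinfo(results);
    return family;
}
```

Calling it with numeric_only = 0 and the server's hostname or FQDN shows which family the resolver puts first; an AF_INET6 result while the VPN only routes IPv4 would be one thing worth ruling out.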

There is an overall trend to move away from SCP and use SFTP instead. See (i) What's the difference between SFTP, SCP, and FISH protocols and (ii) OpenSSH 8.0 Release Notes. For your use case, can you use SFTP instead of SCP? The LIBSSH2-LabVIEW package supports the entire SFTP API implemented in the libssh2 library. There are examples for transferring files using SFTP:

  1. Simple SFTP File Download
  2. Simple SFTP File Upload

I believe SFTP uses the same port as SSH, and the OpenSSH server supports the SFTP protocol. You should not have to change anything on the remote servers or have your IT department enable anything. If the SFTP-related Examples and/or VIs work with the remote servers, then the problem is in the SCP-related implementation. However, if they yield similar results, that would be more evidence that it is a connectivity/networking/VPN issue.

ngblume commented 3 years ago
  3. Can you try the Read-Execute-Print-Loop with a Raspberry Pi example but connect to the two other servers? It should work; there is nothing really special about the LabVIEW code that makes it only work with a Raspberry Pi. I just get a lot of requests to interface with a Raspberry Pi via SSH with LabVIEW using this package.
     Reply: Not sure what you mean here

There is the Read-Execute-Print-Loop with a Raspberry Pi example. It may not be available in the current release on VIPM.io, but you should be able to clone this repository's main branch, i.e. git clone https://github.com/fieldrndservices/libssh2-labview.git, to get the example. While the title/name of the example is "...with a Raspberry Pi", the REPL can be used with any remote SSH server. This would test and determine if the issue is specific to the SCP-related functions/VIs.

I will look at it more closely, but it seems that the example does pretty much what my test VI already does regarding commands/buffers passed back and forth. [screenshot]

What is the client host, a desktop/laptop, RPi, PXI controller, or cRIO?

The client is a Windows host, the targets are some servers (1 RPi, rest: rack server)...

  • Only the local server has worked so far. The others were remote or reached via VPN. Could this have something to do with the issue?

By "local" do you mean a Local Area Network (LAN), i.e. sitting next to the client host connected either by a (managed) switch or cross-over cable, and by "remote" you mean a Wide Area Network (WAN), possibly even in a different office, building, city, country, etc., and must go through your organization's IT-managed network to the Internet?

Yes, local means LAN via my WiFi router. The rest were servers on the internet (private server) or in the company network via VPN.

I am trying to narrow down whether your issue is specific to the SCP command, the LabVIEW code, the libssh2 C library, or a network issue with your servers. While this package is far from perfect and I greatly appreciate your bug reports and feature requests, I am leaning towards a network/connectivity issue with your servers, and the VPN might have something to do with it. The difficulty with remote/VPN servers and the delays all point to a network configuration issue rather than something amiss with the code. Are you using IP addresses, hostnames, or Fully-Qualified Domain Names (FQDNs) for the host addresses? For example, if connecting from the client with the command line tools:

I tried FQDNs as well as IPs; I am testing almost every combination. I have a similar feeling regarding connectivity, but I took the fact that several servers show identical issues as a pointer towards the library in combination with connectivity.

Is IPv6 enabled and being used instead of IPv4? Sometimes I have had issues if IPv6 is enabled with some embedded hardware and network connectivity.

Not sure, I need to check, but I think the LAN is (mostly) IPv6 and the WAN is mostly IPv4.

I believe SFTP uses the same port as SSH, and the OpenSSH server supports the SFTP protocol. You should not have to change anything on the remote servers or have your IT department enable anything. If the SFTP-related Examples and/or VIs work with the remote servers, then the problem is in the SCP-related implementation. However, if they yield similar results, that would be more evidence that it is a connectivity/networking/VPN issue.

Sounds like a good idea. I will try SFTP and see what happens.

Update: I'm trying SFTP right now, and I'm missing the SFTP -Read All.vi. It is in the newest commit on GitHub, but I can't find it in the package installed via VI Package Manager (v1.1.1.25). Is it missing there? [screenshot]

Is there a way to switch the underlying library to a more "chatty" mode that might give a more detailed view of what might be wrong? I have not yet been able to find proper logs for SCP on any of my servers. Any hint as to what might be worth looking at?

Thanks !

Cheers Niels

volks73 commented 3 years ago

It is in the newest commit on GitHub, but I can't seem to find it in the package installed via VI Package Manager (V1.1.1.25). Is it missing there?

It is missing from v1.1.1, but it is available in the main branch. Looks like it was added as of afa6722edb698d4e3beec93c9aa4ec3d8002ce70, but that commit has not been released. I really should make a new release.

Is there a way to switch the underlying library to a more "chatty" mode that might give a more detailed view of what might be wrong? I have not yet been able to find proper logs for SCP on any of my servers. Any hint as to what might be worth looking at?

There is libssh2_trace() in the libssh2.org API, but it appears I have not wrapped and exposed it in the libssh2lv project. I also don't think the libssh2lv library is built with tracing enabled; it is disabled by default. I have created an issue in the libssh2lv project to add this functionality, which can then be added to the LabVIEW API.
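For reference, with a libssh2 build that has debug tracing enabled, tracing is switched on per session with libssh2_trace() and can be redirected with libssh2_trace_sethandler(). The sketch below only implements a handler that collects the trace text into a caller-owned buffer, so it compiles without libssh2; struct trace_log and trace_to_log() are hypothetical names, and the registration calls appear in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same opaque typedef libssh2.h uses; declared here so the sketch builds alone. */
typedef struct _LIBSSH2_SESSION LIBSSH2_SESSION;

/* Hypothetical context that accumulates trace output. */
struct trace_log {
    char text[4096];
    size_t used;
};

/* Has the shape of libssh2_trace_handler_func: appends the raw trace text,
 * truncating silently once the buffer is full. */
void trace_to_log(LIBSSH2_SESSION *session, void *context,
                  const char *data, size_t length)
{
    struct trace_log *tlog = context;
    (void)session;
    assert(tlog != NULL && data != NULL);

    size_t room = sizeof tlog->text - 1 - tlog->used; /* reserve one byte for '\0' */
    size_t n = length < room ? length : room;
    memcpy(tlog->text + tlog->used, data, n);
    tlog->used += n;
    tlog->text[tlog->used] = '\0';
}

/*
 * Registration with a real session (requires a debug-enabled libssh2 build):
 *
 *   struct trace_log tlog = {0};
 *   libssh2_trace_sethandler(session, &tlog, trace_to_log);
 *   libssh2_trace(session, LIBSSH2_TRACE_SCP | LIBSSH2_TRACE_ERROR);
 */
```

Exposing the collected text through the LabVIEW API would then be a matter of copying the buffer into a LabVIEW string.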

ngblume commented 3 years ago

Hello,

It seems the issues are caused by the text received on the terminal after establishing the SSH connection with the server. I tested another LabVIEW SSH library and encountered identical problems: the same servers worked and the same servers failed. With the other toolkit, I was able to fix the issue with those servers by reading whatever showed up on the shell after login, before establishing an SCP connection. [screenshot]
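The workaround described here, reading whatever the server prints after login before starting SCP, amounts to draining the shell channel until the prompt appears. A minimal sketch of the prompt-detection part, assuming the output arrives in chunks from some read call (a Channel read VI in LabVIEW, or libssh2_channel_read() in C); struct drain_state, drain_chunk(), and the "$ " prompt suffix are hypothetical and depend on the server's shell configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for the welcome/banner text. */
struct drain_state {
    char text[8192];
    size_t used;
};

/* Append one chunk of shell output and return 1 if the accumulated text now
 * ends with the expected prompt, i.e. the welcome message has been read. */
int drain_chunk(struct drain_state *st, const char *chunk, size_t n,
                const char *prompt)
{
    assert(st != NULL && chunk != NULL && prompt != NULL);
    size_t room = sizeof st->text - 1 - st->used; /* reserve one byte for '\0' */
    if (n > room) {
        n = room; /* buffer full: drop the excess (a real version might keep only the tail) */
    }
    memcpy(st->text + st->used, chunk, n);
    st->used += n;
    st->text[st->used] = '\0';

    size_t plen = strlen(prompt);
    return st->used >= plen &&
           memcmp(st->text + st->used - plen, prompt, plen) == 0;
}
```

A read loop would pass each chunk to drain_chunk() and stop when it returns 1, or when a read times out with no new data, as a fallback for servers whose prompt is not known in advance.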

I tried something similar with your library, but I think I am missing something. I looked at some of the examples, like "Execute Multiple Commands with a Single Channel with a Raspberry Pi.vi" and "Single Command Execution.vi". Here is what I tried with a proper timeout (5000 ms): [screenshot]

It increased the reliability to roughly 20%. The other times, there were delays and probably errors that caused SCP to throw the memory allocation error again, around the highlighted part in the following picture: [screenshot]

Have you encountered any issues related to this, or do you have an idea how to read the entire welcome message? (I copied the basic structure from "Execute Multiple Commands with a Single Channel with a Raspberry Pi.vi" and increased the timeout.)

Thanks !

volks73 commented 3 years ago

Have you encountered any issues related to this, or do you have an idea how to read the entire welcome message? (I copied the basic structure from "Execute Multiple Commands with a Single Channel with a Raspberry Pi.vi" and increased the timeout.)

Yes, I think I have encountered this problem before. Please see #30. While it is quite long, towards the end there is some information about the splash screen contents displayed when connecting, i.e. the welcome message (https://github.com/fieldrndservices/libssh2-labview/issues/30#issuecomment-753009577). There is no specific resolution, but it sounds like the exact same problem you are encountering. If it is, could you disable the splash/welcome message on the SSH servers? That way you would avoid having to read a message of variable/unknown length, with a timeout, before starting.

volks73 commented 3 years ago

@ngblume I have created a new release, v1.2.0. The new release should include many of the fixes mentioned in this issue and elsewhere, but I don't think it directly resolves this issue. It does include the new shared libraries with the fix for appropriate error messages/codes, so maybe you can install the new release and see whether the error is in fact an SCP protocol error instead of the Out-of-Memory (OOM) error.

ngblume commented 3 years ago

@volks73 Just downloaded it, will install and report back...

Cheers Niels

ngblume commented 3 years ago

@volks73 I'm having major issues with the new version, so maybe I'm doing something fundamentally wrong. Since installing the VI package from GitHub, I get Error 56 in LabVIEW on every attempt to connect to a server with the VI that previously worked. The error originates from "Connect.vi" and has the following details: Error - 56: "TCP Open Connection in Field_RnD_Services_LIBSSH2_Toolkit.lvlib:Session.lvclass:Connect.vi->libssh2_SCP_CMD.vi"

Within Connect.vi, it originates from "TCP Open Connection". "Error 56 occurred at TCP Open Connection in Field_RnD_Services_LIBSSH2_Toolkit.lvlib:Session.lvclass:Connect.vi->libssh2_SCP_CMD.vi Possible reason(s): LabVIEW: (Hex 0x38) The network operation exceeded the user-specified or system time limit."

For now, I will revert to the previous version (v1.1.1.25) and check whether it works there as before. EDIT: Reverting solved the issue completely, so I don't think the cause of this error is in my application, but rather in the library. WinSCP works perfectly fine with the same server.

Anything I'm missing here?

Cheers Niels

volks73 commented 3 years ago

@ngblume The Error 56 is caused by the Timeout (60000 ms) optional terminal on Session:Connect.vi defaulting to 0 ms instead of 60000 ms. After changing this value to 60000 ms and/or hard-wiring a constant to it, the Error 56 goes away. I will fix this and create a bug fix release in a moment. However, I then get an Error 1097, which I will need to investigate further. If you hard-code a numeric constant to the Timeout optional input, do you also get an Error 1097?

volks73 commented 3 years ago

I just fixed Error 56 with 895b6fcae7d6486759d0713fd62c656f319d1e7a.

The Error 1097 is difficult to resolve at the moment. It appears to occur after running the VIs a second time. The first time I try an example or test, everything works great. When I re-run the same Example or Test, I will get the Error 1097, which is related to the Call Library Function node and the external shared library (DLL on Windows). There might be something wrong with the shared library (v0.2.2).

ngblume commented 3 years ago

@volks73 Thank you for the bugfix version 1.2.1. The connection can now be established at least once. I'm also running into the 1097 error when calling Connect.vi a second time. It is not the first VI that is called a second time: Initialize.vi and Create.vi also call the DLL and execute without issues the second time. This seems to be related to the "lv_libssh2_session_connect" function in particular.

On the other hand, if I change servers between calls, the error is gone and replaced by error 56 again, but this is also not fully reproducible. Disregard that: it was actually caused by not being able to reach the server.

=== LabVIEW crash report:

<DEBUG_OUTPUT>
19.10.2021 00:44:13.428
DWarn 0x0E697B77: Caught exception in ExtCode call!
E:\builds\2020patch\source\execsupp\ExtFuncRunTime.cpp(90) : DWarn 0x0E697B77: Caught exception in ExtCode call!
minidump id: ad8b3fdf-fc9d-4de6-b897-b9063172a58c
$Id: //labview/branches/2020patch/dev/source/execsupp/ExtFuncRunTime.cpp#1 $

</DEBUG_OUTPUT>
0x0045C479 - LabVIEW <unknown> + 0
0x5E94FB09 - mgcore_SH_20_0 <unknown> + 0
0x5E95090C - mgcore_SH_20_0 <unknown> + 0
0x0038E675 - LabVIEW <unknown> + 0
0x0038FFC3 - LabVIEW <unknown> + 0
0x34E70C55 - <unknown> <unknown> + 0
0x017F1B24 - LabVIEW <unknown> + 0
0x5E99F5F5 - mgcore_SH_20_0 <unknown> + 0
0x7585FA29 - KERNEL32 <unknown> + 0
0x772B7A9E - ntdll <unknown> + 0
0x772B7A6E - ntdll <unknown> + 0
0x00000000 - <unknown> <unknown> + 0

<DEBUG_OUTPUT>
19.10.2021 00:44:14.806
DWarn 0x50CBD7C1: Got corruption with error 1097 calling library libssh2lv.* function lv_libssh2_session_connect
E:\builds\2020patch\source\execsupp\ExtFuncRunTime.cpp(273) : DWarn 0x50CBD7C1: Got corruption with error 1097 calling library libssh2lv.* function lv_libssh2_session_connect
minidump id: 0e28576d-972a-4fa6-807b-769768da25a8
$Id: //labview/branches/2020patch/dev/source/execsupp/ExtFuncRunTime.cpp#1 $

</DEBUG_OUTPUT>
0x0045C479 - LabVIEW <unknown> + 0
0x5E94FB09 - mgcore_SH_20_0 <unknown> + 0
0x5E95090C - mgcore_SH_20_0 <unknown> + 0
0x34E70C89 - <unknown> <unknown> + 0
0x017F1B24 - LabVIEW <unknown> + 0
0x5E99F5F5 - mgcore_SH_20_0 <unknown> + 0
0x7585FA29 - KERNEL32 <unknown> + 0
0x772B7A9E - ntdll <unknown> + 0
0x772B7A6E - ntdll <unknown> + 0
0x00000000 - <unknown> <unknown> + 0
Full path to library is C:\Program Files (x86)\National Instruments\LabVIEW 2020\vi.lib\Field R&D Services\LIBSSH2 for LabVIEW\Toolkit\Support\libssh2lv.*
ExtFuncDWarnOnCorruption: connectorTDR is 
_TDR(1/1):@0x0000: 'lv_libssh2_session_connect': cluster of 3 elements

    @0x0000: 'result': long [32-bit integer (-2147483648 to 2147483647)]

    @0x0004: 'session': unsigned quad [64-bit integer (0 to approx 2e19)]

    @0x000C: 'socket': unsigned quad [64-bit integer (0 to approx 2e19)]

===

I'm not very familiar with the debug output, but it seems there is an issue with the connector, or that is just how it is reported.

It might also be that Disconnect.vi is actually causing the problem by leaving some resources unclosed.

Cheers Niels

ngblume commented 3 years ago

@volks73 Aside from the 1097 error when calling Connect.vi a second time, the memory allocation error has now turned into a different error: it is reported as a timeout if I do not remotely send LF (3x) and read the bytes from the console output before trying to establish an SCP connection.

This means for me:

  1. The issue is caused by the server / server console output.
  2. Not handling what the server does upon connecting causes the SCP initialization to time out, probably because the server does not recognize the SCP initialization command as such, since it gets mixed up with other output during connection.

I think the memory allocation error issue is resolved, apart from the 1097 error, which makes the library hardly usable at the moment.

Cheers Niels

volks73 commented 3 years ago

@ngblume Well, I am happy that we were able to figure out the original issue. I am going to close this issue and we can continue discussion about the Error 1097 in Issue #40.