Closed PhenomPBG closed 3 years ago
Can you upgrade to 0.18.0 and check? Probably worth retesting once 0.19.0 is released also. Not likely to be an issue with this library; the bindings call the underlying libssh2 C function, and 0.17.0 is an old version.
Confirmed as bug.
The library code seems to be hanging while getting `bytes_written` back from libssh2 in small chunks (1024 KB or so) after the file has been fully written. This results in a hot loop while `bytes_written` catches up to `total_size`, even though the file is already fully written. Since the return code from libssh2 is > 0 and we still do not know the total number of bytes written, the loop never exits. Will aim to reproduce in C code, as this may be a bug in libssh2.
Thanks pkittenis, this is the version of libssh2 that I have installed:
Installed Packages
Name    : libssh2
Arch    : x86_64
Version : 1.4.3
Release : 12.el7_6.2
Will upgrade libssh2 and test again.
Precisely the same behaviour with libssh2 1.8.0 and ssh2-python 0.18.0.
Will test with libssh2 1.9.0 as well.
Is this issue still valid? I just uploaded an archive file > 5 GB without any issues.
OS: Ubuntu 20.04.1
libssh2: 1.8.0-2.1build1
Python: 3.8.5
ssh2-python: 0.23.0
method:

```python
mode = LIBSSH2_SFTP_S_IRUSR | \
       LIBSSH2_SFTP_S_IWUSR | \
       LIBSSH2_SFTP_S_IRGRP | \
       LIBSSH2_SFTP_S_IROTH
f_flags = LIBSSH2_FXF_CREAT | LIBSSH2_FXF_WRITE

with open("local_file", "rb") as lofi:
    with connection.open("remote_file", f_flags, mode) as remfi:
        while True:
            data = lofi.read(1024 * 1024)
            if not data:
                break
            else:
                _, sz = remfi.write(data)
```

[...]
Not sure, have seen something similar with libssh2 1.9 but not tested extensively.
In my testing this happens with really large single writes (> 4GB) to the remote file, smaller writes do not seem to behave in this way.
Ah! Writing one chunk of data with the size of 4GB and more in one go? My test case was a single 5GB file read and written in smaller portions of data.
Might be some limitation of SSH2.
But hey, who would keep a file that big in memory? That's a design problem from my point of view...
Anyway, I'll also try to take a look at the code.
For file transfers it would be rather silly, but not all cases are that simple :)
Currently working around it by wrapping the internal buffer and treating it as a file. The behaviour is just peculiar.
From what I saw libssh2 buffers large writes internally in chunks and returns many smaller bytes written for an already completed large write.
For a 4GB write, the loop collecting small bytes-written values up to 4GB takes a very long time. The library should ideally handle the bytes-written accounting from libssh2 in C so it does not result in a hot loop in Python code.
Library is already handling all bytes written by libssh2 code in C - without the GIL no less.
libssh2 seems to write in a max of 30MB chunks (`MAX_SFTP_OUTGOING_SIZE` in libssh2).
Given a 4GB buffer, it takes some time to call `libssh2_sftp_write` 1500 times and for `write` to finally return. The file is not guaranteed to be written until all writes have been ack-ed by the server, even if the file size is correct.
See the libssh2 sftp write documentation.
Not much more the library can do until this chunking behaviour can be disabled in libssh2.
Buffer your writes.
Steps to reproduce:

```python
import socket

import ssh2
import ssh2.exceptions
from ssh2.session import Session
from ssh2.utils import wait_socket
from ssh2.error_codes import LIBSSH2_ERROR_EAGAIN
from ssh2.sftp import LIBSSH2_FXF_READ, LIBSSH2_SFTP_S_IFDIR
from ssh2.sftp import LIBSSH2_FXF_CREAT, LIBSSH2_FXF_WRITE, LIBSSH2_SFTP_S_IRUSR, LIBSSH2_SFTP_S_IRGRP, \
    LIBSSH2_SFTP_S_IWUSR, LIBSSH2_SFTP_S_IROTH, LIBSSH2_SFTP_S_IRWXU, LIBSSH2_FXF_TRUNC

host = "localhost"
port = 22
username = "test"
password = "test123"

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
session = Session()
session.handshake(sock)
session.userauth_password(username, password)
sftp = session.sftp_init()

mode = LIBSSH2_SFTP_S_IRUSR | LIBSSH2_SFTP_S_IWUSR | LIBSSH2_SFTP_S_IRGRP | LIBSSH2_SFTP_S_IROTH
f_flags = LIBSSH2_FXF_CREAT | LIBSSH2_FXF_WRITE | LIBSSH2_FXF_TRUNC

file_data = bytes(4522932365)  # ~4.3GB of data

with sftp.open("test", f_flags, mode) as remote_fh:
    remote_fh.write(file_data)
```
```
  File "/opt/python3/lib/python3.7/site-packages/protocol_kit/sftp.py", line 313, in write
    with self.__sftp.open(path, f_flags, mode) as remote_fh:
  File "ssh2/sftp.pyx", line 230, in ssh2.sftp.SFTP.open
  File "ssh2/utils.pyx", line 157, in ssh2.utils.handle_error_codes
ssh2.exceptions.SFTPProtocolError

During handling of the above exception, another exception occurred:

  File "/opt/python3/lib/python3.7/site-packages/protocol_kit/sftp.py", line 314, in write
    remote_fh.write(file_data)
  File "ssh2/sftp_handle.pyx", line 291, in ssh2.sftp_handle.SFTPHandle.write
  File "ssh2/utils.pyx", line 179, in ssh2.utils.handle_error_codes
ssh2.exceptions.SocketRecvError
```