ansible / pylibssh

Python bindings specific to Ansible use case for libssh https://www.libssh.org/
https://ansible-pylibssh.rtfd.io
GNU Lesser General Public License v2.1
59 stars 29 forks source link

Fix reading large files over SCP #621

Closed Jakuje closed 3 months ago

Jakuje commented 3 months ago
SUMMARY

The current SCP code does not handle large reads. Instead just allocates the large buffer fills up to 64k bytes and then writes the whole to the file (most likely the rest filled with junk).

The libssh returns the number of read bytes from ssh_scp_read() and for practical reasons does not allocate the whole file in memory. The reads are capped to 64k B.

This PR fixes the read by both not allocating the entire file size in memory and reading it by chunks (that much max libssh chunk atm) and writing that to the local file.

ISSUE TYPE
ADDITIONAL INFORMATION

The attached test case is failing without the change, with errors like this:

_______________________________________________________________________ test_get_large _______________________________________________________________________
[gw0] linux -- Python 3.12.3 /home/jjelen/devel/pylibssh/.tox/python/bin/python

dst_path = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/dst-file.txt')
src_path_large = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/large.txt'), ssh_scp = <pylibsshext.scp.SCP object at 0x7f3f83cffac0>
large_payload = b'&PE~~,!=1@F7RaUccg#)q-ru^~f8Zv/-{~tFE-\\>v2_b}i@`N/]_%*~{oD`"F`*4Ns]HK^lewILvC1O[]gFC9(27%MuX$s&24et<dRJ$"& zf_XIqm7...**<J_;?uI47vm\'/A"xi5a!kQXoL=M}ne;$E[B=EpasVbRO&\\zy7t,GK yu;l*$s<#P:sJLi@od#:&:m?8]9KFN4@QAF8hD, ~T@WNP;)%+>)TV\'ODi9'

    def test_get_large(dst_path, src_path_large, ssh_scp, large_payload):
        """Check that SCP file download works also for large files."""
>       ssh_scp.get(str(src_path_large), str(dst_path))

dst_path   = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/dst-file.txt')
large_payload = (b'&PE~~,!=1@F7RaUccg#)q-ru^~f8Zv/-{~tFE-\\>v2_b}i@`N/]_%*~{oD`"F`*4Ns]HK^le'
[...]
 b"?8]9KFN4@QAF8hD, ~T@WNP;)%+>)TV'ODi9")
src_path_large = PosixPath('/tmp/pytest-of-jjelen/pytest-35/popen-gw0/test_get_large0/large.txt')
ssh_scp    = <pylibsshext.scp.SCP object at 0x7f3f83cffac0>

tests/unit/scp_test.py:111: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise LibsshSCPException("Unexpected request: %s" % self._get_ssh_error_str())
E   pylibsshext.errors.LibsshSCPException: Unexpected request: b'ssh_scp_pull_request called under invalid state'

src/pylibsshext/scp.pyx:148: LibsshSCPException

(lines do not match)

Additionally, there is one more test case, that I suspected will be problematic, but it turned out it works just ok, so leaving for the coverage.

packit-as-a-service[bot] commented 3 months ago

Congratulations! One of the builds has completed. :champagne:

You can install the built RPMs by following these steps:

Please note that the RPMs should be used only in a testing environment.

webknjaz commented 3 months ago

@Jakuje FYI the linting failures block this PR

Jakuje commented 3 months ago

The suggestions accepted, the linting should pass now.

webknjaz commented 3 months ago

@Jakuje hey, so I added a few improvements on top and posted a few more comments inline. I've now stopped pushing to the PR branch so feel free to clean it up/rebase/make more changes.

webknjaz commented 3 months ago

This needs conflict resolution now.

Jakuje commented 3 months ago

Resolved the conflict and dropped the check for too large replies.

webknjaz commented 3 months ago

Looks like one of the CI jobs got stuck with https://github.com/ansible/pylibssh/issues/557 so I restarted it: https://github.com/ansible/pylibssh/actions/runs/9699687288/job/26769948869?pr=621#step:15:152.

webknjaz commented 3 months ago

@Jakuje this has been released as a part of v1.2.2: https://github.com/ansible/pylibssh/releases/tag/v1.2.2 / https://github.com/ansible/pylibssh/discussions/626 / https://pypi.org/project/ansible-pylibssh/1.2.2/.

Enjoy! 🎉