Closed randr97 closed 2 years ago
Maybe we should consider rewriting the hook using Twisted? It sems all APIs needed are there https://twistedmatrix.com/documents/current/api/twisted.conch.ssh.filetransfer.FileTransferClient.html, Twisted seems to be a bit more modern with the async approach and the bug in paramiko seems to be around since 2017.
@potiuk that would be definitely a good place to start. How do you want to go about this? I would definitely want to contribute to this.
@potiuk that would be definitely a good place to start. How do you want to go about this? I would definitely want to contribute to this.
I do not think there is anything special needed. Just rewrite the Hook/Operator, starting with replacing the twisted library in setup.py deps instead of paramiko/sftp. Then we can relase a major release of SFTP provider with it. That's pretty much it :)
@potiuk will take some time out and start working on the PR. Thanks
@potiuk what about https://github.com/ParallelSSH/parallel-ssh#sftp – ?
See also https://github.com/ParallelSSH/parallel-ssh#why-this-library:
Because other options are either immature, unstable, lacking in performance or all of the aforementioned.
Certain other self-proclaimed leading Python SSH libraries leave a lot to be desired from a performance and stability point of view, as well as suffering from a lack of maintenance with hundreds of open issues, unresolved pull requests and inherent design flaws.
I'd probably caution against introducing twisted -- with Python 3.6/3.7+ the built in asyncio can do most of twisted without the need for a large external dependency.
Agree with @ashb . Parallel-ssh looks better as well. Seems rather popular from the number of forks, has very little number of dependencies (gevent, python-ssh2). Not super active (but what activity you'd expect from such library?).
Note that parallel-ssh is not small either (mainly because it ships its own copy of libssh
and libssh2
).
$ pip wheel -w tw twisted
...
$ pip wheel -w ps parallel-ssh
...
$ du -s tw ps
4336 tw
18132 ps
Parallel-ssh is over four times larger than Twisted if dependencies are considered.
Hmm :( tough choice :)
@uranusjr you can build it from source, against system-provided libraries.
But there's something wrong in the packaging because when I look at the installed files, there are lots of .c source files:
I think this solution Link can be a fix without really changing any libraries... Have been using it for sometime and haven't gotten into any issues till now! What do you guys think?
Have been using it for sometime and haven't gotten into any issues till now! What do you guys think?
If you have a proposed solution - it's better to open a PR and ask for code review
Seems that the issue might be closed in the upcoming sftp provider.
I'm still encountering this issue with the new provider (which uses paramiko).
I see some workarounds in this thread but the issue is still open: https://github.com/paramiko/paramiko/issues/926
Apache Airflow version: 2.0.1 (Should apply to previous versions and later ones as well)
Environment:
What happened: In
airflow.providers.sftp.hooks.sftp.SFTPHook
, when we try to download a file greater than 18 MiB, the download keeps happening forever and never gets completed.What you expected to happen: The download should have completed in seconds but did not. A file less than 18MiB gets downloaded in few seconds. Looks like this is an underlying issue in the
paramiko
library. Attaching a bunch of issues on paramiko's git and stackoverflow -How to reproduce it:
Anything else we need to know: I after exploring found a solution to the problem and have fixed it in my project but if the community can dive deep it would be great. Link to the solution is - https://gist.github.com/vznncv/cb454c21d901438cc228916fbe6f070f This gist is by @vznncv and credits to him for coming up with a solution.