Closed: wynnw closed this issue 2 years ago.
As a side note, I haven't been able to reproduce these errors with the asyncssh 2.9.0 project that uses asyncio instead of gevent.
I have seen a similar issue when uploading a tar file to many devices. For some of them the files were consistently corrupted, and all of the bad files resulted in the same hash value.
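For anyone wanting to check for this, a sketch (placeholder hosts and paths, assumes parallel-ssh's ParallelSSHClient and that sha256sum is available on the devices):

```python
# Sketch: send a tar to several devices, then compare the local sha256 with
# what each device computed on its copy. Hosts and paths are placeholders.
import hashlib

from gevent import joinall
from pssh.clients import ParallelSSHClient

hosts = ['device1', 'device2']
local_path = 'payload.tar'
remote_path = '/tmp/payload.tar'

client = ParallelSSHClient(hosts)
joinall(client.scp_send(local_path, remote_path), raise_error=True)

# Hash the local file in chunks to avoid loading it all into memory.
digest = hashlib.sha256()
with open(local_path, 'rb') as f:
    for chunk in iter(lambda: f.read(65536), b''):
        digest.update(chunk)
expected = digest.hexdigest()

# sha256sum prints "<hash>  <path>"; the first field is the hash.
for host_out in client.run_command('sha256sum %s' % remote_path):
    remote_hash = list(host_out.stdout)[0].split()[0]
    print(host_out.host, 'OK' if remote_hash == expected else 'CORRUPT')
```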
Hi there,
Thanks for the interest and report. Since there have been multiple reports on this issue, I do believe there is one. Having gone through the code again, I can see where there is potential for a race condition, which sounds like the behaviour here. The difference between written and total bytes is exactly 2k as well, which is the internal buffer size of libssh2. That can't be an accident.
I am curious if you see the same behaviour with copy_file, the SFTP copy. I'd expect not. And you cannot have used SCP with asyncssh since it does not support it; it uses SFTP.
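For reference, roughly what the SFTP path looks like with parallel-ssh; a sketch only, with placeholder hosts and paths:

```python
# Sketch: the same transfer via the SFTP-based copy_file, to check whether the
# truncation is SCP-specific. Hosts and paths are placeholders.
from gevent import joinall
from pssh.clients import ParallelSSHClient

client = ParallelSSHClient(['host1', 'host2'])

# copy_file returns one greenlet per host; join them before checking results.
joinall(client.copy_file('bigfile.bin', '/tmp/bigfile.bin'), raise_error=True)
```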
Another question: are you able to reproduce this against a local SSH server, or only remote? I still need to be able to reproduce it in a test to confirm a fix works; that would help in doing so. The existing tests do not use such a large file size, so I will try again with 100MB.
I can't use sftp with my systems; not sure why, and I haven't deep-dived on that. So I am using scp (I was before with paramiko), and now I'm using scp with asyncssh as it does support it. https://asyncssh.readthedocs.io/en/latest/index.html#scp-client
I've been focused on this inside aws. It reproduces very consistently for me within the same ec2 region and when going across ec2 regions. So that's more overhead than a local ssh server of course. I haven't tried on my own home LAN.
Thanks for the interest and report. Since there have been multiple reports on this issue, I do believe there is one. Having gone through the code again, can see where there is potential for a race condition which sounds like the behaviour here. The difference in the written vs total bytes is exactly 2k as well, which is the internal buffer size of libssh2. That can't be an accident.
Yes, that is approximately the number of bytes missing in my case. I uploaded ~2MB tars and they were damaged only at the end. I verified that with a hexdump of the files. And the missing bytes weren't that many; 2K seems about right.
The remote side in my production system uses dropbear over a mobile connection. Maybe such an environment is more prone to the condition as well?
We use toxiproxy for some tests; maybe its rate-limiting helps with reproduction.
I am curious if you see the same behaviour with copy_file, the SFTP copy. I'd expect not. And you cannot have used SCP with asyncssh since it does not support it, it uses SFTP.
AFAIK our dropbear does not support sftp properly / there were issues. That's why we used scp.
Another question, are you able to reproduce this against a local SSH server or only remote? I still need to be able to reproduce in a test to confirm a fix works, that would help in doing so. The existing tests do not use such a large file size, will try again with 100MB.
https://github.com/shopify/toxiproxy
Maybe you can play around with timings/bandwidth using toxiproxy. Local setups might be "too good" under normal circumstances.
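A rough sketch, if it helps, of setting that up through toxiproxy's HTTP admin API. This assumes the default admin endpoint on localhost:8474; the proxy name, ports and rate are arbitrary:

```python
# Sketch: create a throttled proxy in front of a local sshd via toxiproxy's
# HTTP admin API. The SSH client would then connect to port 2222.
import requests

API = 'http://localhost:8474'

requests.post(API + '/proxies', json={
    'name': 'ssh_slow',
    'listen': '127.0.0.1:2222',
    'upstream': '127.0.0.1:22',
    'enabled': True,
}).raise_for_status()

requests.post(API + '/proxies/ssh_slow/toxics', json={
    'name': 'slow_link',
    'type': 'bandwidth',
    'stream': 'downstream',
    'toxicity': 1.0,
    'attributes': {'rate': 100},  # KB/s
}).raise_for_status()
```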
Resolved in 2.9.0. Really resolved in 2.9.1.
Triggered by the client going out of scope, which caused a race condition if the client is not kept alive.
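For reference, a sketch of the kind of pattern that could trigger it, with placeholder hosts and paths:

```python
# Sketch of the triggering pattern: the client is created inside a helper and
# only the greenlets escape, so the client object can be collected while the
# transfer is still flushing.
from gevent import joinall
from pssh.clients import ParallelSSHClient

def start_send(hosts, local_path, remote_path):
    client = ParallelSSHClient(hosts)
    return client.scp_send(local_path, remote_path)
    # 'client' goes out of scope here while the greenlets may still be writing.

greenlets = start_send(['host1', 'host2'], 'bigfile.bin', '/tmp/bigfile.bin')
joinall(greenlets, raise_error=True)  # tail of the file could be lost pre-2.9.1

# Keeping the client referenced until joinall returns (or joining inside the
# helper) avoids the race on older versions.
```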
scp_send should also be a lot faster now, 10x or more.
Please reply on this issue if still having problems, will look into it asap.
Nice fix - thanks!
Describe the bug Same thing as the closed bug #226 - scp operations are not transferring the whole file at times. This bug was closed as can't reproduce. I realize this is really hard to deal with as this appears to be a timing issue with gevent greenlets and non-blocking I/O. I can't really debug it as just using a debugger introduces more time and then the problem never manifests.
Expected behavior Destination files match source files.
Actual behaviour At least one target server's destination file is missing a certain number of bytes at the end. For a file of 102912000 bytes I typically have one or more servers that only have 102891520 bytes, and that smaller number is very consistent.
Additional information I've run into this a bunch today as I've begun using this library. It's hard to repro every time, but I've got a path where, sending 100MB files to two hosts, the file on just one of them is missing the last few bytes about 50% of the time. Still investigating, but this is a real issue...
If I add a sleep(2) at the end of the data loop in _scp_send I never have the problem, so that's interesting.
This is against openssh-server 7.4p1-21 running on amazonlinux2, using: parallel-ssh==2.8.0 ssh-python==0.9.0 ssh2-python==0.27.0
The virtualenv is using Python 3.10.1. The code I'm running is just:
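Roughly the following; the real code has different hosts and paths and the fabric2 glue around it:

```python
# Sketch of the usage described, with placeholder hosts and file names.
from gevent import joinall
from pssh.clients import ParallelSSHClient

client = ParallelSSHClient(['server1', 'server2'])
greenlets = client.scp_send('build.tar', '/tmp/build.tar')
joinall(greenlets, raise_error=True)
```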
Using a file that is 99MB and sending to two servers. This is code integrated with some fabric2 code.
In further testing, I'm able to reproduce the problem with a 10561-byte text file and 4 servers, and quite often one or two of the servers gets a 0 byte file. If I add the gevent sleep call at the end of the _scp_send data/eagain_write loop, like sleep(2), I never see the problem. Seems like a channel can get closed before all the data is flushed out?
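To illustrate where that sleep goes, a sketch only, not the actual parallel-ssh source; 'channel' here is a stand-in for a non-blocking write object:

```python
# Illustrative only: the general shape of an eagain-style write loop where a
# trailing sleep gives the other side time to flush before the channel is
# torn down. 'channel' is a stand-in whose write() may accept partial data.
from gevent import sleep

def scp_send_sketch(channel, fileobj, chunk_size=65536):
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        written = 0
        while written < len(chunk):
            # Retry partial writes, yielding to other greenlets in between.
            written += channel.write(chunk[written:])
            sleep(0)
    # The workaround described above: give the buffered data time to drain
    # before the channel gets closed further up the stack.
    sleep(2)
```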
This is an awesome library, you've done great work. I can't really use it though until I get this problem figured out as it happens at least 50% of the time.