Open isidentical opened 3 years ago
Comparison by @mxmlnkn:
```python
import time
import fsspec.implementations.sftp

fs = fsspec.implementations.sftp.SFTPFileSystem("127.0.0.1")
for i in range(5):
    t0 = time.time(); size = len(fs.open('silesia.tar.gz').read()); t1 = time.time()
    print(f"Read {size} in {t1-t0:.2f} s -> {size/(t1-t0)/1e6:.2f} MB/s")
# Read 68238807 in 16.93 s -> 4.03 MB/s
# Read 68238807 in 16.74 s -> 4.08 MB/s
# Read 68238807 in 16.74 s -> 4.08 MB/s
# Read 68238807 in 16.75 s -> 4.07 MB/s
# Read 68238807 in 16.70 s -> 4.09 MB/s
```
```python
import sshfs

fs = sshfs.SSHFileSystem("127.0.0.1")
for i in range(5):
    t0 = time.time(); size = len(fs.open('silesia.tar.gz').read()); t1 = time.time()
    print(f"Read {size} in {t1-t0:.2f} s -> {size/(t1-t0)/1e6:.2f} MB/s")
# Read 68238807 in 2.06 s -> 33.18 MB/s
# Read 68238807 in 2.07 s -> 32.99 MB/s
# Read 68238807 in 2.04 s -> 33.43 MB/s
# Read 68238807 in 2.01 s -> 33.93 MB/s
# Read 68238807 in 2.04 s -> 33.48 MB/s
```
I have expanded on the initial benchmarks to also include some "random" reading in chunks of different size, including the edge case of sequential reading, i.e., a single chunk / read of size "-1". The measurements are surprisingly stable, even when comparing with the results from 2 days ago reposted above.
The degrading performance for larger chunk sizes is probably because the read call for the chunk at the end of the file reads a bit over the file end (66 MB file), which triggers https://github.com/ronf/asyncssh/issues/691 .
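The chunked-read measurements above can be sketched roughly as follows. This is not the exact benchmark script; the function name, chunk count, and the local `BytesIO` stand-in are my own illustration (with a real filesystem you would pass e.g. `lambda: fs.open('silesia.tar.gz')`). Note how a random offset near the end of the file makes the final `read(chunk_size)` run past EOF, the case that hit the asyncssh issue linked above:

```python
import io
import random
import time

def benchmark_chunked_reads(open_file, file_size, chunk_size, n_chunks=32):
    """Read n_chunks random chunks of chunk_size bytes each and return MB/s.

    chunk_size=-1 degenerates to a single sequential read of the whole
    file, the edge case mentioned above.  open_file is a callable that
    returns a fresh binary file object.
    """
    t0 = time.perf_counter()
    total = 0
    with open_file() as f:
        if chunk_size == -1:
            # Sequential edge case: one read of the entire file.
            total = len(f.read())
        else:
            for _ in range(n_chunks):
                # Offsets can land close to the end of the file, so the
                # last chunk may read a bit over the file end.
                f.seek(random.randrange(file_size))
                total += len(f.read(chunk_size))
    t1 = time.perf_counter()
    return total / (t1 - t0) / 1e6

# Local stand-in so the sketch runs without an SSH server:
data = bytes(4 * 1024 * 1024)
speed = benchmark_chunked_reads(lambda: io.BytesIO(data), len(data), 128 * 1024)
```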
I also did an extended comparison with other programs according to the issue title.
The same plot could also be done for write / upload speeds. And I forgot to include the original sshfs FUSE tool...
Nice! Is it CPU-bound in this case (because of Python overhead, for example), or some configuration? (Just curious why it can be ~10x slower vs rclone/scp.)
I tried to analyze the performance a bit in this post: https://github.com/ronf/asyncssh/issues/691#issuecomment-2375382829
I have added sshfs to the benchmarks:
Edit 2024-09-30: Turns out that asyncssh is the only one out of these 9 alternatives that enables compression by default if nothing is specified. After explicitly disabling compression, the performance is finally comparable to other tools. Furthermore, asyncssh was extended to query the best block size from the server and use that as the default. This further improves the performance.
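For reference, sshfs forwards extra keyword arguments to asyncssh's connection setup, so the compression fix described above should amount to something like the following (the hostname is a placeholder; this is a configuration sketch, not a snippet from the benchmark):

```python
import sshfs

# asyncssh negotiates zlib compression by default; passing
# compression_algs=None disables it, which is what restored the
# throughput in the benchmarks above.
fs = sshfs.SSHFileSystem("127.0.0.1", compression_algs=None)
```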
I have also done write benchmarks, although I dropped `lftpget` and `sftp` because I could not be bothered to rewrite the command lines to work for uploads:
rclone is surprisingly bad at uploads, but aside from that there are the same hilarious one to two orders of magnitude performance differences between the worst and best tools.
I'd say these benchmarks should be enough for this issue.
The README mentions that this implementation is faster than paramiko. It would be nice to see proof of that.
This may be related.