althonos / fs.sshfs

Pyfilesystem2 over SSH using paramiko
GNU Lesser General Public License v2.1

SFTP file transfers extremely slow #11

Closed geoffjukes closed 6 years ago

geoffjukes commented 6 years ago

Hi,

I am extremely grateful for this extension to PyFilesystem, thank you.

I have found that SFTP uploads are extremely slow: orders of magnitude slower than the same files transferred with FileZilla.

Some initial research seems to indicate that calling set_pipelined() before a transfer will help a lot, but I am having some trouble figuring out where I should attempt to set this.

I appreciate any help or thoughts you may have,

Regards,

Geoff

althonos commented 6 years ago

Hi @geoffjukes, thanks for reporting this. The set_pipelined method is a method of SFTPFile, so it is possible to monkey-patch it for hopefully better results.

Would you mind testing the fix to see if it actually improves transfer speeds? Here's how to turn on pipelining for a file opened in an SSHFS:

>>> import fs
>>> my_sshfs = fs.open_fs("ssh://...")
>>> my_file = my_sshfs.openbin("/path/to/file")
>>> my_file._f.set_pipelined()
>>> # ... do things with your file ... #

If this works well for you, then I will make the required modification to allow enabling pipelining with a keyword argument (this is what openbin's options are for).

geoffjukes commented 6 years ago

Hi Althonos,

Thank you for responding. I’m opening the target with 'sftp://' and then using copy_file to transfer a local file to the remote, so maybe I’m barking up the wrong tree. Your test doesn’t work in my use case:

Traceback (most recent call last):
  File "test_scp.py", line 15, in <module>
    tgt._f.set_pipelined()
AttributeError: 'SSHFS' object has no attribute '_f'
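
The AttributeError in the traceback arises because `_f` belongs to the file object returned by openbin (as in the snippet earlier in the thread), not to the SSHFS object itself. Below is a minimal sketch of an upload that applies the same monkey-patch to the opened file instead of going through copy_file. It assumes the file wrapper still exposes the private `_f` attribute, which may change between releases; `upload_with_pipelining` is a hypothetical helper, not part of fs.sshfs.

```python
import shutil


def upload_with_pipelining(local_path, remote_fs, remote_path,
                           chunk_size=1024 * 1024):
    """Copy a local file to remote_fs, enabling pipelining when possible.

    remote_fs is any PyFilesystem2-style object with an openbin() method.
    """
    with open(local_path, "rb") as src, \
            remote_fs.openbin(remote_path, "w") as dst:
        # Assumption: `_f` is the underlying paramiko SFTPFile, as in the
        # snippet earlier in the thread. It is a private attribute, so the
        # lookup is guarded rather than relied upon.
        raw = getattr(dst, "_f", None)
        if raw is not None and hasattr(raw, "set_pipelined"):
            raw.set_pipelined(True)
        # Stream the data in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_size)
```

With the filesystem object from the traceback this would be called as `upload_with_pipelining("local.bin", tgt, "/path/to/file")`, where both paths are placeholders.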

geoffjukes commented 6 years ago

I’ll keep investigating this, but initial research isn’t promising - people reporting slow performance with Paramiko going back years.

To put it in context, sending a 1.2GB file over the LAN takes 22 seconds using FTP, 33 seconds using FileZilla SFTP, and 330 seconds using Paramiko SFTP.

althonos commented 6 years ago

@geoffjukes : I pushed a new release so that files are opened as pipelined by default. Not sure if that fixes it, but that should give you slightly faster speeds.

By the way: Python is very inefficient when it comes to I/O. This may just be a case of paramiko not being the most appropriate tool to do what you want.

geoffjukes commented 6 years ago

Thanks for looking into that @althonos, and you’re right - Paramiko ended up not being the appropriate solution for my (large file) needs.

I ended up switching to using CrushFTP as a middle layer between my app and the SFTP remote. I now send over plain FTP, and Crush pushes it upstream to the SFTP remote.

Thanks again for your efforts though, I’m sure I’ll be using this another time when speed is less important.

Geoff

althonos commented 6 years ago

@geoffjukes : Have you tried using FTPFS as a test? I'd expect it to be as slow as my SSHFS, but if it really were faster I would be keen to keep this investigation open.

geoffjukes commented 6 years ago

@althonos I use FTPFS to talk to the CrushFTP gateway, and it's fast. That was the switch I made (I didn't want to drop PyFilesystem2, so I just switched protocols). I'm not sure if there is anything you can do. I looked around, and everyone, everywhere, reports Paramiko as being really slow for transferring large volumes of data. But I am very grateful that you looked into it.

geoffjukes commented 6 years ago

@althonos I have recently discovered that one or more of the machines doing the IO for me appear to have issues with either the network or (more probably) the disk. This may have skewed my impressions of SSHFS.

Once I have isolated and resolved the IO issue, I will retest SSHFS.

Thanks again for all your efforts

geoffjukes commented 6 years ago

@althonos I finally got a chance to test this.

Re-running the same test as before (1.7GB file this time):

FTPFS takes 34 seconds
SFTPFS takes 118 seconds

So there is a significant improvement in performance with your change: nearly 1/3 the time (330s down to 118s). Not as fast as FTPFS, but with the added encryption overhead, that is to be expected.

Great work!
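
Because the two tests used files of different sizes (1.2GB and 1.7GB), the raw timings are easier to compare once they are converted to throughput. This is a small worked calculation using only the figures quoted in this thread. It assumes 1 GB = 1000 MB, and the labels are just descriptive names for the runs.

```python
def throughput_mb_s(size_gb, seconds):
    """Average throughput in MB/s, taking 1 GB = 1000 MB (an assumption)."""
    return size_gb * 1000 / seconds


# (file size in GB, elapsed seconds), as reported in the thread.
runs = {
    "FTP, first test": (1.2, 22),
    "FileZilla SFTP, first test": (1.2, 33),
    "Paramiko SFTP, first test": (1.2, 330),
    "FTPFS, retest": (1.7, 34),
    "SFTPFS, retest": (1.7, 118),
}

rates = {name: throughput_mb_s(size, secs)
         for name, (size, secs) in runs.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MB/s")

# Per-byte speed-up of the retest over the first Paramiko run.
speedup = rates["SFTPFS, retest"] / rates["Paramiko SFTP, first test"]
print(f"speed-up: {speedup:.1f}x")
```

On these numbers the SFTP path goes from about 3.6 MB/s to about 14.4 MB/s, roughly a fourfold per-byte improvement, although the machines involved may not have been in the same state for both tests.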

althonos commented 6 years ago

There are still other things I read that could improve performance. I'll @ you if there's any improvement on this :smile:

holtgrewe commented 5 years ago

(Please tell me if waking old threads is inappropriate.)

When using Paramiko SFTP's get(), I'm getting 50MB/s, but when using fs.sshfs.SSHFS.download() I'm only getting 5MB/s. What could be the reason?
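
For a like-for-like comparison of the two code paths, both can be pointed at the same write-only sink so that local disk speed does not affect the result. The following is a rough sketch of such a timing helper; `CountingSink` and `measure_mb_s` are hypothetical names introduced here, and 1 MB is taken as 10^6 bytes.

```python
import time


class CountingSink:
    """Write-only file-like object that discards data and counts bytes."""

    def __init__(self):
        self.nbytes = 0

    def write(self, data):
        self.nbytes += len(data)
        return len(data)


def measure_mb_s(download):
    """Call download(sink) once and return the average rate in MB/s."""
    sink = CountingSink()
    start = time.perf_counter()
    download(sink)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return sink.nbytes / 1e6 / elapsed


# Intended usage (placeholders, requires a reachable server):
#   measure_mb_s(lambda sink: my_sshfs.download("/path/to/file", sink))
#   measure_mb_s(lambda sink: sftp_client.getfo("/path/to/file", sink))
```

Here `my_sshfs` stands for an open SSHFS and `sftp_client` for a paramiko SFTPClient; `getfo` is used instead of `get` because it accepts a file-like object rather than a local path.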

althonos commented 5 years ago

@holtgrewe : Unlike SSHFS.download, Paramiko SFTP's get() can assume that you are copying to a local path, so it probably has more optimisations (like using sendfile). Could you open a new issue? I see a possible optimisation for files with a descriptor here, but I'd need to experiment a bit.

chantalgoret commented 4 years ago

Just found a solution that worked for me: the problem was solved by changing the timeout from 0 sec to 999 sec. Now it's fast :-)