fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License
943 stars 342 forks source link

Make SFTP File-System more configurable #392

Open tharradine opened 3 years ago

tharradine commented 3 years ago

Currently, when we instantiate an SFTP file-system, we can pass kwargs directly to paramiko.client.SSHClient.connect(). However, I would like to be able to do more than just this - specifically, set parameters regarding host keys. The missing_host_key_policy is hard-coded to AutoAddPolicy, whereas I might prefer to use RejectPolicy. But even if I were able to change the policy, I wouldn't have a chance to add any host-keys before the connection is made.

I propose allowing the user to optionally pass in a paramiko.client.SSHClient instance to SFTPFileSystem.__init__() which is pre-configured to their desires. This should allow more freedom to users of the SFTPFileSystem, without a inheriting a significant maintenance burden.

The other option would be to allow passing in a paramiko.sftp_client.SFTPClient instance. The issue I find with this, however, is that the SFTPFileSystem.client attribute would no longer be set (or would be None), in cases where the SFTPClient is passed in. I don't know an easy way to reverse-engineer the SSHClient instance from an SFTPClient instance. Perhaps it's not important, I'll leave that up to discussion.

Thanks to the maintaners for all your great work on this library!

martindurant commented 3 years ago

There are some filesystem implementations that allow passing of client objects instead of (as well as) configuration kwargs, and it would be reasonable here. The trouble is, that the resultant instance would not be serialisable, which is something that we generally prefer, because the intent is to be able to work well with distributed systems like Dask. That would mean, that for the case of passing a configured/open cilent, you would at least have to produce a reasonable error message in __reduce__ when trying to serialise. So it may be easier to encode all of the possible options you might want to see into init kwargs; there are exactly three defined missing key policies, for example.

I don't know an easy way to reverse-engineer the SSHClient instance

I don't know how to do this either, although presumably it's possible. The client is not actually used by the filesystem instance once the sftp session is made, so there may be no reason to keep it.

tharradine commented 3 years ago

I see! To be honest, I had no idea about that side of fsspec.

I think if I were using fsspec in a distributed system, I would still want the ability to specify host keys and possibly other config options. So for that reason I think I'd prefer to make use of ssh_kwargs here to set those extra options. I'm happy to make a PR if you think this is a reasonable approach.

Cheers!


From: Martin Durant notifications@github.com Sent: Tuesday, September 1, 2020 11:31:02 PM To: intake/filesystem_spec filesystem_spec@noreply.github.com Cc: Toby Harradine toby.harradine@elulagroup.com; Author author@noreply.github.com Subject: Re: [intake/filesystem_spec] Make SFTP File-System more configurable (#392)

There are some filesystem implementations that allow passing of client objects instead of (as well as) configuration kwargs, and it would be reasonable here. The trouble is, that the resultant instance would not be serialisable, which is something that we generally prefer, because the intent is to be able to work well with distributed systems like Dask. That would mean, that for the case of passing a configured/open cilent, you would at least have to produce a reasonable error message in reduce when trying to serialise. So it may be easier to encode all of the possible options you might want to see into init kwargs; there are exactly three defined missing key policies, for example.

I don't know an easy way to reverse-engineer the SSHClient instance

I don't know how to do this either, although presumably it's possible. The client is not actually used by the filesystem instance once the sftp session is made, so there may be no reason to keep it.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/intake/filesystem_spec/issues/392#issuecomment-684854841, or unsubscribehttps://github.com/notifications/unsubscribe-auth/APS2UYN5NU4TMMRMTT4GSELSDTZRNANCNFSM4QRDS3EQ.

martindurant commented 3 years ago

I'm happy to make a PR if you think this is a reasonable approach.

Yes please!