fsspec / sshfs

sshfs - SSH/SFTP implementation for fsspec
Apache License 2.0
58 stars 14 forks source link

Register SSHFileSystem with fsspec. #34

Closed NotSpecial closed 1 year ago

NotSpecial commented 1 year ago

Related to #25.

Currently, it is not possible to register the SSHFileSystem with fsspec.

However, it seems to me that the only thing that is missing is appropriate parsing of the protocol string. I compared the current SSHFileSystem with the AbstractFileSystem base class and discovered that only two methods are missing: _strip_protocol, which removes all protocol info from a url, and _get_kwargs_from_urls, which translates a URL to the appropriate arguments to instantiate the class.

I leveraged the fact that sshfs is already able to parse ssh URLs, and copied the required functions from the existing STFP filesystem.

Finally, I updated setup.cfg such that the SSHFileSystem is automatically registered for the ssh protocol when installing this library.

Testing locally with my setup, this works like a charm. However, I would like to contribute towards testing this more thoroughly. I would appreciate any pointers on how to achieve this most efficiently. Thanks!

NotSpecial commented 1 year ago

In the current tests, it seems that testing rm fails. I do not understand if and how this might be related to the proposed changes here. Any advice?

efiop commented 1 year ago

Added daily and workflow_dispatch to our tests jobs, let's see if that succeeds.

efiop commented 1 year ago

Looks like your change is triggering the error https://github.com/fsspec/sshfs/actions/runs/5535143314

NotSpecial commented 1 year ago

Good to know! I'll investigate and see if I can figure out where things are going wrong.

NotSpecial commented 1 year ago

I have found the issue. The default _strip_protocol from the FileSystem base class would remove trailing slashes, while my updated version would not.

The failing line in the test was:

fs.touch(remote_dir + "/dir/c/a/")

You cannot touch a directory. Before, this path was changed to remote_dir/dir/c/a, and touch would succeed. With my initial change, it would remain remote_dir/dir/c/a/ and touch would fail.

I've updated the code to remove trailing slashes, and the test succeeds again (at least locally).

NotSpecial commented 1 year ago

I have also added an additional test for the registration with fsspec. It tests writing to a file using a ssh:// url.

NotSpecial commented 1 year ago

Finally, I have updated the README to describe the new functionality. Happy for any feedback!

NotSpecial commented 1 year ago

Nice to see that the tests passed, but pre-commit failed because of wrongly sorted imports. I ran isort and it should be fixed now.

NotSpecial commented 1 year ago

Hi @efiop, as soon as you have some time, could you trigger the tests again, please?

I probably don't have the permissions, as I can't see a way to trigger them myself.

efiop commented 1 year ago

Oops, sorry for that. Let me see if I can disable that safeguard for you...

efiop commented 1 year ago

@NotSpecial I think the actions permissions are fixed now, you won't be constrained by it anymore.

codecov-commenter commented 1 year ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (main@a62fd30). Click here to learn what that means. Patch has no changes to coverable lines.

:exclamation: Current head 41671dd differs from pull request most recent head 138e2d1. Consider uploading reports for the commit 138e2d1 to get more accurate results

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #34 +/- ## ======================================= Coverage ? 93.99% ======================================= Files ? 13 Lines ? 699 Branches ? 0 ======================================= Hits ? 657 Misses ? 42 Partials ? 0 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

NotSpecial commented 1 year ago

Happy to help!

efiop commented 1 year ago

For the record 2023.7.0 is out with this change.

uunal commented 1 year ago

Hi, when I tried registration via the test example: test_fsspec_registration assertion fails. Still sftp/ssh implementation on fsspec points to paramiko.

Shouldn't the fsspec.filesystem('sftp',connection_args) points to 'sshfs.spec.SSHFilesystem' like in this PR ?

NotSpecial commented 1 year ago

Are you using the latest version, i.e. does pip freeze | grep sshfs return 2023.7.0 (or later, in case you read this after a future release)?

Alternatively, if you have cloned the repository locally, have you installed it using pip install .?

The registration happens during installation, maybe something went wrong there?

uunal commented 1 year ago

For first option yes returns correct version. Did not try cloning repo though.

Tried installation couple of times, just not register.

NotSpecial commented 1 year ago

What does the following code return for you?

from fsspec.registry import known_implementations

known_implementations['ssh']['class']

I'm getting sshfs.spec.SSHFileSystem, what do you see?

NotSpecial commented 1 year ago

Re-reading your messages, are you tring to use an ssh URL or an sftp URL?

Currently sshfs is only registered for ssh://, not for sftp:// (which still uses paramiko).

uunal commented 1 year ago

Re-reading your messages, are you tring to use an ssh URL or an sftp URL?

Currently sshfs is only registered for ssh://, not for sftp:// (which still uses paramiko).

Ahh ok then. Yes I'm trying sftp. I've misunderstood PR.

Thank you.

NotSpecial commented 1 year ago

@efiop I guess we could also register sshfs for sftp:// urls, couldn't we? If I understand correctly, nothing would change internally. I'm happy to add this (and a quick test that it works); but please let me know if you have any concerns.

uunal commented 1 year ago

Hi @NotSpecial , are you going to create PR for sftp spec registration? If you are occupied, I can do it too. Thanks

NotSpecial commented 1 year ago

I didn't get the change to look at this. Thanks for taking care of it!