fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License

Adds WebdavPath #60

Closed normanrz closed 2 years ago

normanrz commented 2 years ago

Adds a WebdavPath that can be used like this: UPath("webdav+https://example.com").
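To illustrate how a combined scheme such as webdav+https can be read, here is a minimal sketch using only the standard library. The helper name split_webdav_url is hypothetical and is not part of universal_pathlib or of this PR; it only shows that the part before the "+" can name the protocol while the rest names the underlying transport URL.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_webdav_url(url: str) -> Tuple[str, str]:
    """Split 'webdav+https://host/path' into ('webdav', 'https://host/path').

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    parts = urlsplit(url)
    # The scheme looks like 'webdav+https': protocol, '+', transport.
    protocol, sep, transport = parts.scheme.partition("+")
    if not sep or not transport:
        raise ValueError(f"expected a '<protocol>+<transport>' scheme, got {parts.scheme!r}")
    # Rebuild the URL with only the transport scheme (http or https).
    return protocol, parts._replace(scheme=transport).geturl()


protocol, base_url = split_webdav_url("webdav+https://example.com/dir/file.txt")
print(protocol, base_url)
```

With universal_pathlib and a WebDAV backend installed, the path itself would be created as in the description above, i.e. UPath("webdav+https://example.com").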

The PR implements some fixes in the _WebdavAccessor, which transforms the output of the webdav4 module to the conventions fsspec uses (i.e. full paths with leading slashes; see listdir and glob).
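The kind of normalization described above could look roughly like the following sketch. It assumes the backend returns entry names without a leading slash; that assumption and the helper name to_fsspec_paths are illustrative and are not taken from the actual _WebdavAccessor code.

```python
from typing import Iterable, List


def to_fsspec_paths(entries: Iterable[str]) -> List[str]:
    """Return each entry as a full path with exactly one leading slash.

    Hypothetical helper; the real accessor may differ in detail.
    """
    # Strip surrounding slashes, then add a single leading one.
    # An empty entry (the root) becomes "/".
    return ["/" + entry.strip("/") for entry in entries]


print(to_fsspec_paths(["dir/file.txt", "dir/sub/", ""]))
```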

Not sure if webdav+https is a good scheme. Perhaps URL chaining such as webdav::https would be more fsspec-idiomatic.

normanrz commented 2 years ago

Don't you think URL chaining would be in the spirit of fsspec? See https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining

jstriebel commented 2 years ago

Don't you think URL chaining would be in the spirit of fsspec? See https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining

True, this seems to be a standard with fsspec. However, I've never seen this before and think using :: as a delimiter in the scheme is not necessarily a good idea. Maybe it's fine to simply allow both?

jstriebel commented 2 years ago

PS: According to RFC 3986 (the URI specification), ":" should not be part of the scheme:

Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-")
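The quoted rule can be checked with a short regular expression. This is a sketch of the scheme grammar as quoted above, not a full URI validator; the name is_valid_scheme is made up for illustration.

```python
import re

# A letter, followed by any combination of letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if the string matches the scheme rule quoted from RFC 3986."""
    return SCHEME_RE.fullmatch(scheme) is not None


for candidate in ("webdav+https", "webdav::https"):
    print(candidate, is_valid_scheme(candidate))
```

Under this rule webdav+https is a valid scheme name, while webdav::https is not, because ":" is outside the allowed character set.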

normanrz commented 2 years ago

I think the semantics of the :: are not that it delimits parts of the scheme, but that it chains nested URLs. So webdav would be a URL and https://... would be the nested URL. The examples from the docs are clearer: With zip://*.csv::gcs://bucket/afile.zip you can open a csv file stored within a zip on GCS (may need some caching in between). Closer to our webdav scenario is dask::s3://bucket/key, where a dask worker is used to access s3. Anyways, maybe URL chaining would be something to implement at another time.
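A rough sketch of the chaining idea: the :: separates whole URLs, each with its own protocol, so it never sits inside a scheme. The function below is a simplified illustration written for this thread; fsspec's own parsing handles more cases, and the name split_chain is hypothetical.

```python
from typing import List, Tuple


def split_chain(url: str) -> List[Tuple[str, str]]:
    """Split a chained URL into (protocol, link) pairs, outermost first.

    Simplified illustration only; not fsspec's actual implementation.
    """
    links = url.split("::")
    # A link such as "dask" has no "://", so the whole link is the protocol.
    return [(link.split("://", 1)[0], link) for link in links]


print(split_chain("zip://*.csv::gcs://bucket/afile.zip"))
print(split_chain("dask::s3://bucket/key"))
```

Applied to the webdav case, webdav::https://example.com would split into a webdav link and a nested https://example.com URL.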

andrewfulton9 commented 2 years ago

I am not very familiar with webdav or with how URL chaining in fsspec works, so take my opinion here with a grain of salt. Unless we support URL chaining more generally across UPath, I think it's best to avoid that syntax so as not to confuse people. That said, I think you understand these concepts better than I do, so if you think the URL chaining syntax would be less confusing here, I'll defer to your judgment.

normanrz commented 2 years ago

I'll leave the scheme as webdav+http(s) for now. I think that is less confusing. Also, URL chaining would be much more dev effort to implement at this point.

I'll work a bit more on the readme and the example notebook and merge afterwards.