fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License

`.parts` not returning expected result #180

Closed: jrbourbeau closed this issue 7 months ago

jrbourbeau commented 7 months ago

I'm trying to get the bucket name for a specific S3 path. I thought I would be able to do this using path.parts[0], based on what I get from pathlib:

In [1]: from pathlib import Path

In [2]: Path("bucket/file.txt").parts
Out[2]: ('bucket', 'file.txt')

but it looks like that's not the case with the latest upath 0.1.4 release:

In [1]: from upath import UPath

In [2]: UPath("s3://bucket/file.txt").parts
Out[2]: ('/', 'file.txt')

The bucket name is missing from this output.

ap-- commented 7 months ago

Hi @jrbourbeau

This is fixed in UPath v0.2.0, which will be released very soon.

>>> import upath
>>> upath.UPath("s3://bucket/file").parts
('bucket/', 'file')
>>> upath.UPath("s3://bucket/file").drive
'bucket'
>>> upath.UPath("s3://bucket/file").root
'/'
>>> upath.UPath("s3://bucket/file").anchor
'bucket/'
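
For anyone who has to stay on an older release for now, the bucket name can also be read from the URL with the standard library alone. This is only a minimal sketch and not part of universal_pathlib: `bucket_name` is a hypothetical helper, and it assumes the bucket is the netloc of an `s3://`-style URL.

```python
from urllib.parse import urlsplit


def bucket_name(url: str) -> str:
    """Return the netloc of an object-storage URL (hypothetical helper)."""
    # For "s3://bucket/file.txt", urlsplit puts "bucket" in netloc
    # and "/file.txt" in path.
    return urlsplit(url).netloc


print(bucket_name("s3://bucket/file.txt"))  # bucket
```

Once v0.2.0 is installed, the `.drive` attribute shown above should make this helper unnecessary.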

jrbourbeau commented 7 months ago

Ah, great. Thanks @ap--, I look forward to trying the new release out.

ap-- commented 7 months ago

Let me know if it works! 😃

closed via 569ceabb73503c20521949b6a5c7e3de8c6d411f

mbanani commented 2 months ago

Hi @ap--, thank you for fixing this! Could you please clarify why .parts returns bucket\ instead of just bucket?

ap-- commented 2 months ago

Hi @mbanani,

Thank you for your question! universal_pathlib tries to map fsspec URIs (or urlpaths) to the pathlib.Path concept as closely as possible. For object storage, the netloc of a URI, e.g. something in s3://something/file, maps well to a "drive", IMO. In stdlib pathlib, when a path is absolute, the parts tuple starts with the anchor, which is the drive concatenated with the root.

>>> pathlib.PurePosixPath("/abc").parts
('/', 'abc')
>>> pathlib.PureWindowsPath("C:/abc").parts
('C:\\', 'abc')
>>> upath.UPath("s3://bucket/abc").parts
('bucket/', 'abc')

That's why the name of a "bucket" is provided via the .drive attribute of a UPath instance.
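
The stdlib side of this can be checked directly: for absolute paths, `anchor` equals `drive + root`, and `parts[0]` is the anchor. The sketch below also includes `uri_parts`, a hypothetical helper that applies the same rule to a URI by treating the netloc as the drive. It is meant as an illustration of the mapping under that assumption, not as universal_pathlib's actual implementation.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlsplit

for p in (PurePosixPath("/abc"), PureWindowsPath("C:/abc")):
    # anchor is the drive concatenated with the root ...
    assert p.anchor == p.drive + p.root
    # ... and it is the first element of parts for an absolute path.
    assert p.parts[0] == p.anchor


def uri_parts(uri: str) -> tuple:
    """Hypothetical illustration: netloc acts as drive, "/" as root."""
    split = urlsplit(uri)
    anchor = split.netloc + "/"
    # Remaining segments come from the URI path, parsed POSIX-style;
    # the leading "/" element is dropped because the anchor replaces it.
    rest = PurePosixPath(split.path).parts[1:]
    return (anchor, *rest)


print(uri_parts("s3://bucket/abc"))  # ('bucket/', 'abc')
```

Seen this way, the trailing slash in 'bucket/' plays the same role as the backslash in 'C:\\': it is the root appended to the drive.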

I hope this clarifies the reasoning. Do you think a note in the README would help clarify this? And if yes, would you be willing to sketch out a PR adding that note?

Also, in your message you write bucket\ with a backslash instead of a forward slash. Is that a typo? If not, could you please provide your operating system, Python version, and the output of pip freeze?

Have a great day, Andreas 😃