fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License
211 stars 36 forks source link

UPath(path).parent does not work as expected for memory URIs with authority component #161

Closed kadykov closed 5 months ago

kadykov commented 8 months ago

.parent() method of the paths pointing to the files in the root directory returns the filename with added slash, instead of root symbol. This issue blocks PR in intake

Code to reproduce:

from upath import UPath

def get_dir(path):
    return str(UPath(path).parent)

print(f"{get_dir('memory://cat.yml') = }")  # 'memory://cat.yml/'
print(f"{get_dir('memory://dir/cat.yml') = }")  # 'memory://dir/'
print(f"{get_dir('file://cat.yml') = }")  # 'file://cat.yml/'
print(f"{get_dir('file://dir/cat.yml') = }")  # 'file://dir/'
ap-- commented 8 months ago

Hi @kadykov,

Thanks for opening the issue. I don't fully understand how this blocks the PR you mentioned. Could you be more specific?

Regarding the URIs you used in your example: These are all URIs with authority parts. I fully agree that since an authority doesn't make sense for the memory file system these should be ignored, this is actually fixed in #152 and needs to be backported to older python versions. In the file URI case it's a bit more complicated, see: #142 and #151

All being said, try running your example with a minor change:

from upath import UPath

def get_dir(path):
    return str(UPath(path).parent)

# empty authority URIs
print(f"{get_dir('memory:///cat.yml') = }")  # 'memory:/'
print(f"{get_dir('memory:///dir/cat.yml') = }")  # 'memory:/dir'
print(f"{get_dir('file:///cat.yml') = }")  # 'file:/'
print(f"{get_dir('file:///dir/cat.yml') = }")  # 'file:/dir'

# no authority URIs
print(f"{get_dir('memory:/cat.yml') = }")  # 'memory:/'
print(f"{get_dir('memory:/dir/cat.yml') = }")  # 'memory:/dir'
print(f"{get_dir('file:/cat.yml') = }")  # 'file:/'
print(f"{get_dir('file:/dir/cat.yml') = }")  # 'file:/dir'

To support the not-well ~defined~ handled "double-slash" URIs we first need to address (and backport) the changes from issues and PRs mentioned above.

Let me know if this clears things up.

Andreas

kadykov commented 8 months ago

Thank you for the explanation. It seems that in our case it would be easier to avoid using memory:// "double-slash" notation.

I don't fully understand how this blocks the PR you mentioned. Could you be more specific?

Sorry, I posted a wrong link, the actual PR that uses UPath is here.

ap-- commented 8 months ago

162 addresses this issue for memory URIs.

We have #108 to track the issue for file URIs