fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License
240 stars 41 forks source link

Moving files via `UPath.rename` to absolute path locations #206

Closed falexwolf closed 1 month ago

falexwolf commented 6 months ago

Because of this: https://github.com/fsspec/universal_pathlib/blob/380144c18f291f0f0a15fe8a02bc265233dd594b/upath/core.py#L942-L954

the following happens:

from upath import UPath

path = UPath("s3://lamin-us-west-2/wXDsTYYd/.lamindb/S78k5961PekVzaLrQkGq")
path.rename("s3://lamin-us-west-2/wXDsTYYd/.lamindb/S78k5961PekVzaLrQkGq.csv")
# > S3Path('s3://lamin-us-west-2/wXDsTYYd/.lamindb/s3://lamin-us-west-2/wXDsTYYd/.lamindb/S78k5961PekVzaLrQkGq.csv')

To allow moving files across directories, .rename() should behave as it does in pathlib:

image

An easy fix would be to replace

def rename(
    self, target, *, recursive=False, maxdepth=None, **kwargs
):  # fixme: non-standard
    if not isinstance(target, UPath):
        target = self.parent.joinpath(target).resolve()

with

def rename(
    self, target, *, recursive=False, maxdepth=None, **kwargs
):  # fixme: non-standard

    if not isinstance(target, UPath):
        target_upath = UPath(target)
        if target_upath.is_absolute():
            target = target_path
        else:
            target = self.parent.joinpath(target).resolve()

This still clashes with pathlib because pathlib resolves relative to the CWD and not relative to self, but that's another discussion.

Let me know if you accept a PR for this!

ap-- commented 6 months ago

Hi @falexwolf

Thank you for reporting the issue. And thanks for offering to contribute! PRs are very welcome.

As you've pointed out, we won't easily manage to be in line with pathlib here, due to the fact that we don't have a concept of cwd for remote filesystems.

Your suggested fix is good, and I would add a check to make sure that the provided target uses the same protocol (this is seemingly broken in the current version too):

def rename(
    self,
    target,
    *,
    recursive=False,
    maxdepth=None,
    **kwargs,
):  # fixme: non-standard

    target_protocol = get_upath_protocol(target)
    if target_protocol and target_protocol != self.protocol:
        raise ValueError(f"expected protocol {self.protocol!r}, got: {target_protocol!r}")

    if not isinstance(target, UPath):
        target = UPath(target)

    if not target.is_absolute():
        target = self.parent.joinpath(target).resolve()

    ...

If you could add a test for a filesystem that has an empty root_marker as for example "s3" and one for one that has "/" as a root_marker (for example "file") that would be great.

Cheers, Andreas 😃

falexwolf commented 6 months ago

Thanks, Andreas! Will give it a shot!

Koncopd commented 4 months ago

Hi, @ap-- , tried implementing this and observed

`>>> from upath import UPath 
>>> path = UPath("s3://lamin-us-west-2/wXDsTYYd/.lamindb/S78k5961PekVzaLrQkGq") 
>>> path.is_absolute() 
Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Users\sergei.rybakov\Apps\Miniconda3\envs\nbproject\lib\pathlib.py", line 1006, in is_absolute return not self._flavour.has_drv or bool(self._drv) AttributeError: 'WrappedFileSystemFlavour' object has no attribute 'has_drv'`

Would you recommend adding custom is_absolute to CloudPath or trying to add has_drv to WrappedFileSystemFlavour?

falexwolf commented 4 months ago

Apologies, @ap--, that it took us so long to return to here. Also, we decided that @Koncopd would make the PR as he has more in-depth experience with the library than I.

ap-- commented 4 months ago

No worries ☺️

I got back from holiday yesterday and am going through my backlog in the evenings now. I'll provide more feedback later tonight or tomorrow.

And thank you so much for contributing!