fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License

rename (move) file over different file system #234

Open petrus-v opened 1 month ago

petrus-v commented 1 month ago

I would like to be able to rename/move a file across different file systems.

Does it make sense for this library to support such a feature?

This would let developers write something like this:

UPath("local:///tmp/local_file").rename(UPath("s3://my-bucket/destination"))

which would:

petrus-v commented 1 month ago

Notes to self from yesterday's tests:

ap-- commented 1 month ago

Hi @petrus-v

Thank you for opening the issue. This is very much in scope for universal_pathlib. There is an open PR #225 that I still need to verify and merge, which provides better errors for the currently unsupported rename across filesystems.

Please also see #227 for future additions once we rely on Python 3.14+ upstream implementations in pathlib.

That being said, cross-filesystem functionality could be added to UPath.rename. An implementation for your feature request would preferably be opt-in (i.e. something like the code example below) to avoid users shooting themselves in the foot.

# example of how the feature could look
>>> src = UPath("/tmp/somefile")
>>> src.write_text('hello world')
>>> dst = UPath("s3://mybucket/path/somefile")
>>> src.rename(dst)
Traceback (most recent call last):
   ...
ValueError: cross-filesystem rename is not permitted by default. Use `allow_protocols=['s3']`
>>> src.rename(dst, allow_protocols=['s3'])
>>> dst.read_text()
'hello world'

argument names are of course up for debate.
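To make the discussion more concrete, the opt-in behaviour can be prototyped outside of UPath as a free function. This is only a minimal sketch under a few assumptions: `move_across` is a hypothetical name and not part of universal_pathlib, it handles single files only, and it treats "same protocol" as "same filesystem", which is a simplification (two paths can share a protocol but point at different storage). It relies only on path methods that both pathlib and UPath objects offer (`open`, `rename`, `unlink`) and on the `protocol` attribute, falling back to an empty string for plain pathlib paths.

```python
import shutil


def move_across(src, dst, allow_protocols=()):
    """Move src to dst, copying across protocols only when opted in.

    Hypothetical helper for illustration, not part of universal_pathlib.
    """
    # Plain pathlib paths have no `protocol` attribute; treat them as local ("").
    src_protocol = getattr(src, "protocol", "")
    dst_protocol = getattr(dst, "protocol", "")

    if src_protocol == dst_protocol:
        # Simplification: same protocol is treated as the same filesystem,
        # so the path's own rename is used.
        return src.rename(dst)

    if dst_protocol not in allow_protocols:
        raise ValueError(
            "cross-filesystem rename is not permitted by default. "
            f"Use `allow_protocols=[{dst_protocol!r}]`"
        )

    # Copy in chunks, then delete the source only after the copy succeeded.
    with src.open("rb") as fsrc, dst.open("wb") as fdst:
        shutil.copyfileobj(fsrc, fdst)
    src.unlink()
    return dst
```

Note that a copy followed by a delete is not atomic, unlike a rename within one filesystem. If the process dies between the two steps, both files exist, which is one more argument for keeping the behaviour opt-in.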

> (not PosixUPath nor WindowsUPath) because the inheritance loop between classes makes it hard to refactor and to choose an MRO that suits all cases

These might go away in the future once there is a reasonably well working implementation for relative UPaths.

petrus-v commented 1 month ago

Hi !

Thanks for taking the time to reply, much appreciated.

> Hi @petrus-v
>
> Thank you for opening the issue. This is very much in scope for universal_pathlib. There is an open PR #225 that I still need to verify and merge, which provides better errors for the currently unsupported rename across filesystems.
>
> Please also see #227 for future additions once we rely on Python 3.14+ upstream implementations in pathlib.

Those are nice upstream improvements :heart_eyes:

> That being said, cross-filesystem functionality could be added to UPath.rename. An implementation for your feature request would preferably be opt-in (i.e. something like) to avoid users shooting themselves in the foot.
>
>     # example of how the feature could look
>     >>> src = UPath("/tmp/somefile")
>     >>> src.write_text('hello world')
>     >>> dst = UPath("s3://mybucket/path/somefile")
>     >>> src.rename(dst)
>     Traceback (most recent call last):
>        ...
>     ValueError: cross-filesystem rename is not permitted by default. Use `allow_protocols=['s3']`
>     >>> src.rename(dst, allow_protocols=['s3'])
>     >>> dst.read_text()
>     'hello world'
>
> argument names are of course up for debate.

I'm happy to implement the opt-in. If you agree, I would probably also allow setting it globally through an environment variable, something like this:

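import os

def rename(self, dst, allow_protocols=None):
    if allow_protocols is None:
        # fall back to a comma-separated list from the environment, skipping empty entries
        env = os.environ.get("UPATH_RENAME_ALLOW_PROTOCOLS", "")
        allow_protocols = [p.strip() for p in env.split(",") if p.strip()]
    ...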

I like the idea that the source code knows where the data is stored, while configuration is kept separate from usage.
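One detail worth pinning down for the environment fallback is how an unset or empty variable is parsed. A naive `"".split(",")` yields `[""]` rather than an empty list, and as far as I can tell universal_pathlib reports an empty protocol string for plain local paths, so that stray entry could end up allowing a protocol by accident. Here is a rough sketch of the resolution logic, assuming the variable name from the proposal above. `resolve_allowed_protocols` and its `environ` parameter are hypothetical names used only to keep the example testable.

```python
import os

# Name taken from the proposal above; not an existing universal_pathlib setting.
ENV_VAR = "UPATH_RENAME_ALLOW_PROTOCOLS"


def resolve_allowed_protocols(allow_protocols=None, environ=None):
    """Return the explicit allow-list if given, else the one from the environment."""
    if allow_protocols is not None:
        # An explicit argument always wins over global configuration.
        return list(allow_protocols)
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "")
    # Strip whitespace and drop empty entries, so an unset or empty
    # variable results in an empty allow-list instead of [""].
    return [p.strip() for p in raw.split(",") if p.strip()]
```

With this, `UPATH_RENAME_ALLOW_PROTOCOLS="s3, gcs"` would resolve to `["s3", "gcs"]`, while passing `allow_protocols=[]` at a call site would still disable cross-filesystem renames for that call regardless of the environment.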

> > (not PosixUPath nor WindowsUPath) because the inheritance loop between classes makes it hard to refactor and to choose an MRO that suits all cases
>
> These might go away in the future once there is a reasonably well working implementation for relative UPaths.

sounds great :+1: