drivendataorg / cloudpathlib

Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
https://cloudpathlib.drivendata.org
MIT License

Copy to non-existing S3 folder path creates a file with "/" instead of file name in that folder #451

Open quantori-pokidovea opened 3 weeks ago

quantori-pokidovea commented 3 weeks ago

Steps to reproduce:

from cloudpathlib import S3Path

# take an existing file
src = S3Path("s3://my-bucket/file.txt")
# and copy it into a non-existent folder
dst = S3Path("s3://my-bucket/destination/")

src.copy(dst)

Current behavior

Creates a file with name / inside the folder s3://my-bucket/destination/

Expected behavior

The file s3://my-bucket/destination/file.txt is created

pjbull commented 3 weeks ago

FWIW, .copy is meant to mimic the behavior of the copy function from shutil, which is similarly (and potentially unintuitively) behaved: the trailing delimiter is stripped and dst is written as a file.

from pathlib import Path
from shutil import copy

src = Path("text_file.txt")
src.write_text("hello")
#> 5

dst = Path("new_target/")

copy(src, dst)
#> PosixPath('new_target')

print(dst.is_file())
#> True
print(dst.read_text())
#> hello
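For contrast, here is a minimal sketch of the other branch of shutil.copy: when dst is a directory that already exists, the file is copied into it under the source's base name. The function name and the temporary directory layout are illustrative only.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_existing_and_missing() -> tuple[str, bool, bool]:
    """Run shutil.copy against an existing directory and against a missing path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "text_file.txt"
        src.write_text("hello")

        # Case 1: dst is an existing directory, so the file lands inside it
        existing = root / "existing_target"
        existing.mkdir()
        inside = Path(shutil.copy(src, existing))

        # Case 2: dst does not exist, so dst itself becomes the file
        missing = root / "new_target"
        shutil.copy(src, missing)

        return (
            inside.relative_to(root).as_posix(),
            inside.is_file(),
            missing.is_file(),
        )


print(copy_into_existing_and_missing())
#> ('existing_target/text_file.txt', True, True)
```

So the "copy into folder" result only happens locally when the folder is already there, which has no direct equivalent on providers without real directories.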

Since this is a CloudPath-only API that does not exist on Path objects, we do have some leeway in deciding whether there is a better approach. However, the right approach is ambiguous, since s3://bucket/file-end-in-slash/ is a valid name for a blob/object on at least some of the providers.

Some options:

If we just document this, the workaround is to join the paths yourself (since not all providers even support empty folders):

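from cloudpathlib import S3Path

src = S3Path("s3://my-bucket/file.txt")
dst = S3Path("s3://my-bucket/destination/")

# append the source file name so the target is an explicit object key
src.copy(dst / src.name)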
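If you want that workaround applied automatically, a rough sketch of the "trailing slash means folder" convention on plain URI strings could look like the following. resolve_copy_target is a hypothetical helper, not part of cloudpathlib, and it deliberately picks one side of the ambiguity above: a destination ending in "/" is always treated as a folder, even though such a key can be a valid object name on some providers.

```python
from posixpath import basename


def resolve_copy_target(src_uri: str, dst_uri: str) -> str:
    """Hypothetical helper: treat a dst ending in '/' as a folder.

    If dst_uri ends with '/', append the source's file name to it;
    otherwise return dst_uri unchanged (it is taken as the full target key).
    """
    if dst_uri.endswith("/"):
        return dst_uri + basename(src_uri)
    return dst_uri


print(resolve_copy_target("s3://my-bucket/file.txt", "s3://my-bucket/destination/"))
#> s3://my-bucket/destination/file.txt
print(resolve_copy_target("s3://my-bucket/file.txt", "s3://my-bucket/renamed.txt"))
#> s3://my-bucket/renamed.txt
```

The resolved string can then be wrapped in a path object and passed to .copy as in the workaround above.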