fsspec / s3fs

S3 Filesystem
http://s3fs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

ls returns full path, not just items #906

Open msharp9 opened 1 month ago

msharp9 commented 1 month ago

s3fs.S3FileSystem().ls(bucket_path) currently returns: [f'{bucket_path}/file1.txt', f'{bucket_path}/file2.txt', ...].

But running ls on Linux, or Python's similar function os.listdir(path), returns just the file names, e.g. ['file1.txt', 'file2.txt', ...].

This is closer to what I'd expect from the docstring as well: https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L261-L265

>>> s3fs.__version__
'2024.9.0'

I just find this behavior confusing. I'm assuming this would be a relatively large breaking change for most users, but maybe we could add a flag to just return the file names?
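As a stopgap until any such flag exists, the file names can be derived from the returned paths on the caller's side. The sketch below is only an illustration: `basenames` is a hypothetical helper, not part of s3fs, and the listing is a hard-coded stand-in for what `ls` returns, so it runs without S3 access.

```python
import posixpath


def basenames(listing):
    """Reduce full-path entries, as returned by ls, to their last path component.

    Hypothetical helper for illustration; s3fs does not provide it.
    """
    # Keys use "/" as the separator regardless of the local OS, so posixpath
    # is used instead of os.path. rstrip guards against a trailing slash.
    return [posixpath.basename(path.rstrip("/")) for path in listing]


# Stand-in for the result of s3fs.S3FileSystem().ls("my-bucket/data");
# the bucket and key names are made up.
listing = [
    "my-bucket/data/file1.txt",
    "my-bucket/data/file2.txt",
    "my-bucket/data/sub",
]

names = basenames(listing)
print(names)  # ['file1.txt', 'file2.txt', 'sub']
```

The full paths remain available in `listing`, so nothing is lost by computing the short names separately.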

martindurant commented 3 weeks ago

All of the ls methods in all of the backends return full paths within the context of the filesystem (i.e., usually without a protocol prefix). If the docstring is wrong, it should be fixed.

There have been discussions about defining the return value for file information more concretely. It would be reasonable to have the "base name", "full path" and "canonical name" (with prefix and other URL parameters) as fields. That would still leave the question of which one to show when details are not requested. I think the current pattern is a good place to be: ls returns strings that you can use directly with the filesystem, and they are the same as what you get from find/glob. It is, however, a matter of judgement.
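To make the three names concrete, here is a rough sketch of what such a record could look like. Everything in it is an assumption for illustration: the `EntryNames` class, the `describe` function and the field names are hypothetical and do not exist in fsspec or s3fs, and the canonical name here only adds a protocol prefix, leaving out any other URL parameters.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryNames:
    """Hypothetical record holding the three names discussed above."""

    base_name: str       # last path component, like os.listdir would give
    full_path: str       # path within the filesystem, as ls returns today
    canonical_name: str  # full path with a protocol prefix


def describe(full_path, protocol="s3"):
    """Build the three names from one ls-style path (illustrative only)."""
    return EntryNames(
        base_name=posixpath.basename(full_path.rstrip("/")),
        full_path=full_path,
        canonical_name=f"{protocol}://{full_path}",
    )


# Made-up bucket and key, standing in for one entry of an ls result.
entry = describe("my-bucket/data/file1.txt")
print(entry.base_name)       # file1.txt
print(entry.full_path)       # my-bucket/data/file1.txt
print(entry.canonical_name)  # s3://my-bucket/data/file1.txt
```

With a record like this, the open question from the comment above remains which single field a plain, no-details listing should return.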

msharp9 commented 3 weeks ago

I'm happy with whatever. I searched the issues and didn't see any conversations around this, closed or open.

I do think your suggestion around returning fields makes a lot of sense when requested.