fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License

Return of info (FileInfo) is unspecified, need a consistent way to get link information #1680

Open mxmlnkn opened 2 months ago

mxmlnkn commented 2 months ago

The API specification for listdir and by inference also info reads:

The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:

I don't really understand the comment. "is TBD, but must be consistent across implementations" is an oxymoron. How can it be consistent between implementations when it is not specified yet? In order to increase the usefulness of the filesystem_spec API, this should be specified, imo.

Consider for example this:

import pprint

import fsspec

# Assumes an SSH server is reachable on localhost.
o = fsspec.open("ssh://127.0.0.1")
ol = fsspec.open("/")
print("Info of ssh:///bin")
pprint.pprint([x for x in o.fs.listdir('/', detail=True) if x['name'] == '/bin'][0])
print("Info of /bin")
pprint.pprint([x for x in ol.fs.listdir('/', detail=True) if x['name'] == '/bin'][0])

Output:

Info of ssh:///bin
{'gid': 0,
 'mtime': datetime.datetime(2021, 6, 21, 9, 52, 39, tzinfo=datetime.timezone.utc),
 'name': '/bin',
 'size': 7,
 'time': datetime.datetime(2024, 9, 19, 7, 14, 44, tzinfo=datetime.timezone.utc),
 'type': 'link',
 'uid': 0}

Info of /bin
{'created': 1601554848.8334851,
 'destination': 'usr/bin',
 'gid': 0,
 'ino': 13,
 'islink': True,
 'mode': 41471,
 'mtime': 1601554848.8334851,
 'name': '/bin',
 'nlink': 1,
 'size': 147456,
 'type': 'other',
 'uid': 0}

Even these two tested implementations are not consistent with each other:

~The other implementations should also be tested.~ There is a nice comprehensive overview here. Code for all, but concrete examples are missing for some.
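Until the keys are specified, a caller has to reconcile the two layouts itself. The following is only a minimal sketch based on the keys visible in the two outputs above; `is_link` and `mtime_as_timestamp` are hypothetical helper names, not part of fsspec, and other implementations may need further cases.

```python
import datetime


def is_link(info: dict) -> bool:
    """Best-effort link detection for the two layouts shown above."""
    # ssh layout: the link status is encoded in 'type'.
    # local layout: 'type' is 'other' and a separate 'islink' flag is set.
    return info.get("type") == "link" or bool(info.get("islink", False))


def mtime_as_timestamp(info: dict):
    """Return 'mtime' as a POSIX timestamp, or None if the key is absent."""
    mtime = info.get("mtime")
    if isinstance(mtime, datetime.datetime):
        return mtime.timestamp()
    return mtime


# Sample dictionaries copied (abridged) from the outputs above.
ssh_info = {
    "name": "/bin",
    "size": 7,
    "type": "link",
    "mtime": datetime.datetime(2021, 6, 21, 9, 52, 39, tzinfo=datetime.timezone.utc),
}
local_info = {
    "name": "/bin",
    "size": 147456,
    "type": "other",
    "islink": True,
    "destination": "usr/bin",
    "mtime": 1601554848.8334851,
}

for label, info in (("ssh", ssh_info), ("local", local_info)):
    print(label, is_link(info), mtime_as_timestamp(info))
```

Both entries are reported as links, and both modification times come back as plain numbers, which is the kind of guarantee a specified FileInfo could provide directly.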

martindurant commented 2 months ago

Perhaps it is poorly written, but this means:

mxmlnkn commented 1 month ago

The HTTP file system is also inconsistent in that it requires the full URL for each listdir, open, etc. call, in stark contrast to the other implementations. See https://github.com/ray-project/ray/issues/26423
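In a wrapper, one way to hide this difference is to expand relative paths to full URLs before every call. A minimal sketch; `full_url` is a hypothetical helper and the base URL is a placeholder, neither comes from fsspec:

```python
def full_url(base_url: str, path: str) -> str:
    """Join a base URL and a path relative to it with exactly one '/'."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Placeholder base URL, for illustration only.
base = "https://example.com/data/"
print(full_url(base, "/folder/file.txt"))
print(full_url(base, "folder/file.txt"))
```

Both calls print the same URL, so the wrapper's callers can pass paths the same way they would to the other implementations.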

Furthermore, some implementations return the name with a leading / (fsspec.implementations.ftp.FTPFileSystem, sshfs.SSHF), some without (fsspec.implementations.git.GitFileSystem), which was another source of bugs for my wrapper. I am surprised that I have not yet encountered a filesystem that returns simply the file name, as the name key implies, instead of the absolute path, but I have 4+ other fsspec implementations that I still need to test and "implement" ...
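A wrapper can work around the leading-slash difference by canonicalizing names before comparing them. A minimal sketch; `canonical_name` and `base_name` are hypothetical helpers, not fsspec functions:

```python
import posixpath


def canonical_name(name: str) -> str:
    """Return the path with exactly one leading '/' and no trailing '/'."""
    return "/" + name.strip("/")


def base_name(name: str) -> str:
    """Return only the last path component, i.e. the bare file name."""
    return posixpath.basename(name.rstrip("/"))


# The same entry as reported with and without a leading slash.
print(canonical_name("/usr/bin"), canonical_name("usr/bin"))
print(base_name("/usr/bin"), base_name("usr/bin/"))
```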

martindurant commented 1 month ago

(ref #1713)

> some implementations return the name with leading /

Implementations have a root_marker class attribute that is typically "" or "/" to distinguish this behaviour.
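As a rough illustration of how a wrapper might use this attribute, the sketch below rewrites a path to match a given root marker. It assumes, per the comment above, that the marker is either "" or "/"; `to_fs_convention` is a hypothetical helper, not an fsspec function.

```python
def to_fs_convention(path: str, root_marker: str) -> str:
    """Rewrite a path so that it starts with the given root marker."""
    return root_marker + path.lstrip("/")


# The two typical marker values mentioned above.
print(repr(to_fs_convention("/bin", "")))
print(repr(to_fs_convention("bin", "/")))
```

With a real filesystem instance `fs`, the second argument would be `fs.root_marker`.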