Open barneygale opened 7 months ago
Proposal: add a status()
method that returns a Status
object, which would resemble an os.DirEntry
and include methods like is_dir()
and file_type()
.
PathBase.status()
would be abstract, and PathBase.stat()
would call self.status().as_stat_result()
.
Path.status()
would call Status.from_stat_result(self.stat())
, and Path.stat()
would call os.stat()
as it does now.
Users wouldn't need to deal with the low-level stat_result
interface.
This sounds very interesting for universal_pathlib
. Cross-referencing fsspec/universal_pathlib#145
In filesystem_spec
this "non os.stat_result
returning" method is provided by AbstractFileSystem.info()
and its return value is a dictionary with a few standardised keys and unstructured optional information that a filesystem implementation can return.
Apache Airflow provides a wrapper class based on dict that takes this info()
dict and provides some of os.stat_result
's interface: https://github.com/apache/airflow/blob/8b5a1ff5e9b3ef6941ab932de64dd1c939fcc2fb/airflow/io/utils/stat.py#L22-L83
In internal code I have mostly seen .stat()
being used to infer the size or the creation / modification time of files.
Very interesting, thank you!
Out of interest, why not a dataclass rather than a dict with standardised keys? I was leaning towards a dataclass personally. Subclasses (e.g. S3Status
) could add their own fields.
And do you think there's any sense in adding a PathBase.set_status/info()
method? Other methods like PathBase.chmod()
could be made to call it, supplying a Status(permissions=...)
object. Users could call it with an S3Status
object to set S3-specific metadata.
Out of interest, why not a dataclass rather than a dict with standardised keys? I was leaning towards a dataclass personally. Subclasses (e.g.
S3Status
) could add their own fields.
I'm not sure, but my guess would be that this was a pragmatic decision. fsspec.spec.AbstractFileSystem.info()
returns a dict since fsspec
's first release.
There is an open issue in the repo to standardise this across filesystems.
And do you think there's any sense in adding a PathBase.set_status/info() method?
Hmmm, interesting. I'll try to collect some examples for specific filesystems, and will report back.
Is it unreasonable or incorrect to expect users to instantiate an
os.stat_result
object? It's a little tricky.