barneygale / pathlib-abc

Python base classes for rich path objects
Other
23 stars 1 forks source link

Base class for `PathBase.stat()` return value? #3

Open barneygale opened 7 months ago

barneygale commented 7 months ago

Is it unreasonable or incorrect to expect users to instantiate an os.stat_result object? It's a little tricky.

barneygale commented 6 months ago

Proposal: add a status() method that returns a Status object, which would resemble an os.DirEntry and include methods like is_dir() and file_type().

PathBase.status() would be abstract, and PathBase.stat() would call self.status().as_stat_result().

Path.status() would call Status.from_stat_result(self.stat()), and Path.stat() would call os.stat() as it does now.

Users wouldn't need to deal with the low-level stat_result interface.

ap-- commented 5 months ago

This sounds very interesting for universal_pathlib. Cross-referencing fsspec/universal_pathlib#145

In filesystem_spec this "non os.stat_result returning" method is provided by AbstractFileSystem.info() and its return value is a dictionary with a few standardised keys and unstructured optional information that a filesystem implementation can return.

Apache Airflow provides a wrapper class based on dict that takes this info() dict and provides some of os.stat_result's interface: https://github.com/apache/airflow/blob/8b5a1ff5e9b3ef6941ab932de64dd1c939fcc2fb/airflow/io/utils/stat.py#L22-L83

In internal code I have mostly seen .stat() being used to infer the size or the creation / modification time of files.

barneygale commented 5 months ago

Very interesting, thank you!

Out of interest, why not a dataclass rather than a dict with standardised keys? I was leaning towards a dataclass personally. Subclasses (e.g. S3Status) could add their own fields.

And do you think there's any sense in adding a PathBase.set_status/info() method? Other methods like PathBase.chmod() could be made to call it, supplying a Status(permissions=...) object. Users could call it with an S3Status object to set S3-specific metadata.

ap-- commented 5 months ago

Out of interest, why not a dataclass rather than a dict with standardised keys? I was leaning towards a dataclass personally. Subclasses (e.g. S3Status) could add their own fields.

I'm not sure, but my guess would be that this was a pragmatic decision. fsspec.spec.AbstractFileSystem.info() returns a dict since fsspec's first release. There is an open issue in the repo to standardise this across filesystems.

And do you think there's any sense in adding a PathBase.set_status/info() method?

Hmmm, interesting. I'll try to collect some examples for specific filesystems, and will report back.