Closed theogaraj closed 4 months ago
Hi @theogaraj
This is definitely an issue, and we should create a UPathStatResult class (name up for discussion) that provides an interface compatible with both dict (fsspec) and os.stat_result.
We need to ensure that the os.stat_result attribute names map to the equivalent type for each of the fsspec filesystems, too.
relevant fsspec issues:
For now I don't have a good recommendation other than
_stat = pth.stat()
size = _stat["size"] if isinstance(_stat, dict) else _stat.st_size
Cheers, Andreas
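The interim workaround above can be wrapped in a small helper so that calling code does not repeat the isinstance check. This is only a sketch: the name file_size is hypothetical and not part of universal_pathlib, and it assumes the dict flavour always carries an integer "size" key.

```python
import os
from typing import Any, Mapping, Union


def file_size(stat: Union[Mapping[str, Any], os.stat_result]) -> int:
    """Return the size in bytes from either flavour of stat() result."""
    if isinstance(stat, Mapping):
        # fsspec-style info dict, as returned for remote paths such as S3.
        return int(stat["size"])
    # os.stat_result, as returned for local paths.
    return int(stat.st_size)
```

Usage would then be size = file_size(pth.stat()) for both local and S3 paths.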
I vaguely remembered that I implemented something related to this already...
@ap-- thank you for checking and responding so promptly. Because of another problem I had with iterdir (logged as https://github.com/fsspec/universal_pathlib/issues/146), I ended up going with the slightly clunkier spath.fs.ls(str(spath)) and then accessing the ['size'] key common to both local and S3.
So this is by no means a showstopper. I'll track this issue and can update my code whenever someone is able to resolve these two issues.
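For anyone landing here with the same need, the fs.ls approach described above might look roughly like the following. It is a sketch under assumptions: list_file_sizes is a hypothetical helper name, fs stands for any fsspec filesystem object (for example spath.fs), and it relies on ls(path, detail=True) returning info dicts with "name", "size" and "type" keys.

```python
from typing import Any, Dict


def list_file_sizes(fs: Any, path: str) -> Dict[str, int]:
    """Map each file directly under `path` to its size in bytes.

    `fs` is expected to behave like an fsspec filesystem, whose
    ls(path, detail=True) returns a list of info dicts.
    """
    sizes = {}
    for entry in fs.ls(path, detail=True):
        # Skip directories (or S3 "prefixes"); only files have a useful size.
        if entry["type"] == "file":
            sizes[entry["name"]] = int(entry["size"])
    return sizes
```

With a UPath instance this would be called as list_file_sizes(spath.fs, str(spath)).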
Just ran across this. See https://github.com/apache/airflow/blob/main/airflow/io/store/stat.py for a stat-compatible version.
I just ran across the same issue here and would be happy to fire a PR with a fix, though it looks like @ap-- may already be investigating?
@bolkedebruin made several updates to the airflow io provider while inheriting from UPath in https://github.com/apache/airflow/pull/35612, which are great prior art here.
I would propose that we just port the changes to support stat into UPath, which is now under:
For links to the info() dicts of many FileSystem implementations, check https://github.com/fsspec/filesystem_spec/issues/526#issuecomment-1936188996
All filesystems have "name", "size" and "type".
For translating to os.stat_result.st_* attributes, these are the keys that could be checked:
Attribute | Possible Info Keys |
---|---|
st_mode | mode, unix.mode, writable, isLink, nlink, permission, isexec |
st_ino | ino, name, id, sha, hex, Digest? |
st_dev | |
st_nlink | nlink, isLink |
st_uid | uid, owner, uname, unix.owner |
st_gid | gid, group, gname, unix.group |
st_size | size |
st_atime | time, last_accessed_on, accessTime |
st_mtime | mtime, last_modified, last_modification_time_ms, timeModified, modify, modificationTime, LastModified, modified_at |
st_ctime | |
st_birthtime | created, creation_time, timeCreated, created_at |
Additional info to be considered comes from specific filesystems:
All data types need to be normalized to int
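To make the mapping idea concrete, here is a minimal sketch of a wrapper that offers both dict-style and st_* access over an info dict and normalizes timestamps to int. It is not the airflow or universal_pathlib implementation: InfoStatResult and _to_int_timestamp are hypothetical names, only a subset of the keys from the table is checked (millisecond keys such as last_modification_time_ms are left out), and naive datetimes and ISO 8601 strings are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional, Sequence

# Subset of candidate info keys from the table above (assumed to be in seconds).
_MTIME_KEYS = ("mtime", "last_modified", "LastModified", "modified_at")
_BIRTHTIME_KEYS = ("created", "creation_time", "created_at")


def _to_int_timestamp(value: Any) -> Optional[int]:
    """Normalize a timestamp-like value to whole seconds since the epoch."""
    if value is None:
        return None
    if isinstance(value, str):
        # Assumption: strings are ISO 8601; a trailing "Z" means UTC.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are in UTC.
            value = value.replace(tzinfo=timezone.utc)
        return int(value.timestamp())
    return int(value)  # int or float seconds


class InfoStatResult:
    """Hypothetical wrapper with dict access and os.stat_result-like attributes."""

    def __init__(self, info: Mapping[str, Any]) -> None:
        self._info = dict(info)

    def __getitem__(self, key: str) -> Any:
        # Keep the fsspec dict interface working: result["size"].
        return self._info[key]

    def _first(self, keys: Sequence[str]) -> Any:
        # Return the first candidate key that is present and not None.
        for key in keys:
            if self._info.get(key) is not None:
                return self._info[key]
        return None

    @property
    def st_size(self) -> int:
        return int(self._info.get("size") or 0)

    @property
    def st_mtime(self) -> Optional[int]:
        return _to_int_timestamp(self._first(_MTIME_KEYS))

    @property
    def st_birthtime(self) -> Optional[int]:
        return _to_int_timestamp(self._first(_BIRTHTIME_KEYS))
```

Returning None for a missing timestamp is one possible design choice; a real implementation might instead raise AttributeError or fall back to 0 to stay closer to os.stat_result.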
Which operating system and Python version are you using? Windows 11, Python 3.9.6
Which version of this project are you using? 0.1.3
What did you do? I am attempting to use universal_pathlib in order to have a unified way of handling files whether they are local or in S3. One of the things I need to do is get all the files in a folder (yes I know S3 doesn't have actual folders, but hopefully you understand what I mean) and get their sizes.
What did you expect to see? I expected to be able to use same code to get file sizes whether they are in local directory or in S3
What did you see instead? The return types of stat() are different for S3Path vs WindowsUPath and I can't get file size in the same way from each.
Would this difference in behavior be something you would consider reconciling? Alternatively, do you have suggestions on another approach to achieving what I'm trying to do?