fsspec / s3fs

S3 Filesystem
http://s3fs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

method to track session stats #765

Open betolink opened 1 year ago

betolink commented 1 year ago

I couldn't find a method to keep track of the total data cached and transferred in an fsspec session, but it would be really helpful to have something like that. Maybe there is already a way to calculate this? For example, suppose I could do the following:

import s3fs

# Proposed API: neither data_tracking nor stats() exists in s3fs today
s3 = s3fs.S3FileSystem(anon=True, data_tracking=True)
# I do the usual
s3.open(...)

# and then I can ask:
stats = s3.stats()

Here stats would contain the total data transferred, total HTTP calls, current cache size, max cache size, etc.

martindurant commented 1 year ago

It would certainly be possible to do this, but we don't have anything like it yet.

Note that s3fs already has a logger ("s3fs") which you could generate stats from if you made your own custom handler. Most HTTP calls (all?) generate log events.
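As a rough sketch of the custom-handler idea, the snippet below attaches a counting handler to the "s3fs" logger using only the standard `logging` module. `CountingHandler` is a hypothetical helper, not part of s3fs, and the snippet assumes the events of interest are emitted at DEBUG level, so the logger level is lowered accordingly. It counts log events, not bytes transferred.

```python
import logging
from collections import Counter


class CountingHandler(logging.Handler):
    """Tally log records per level (hypothetical helper, not part of s3fs)."""

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.counts = Counter()

    def emit(self, record):
        # One increment per log event that reaches this handler
        self.counts[record.levelname] += 1
        self.counts["total"] += 1


logger = logging.getLogger("s3fs")
logger.setLevel(logging.DEBUG)  # assumption: the calls are logged at DEBUG
handler = CountingHandler()
logger.addHandler(handler)
```

After some filesystem activity, `handler.counts["total"]` would hold the number of log events seen. Mapping those events to actual HTTP calls or data volume would require parsing the messages, which this sketch does not attempt.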

For caching, did you mean local copies of files, the directory listings, or file data caches? For the latter, the filesystem does NOT keep track of the files that are open on it, so that would be tricky. Perhaps you'd need a WeakSet (weakref.WeakSet) of the open files.
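To illustrate the WeakSet idea, here is a minimal, self-contained sketch. `TrackedFile` and `OpenFileRegistry` are hypothetical stand-ins, not s3fs classes, and `cache_bytes` is an assumed attribute for the size of a file's data cache. The point is that a `weakref.WeakSet` lets a registry see the live file objects without keeping them alive.

```python
import gc
import weakref


class TrackedFile:
    """Stand-in for an open file object (hypothetical, not an s3fs class)."""

    def __init__(self, path, cache_bytes):
        self.path = path
        self.cache_bytes = cache_bytes  # assumed size of this file's data cache


class OpenFileRegistry:
    """Holds weak references only, so closed or dropped files vanish on their own."""

    def __init__(self):
        self._files = weakref.WeakSet()

    def register(self, f):
        self._files.add(f)
        return f

    def total_cache_bytes(self):
        # Sum over the files that are still alive
        return sum(f.cache_bytes for f in self._files)


registry = OpenFileRegistry()
a = registry.register(TrackedFile("bucket/a", 1024))
b = registry.register(TrackedFile("bucket/b", 2048))
```

Once the last strong reference to a file is dropped and it has been garbage collected, it no longer contributes to `registry.total_cache_bytes()`.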