fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Pin generation on open for version aware file system #601

Open emfdavid opened 6 months ago

emfdavid commented 6 months ago

Here is an example of the current behavior of version_aware file systems and open files in GCSFS.

I would have expected a version_aware file system to pin the generation of an object while the file is open, so that reads are consistent, as they are when the open URL specifies the generation.

If this is the desired behavior, I would be happy to take a look and see if I can fix it. Based on your comment in the previous issue, I think the Python GCS SDK's download_as_bytes API should support what we need for implementing etag matching similar to what exists in S3, but I haven't compared the details yet.
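To make the expectation concrete, here is a minimal sketch of "pin the generation on open". It uses a toy in-memory store rather than gcsfs or the GCS SDK, so ToyVersionedStore, PinnedReader and their methods are hypothetical names for illustration only, not existing APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedStore:
    """In-memory stand-in for a bucket with object versioning (hypothetical)."""

    _versions: dict = field(default_factory=dict)  # name -> {generation: bytes}
    _latest: dict = field(default_factory=dict)    # name -> newest generation

    def write(self, name: str, data: bytes) -> int:
        # Every write creates a new generation and keeps the old ones.
        generation = self._latest.get(name, 0) + 1
        self._versions.setdefault(name, {})[generation] = data
        self._latest[name] = generation
        return generation

    def latest_generation(self, name: str) -> int:
        return self._latest[name]

    def read_range(self, name, start, end, generation=None):
        # Without an explicit generation, serve whatever is newest right now.
        if generation is None:
            generation = self._latest[name]
        return self._versions[name][generation][start:end]


class PinnedReader:
    """Resolves the generation once at open time and reuses it for every read."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.generation = store.latest_generation(name)  # pinned here

    def read(self, start, end):
        return self.store.read_range(
            self.name, start, end, generation=self.generation
        )


store = ToyVersionedStore()
store.write("data.bin", b"AAAAAAAA")
reader = PinnedReader(store, "data.bin")

first = reader.read(0, 4)
store.write("data.bin", b"BBBBBBBB")  # object is overwritten between reads
second = reader.read(4, 8)

print(first + second)  # b'AAAAAAAA'
```

Both reads come from the generation that was current at open time, even though the object was replaced in between. An unpinned read of the same range after the overwrite would return bytes from the new generation instead.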

martindurant commented 6 months ago

> I would have expected a version_aware file system to pin the generation of an object while the file is open, so that reads are consistent, as they are when the open URL specifies the generation.

This is a reasonable expectation.

martindurant commented 3 months ago

FWIW, I think s3fs does this right even without version_aware, by saving the etag.
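For comparison, the "save the etag" idea can be sketched roughly as below: remember the etag when the file is opened and refuse any later read if it no longer matches. This is a toy in-memory model, not s3fs or gcsfs code. ToyEtagStore, EtagCheckedReader, ObjectChanged and the if_match parameter are hypothetical names, and the etag here is simply a SHA-256 hash of the content.

```python
import hashlib


class ObjectChanged(Exception):
    """Hypothetical error: the object was replaced while a reader was open."""


class ToyEtagStore:
    """In-memory stand-in for an unversioned bucket (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = data

    def etag(self, name):
        # Assumption: the etag is a content hash, so it changes on overwrite.
        return hashlib.sha256(self._objects[name]).hexdigest()

    def get_range(self, name, start, end, if_match=None):
        # Conditional read: fail if the caller's etag is stale.
        if if_match is not None and if_match != self.etag(name):
            raise ObjectChanged(name)
        return self._objects[name][start:end]


class EtagCheckedReader:
    """Saves the etag at open time and sends it with every ranged read."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.etag = store.etag(name)  # saved at open

    def read(self, start, end):
        return self.store.get_range(
            self.name, start, end, if_match=self.etag
        )


etag_store = ToyEtagStore()
etag_store.put("data.bin", b"AAAAAAAA")
checked = EtagCheckedReader(etag_store, "data.bin")

head = checked.read(0, 4)             # succeeds, etag still matches
etag_store.put("data.bin", b"BBBBBBBB")

try:
    checked.read(4, 8)
    changed_detected = False
except ObjectChanged:
    changed_detected = True           # stale etag, read is rejected
```

Unlike pinning a generation, this cannot keep serving the old bytes once the object has been replaced. It only guarantees that a reader never silently mixes data from two different versions.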

emfdavid commented 3 months ago

Is this worth asking Google about?

martindurant commented 3 months ago

No, I'm pretty sure we can do this by ourselves with the available information.