awslabs / mountpoint-s3

A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.

Cache prewarming / scripted prefetch #631

Open yawn opened 8 months ago

yawn commented 8 months ago

Tell us more about this new feature.

Certain classes of workloads (such as applications) combine well-known initial / general access patterns for files with a relative sensitivity to latency. Such workloads would benefit a lot from additional enhancements to the new cache mode.

Very coarse-grained, this could look like the following (note that I'm really only considering the application use case here, so read-only, or at most read-only with local copy-on-write semantics using an additional overlay filesystem).

mountpoint would require a new cache-profile-creation mode which records a timeline of the read access patterns (either flushed to the filesystem at regular intervals or available as a dump, e.g. via signal). This timeline should contain the files and their individual blocks retrieved, in order. It should also contain the running sum (in MiB) of the unique blocks retrieved since mountpoint started.
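
A minimal sketch of what a single entry in such a timeline could look like, written in Rust purely for illustration (none of these names or fields exist in mountpoint today):

```rust
/// One entry in a recorded access timeline; purely illustrative, not an
/// existing mountpoint-s3 type. Entries would be appended in read order.
struct AccessRecord {
    /// Milliseconds since mount, preserving the order of reads.
    offset_ms: u64,
    /// S3 key of the object being read.
    key: String,
    /// Index of the block within that object.
    block_index: u64,
    /// Size of the block in bytes.
    block_len: u64,
    /// Running total of unique data fetched since mount, in MiB.
    cumulative_unique_mib: u64,
}
```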

Ideally these profiles would be in a format conducive to a workflow that allows for

  1. Creating multiple profiles, e.g. to account for random and / or small configuration-driven variations in access patterns at start
  2. Merging these profiles into a generic profile containing the most commonly requested blocks in the most common (= frequent) order
  3. Cutting off profiles / generic profiles at a certain size in MiB (to fit into the desired cache size)

This could be done either by adding the necessary tooling to mountpoint itself or by being clever with e.g. csvkit (maybe - I have not thought too deeply about the format yet).
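
As a very rough sketch of steps 2 and 3 (merging and cutting off profiles), assuming a profile is just an ordered list of (key, block index) pairs; again purely illustrative and not part of mountpoint:

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// An (S3 key, block index) pair as it appears in a recorded profile.
type Block = (String, u64);

/// Merge several recorded timelines into one generic profile: rank blocks by how
/// often they were requested (most frequent first, earliest-seen first on ties)
/// and cut the result off once the MiB budget for the cache is exhausted.
fn merge_profiles(profiles: &[Vec<Block>], block_size_mib: u64, budget_mib: u64) -> Vec<Block> {
    // For every block, track (request count, earliest position it was seen at).
    let mut stats: HashMap<Block, (usize, usize)> = HashMap::new();
    for profile in profiles {
        for (pos, block) in profile.iter().enumerate() {
            let entry = stats.entry(block.clone()).or_insert((0, pos));
            entry.0 += 1;
            entry.1 = entry.1.min(pos);
        }
    }
    let mut ranked: Vec<(Block, (usize, usize))> = stats.into_iter().collect();
    ranked.sort_by_key(|&(_, (count, first_pos))| (Reverse(count), first_pos));
    // Keep only as many blocks as fit into the desired cache size.
    let max_blocks = (budget_mib / block_size_mib.max(1)) as usize;
    ranked.into_iter().take(max_blocks).map(|(block, _)| block).collect()
}
```

Ranking by frequency with earliest-seen position as a tie-breaker only roughly approximates "most common order"; a real implementation might want something smarter, but it shows the kind of workflow I have in mind.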

To use these profiles, mountpoint would require a new, matching cache-profile-replay mode that would then try to aggressively backfill the cache directly after start, in the recorded order. This would potentially require a cache modification to prevent redundant fetches for blocks that are already cached or in flight (I have not checked whether the existing implementation already handles that).
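
A hedged sketch of the replay side, with a hypothetical cache interface standing in for whatever mountpoint's actual cache exposes:

```rust
use std::collections::HashSet;

/// An (S3 key, block index) pair, matching the profile format above.
type Block = (String, u64);

/// Hypothetical stand-in for mountpoint's cache; not the real API.
trait BlockCache {
    /// True if the block is already present in the cache.
    fn contains(&self, block: &Block) -> bool;
    /// Enqueue an asynchronous fetch of the block (e.g. a ranged GET).
    fn prefetch(&mut self, block: &Block);
}

/// Replay a recorded (or merged) profile right after mount: issue prefetches in
/// the recorded order, skipping blocks that are already cached or already queued.
fn replay_profile(cache: &mut impl BlockCache, profile: &[Block]) {
    let mut queued: HashSet<Block> = HashSet::new();
    for block in profile {
        if cache.contains(block) || queued.contains(block) {
            continue; // avoid redundant fetches
        }
        queued.insert(block.clone());
        cache.prefetch(block);
    }
}
```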

Making the cache sharable between multiple mountpoint processes might make this feature even more useful.

Update (27.11.): added a few minor clarifications.

ahmarsuhail commented 7 months ago

Thanks for the request @yawn. Can you please share more information about these workloads? For example, does your application perform repeated reads? We would like to understand more about the benefits of this feature.

yawn commented 7 months ago

The use case is essentially delivering large applications / executables to geo-distributed servers using equally geo-distributed S3 buckets (to avoid dealing with bandwidth-delay product issues). Applications tend to have somewhat deterministic access patterns with large sequential (step-by-step) clusters of individual block reads (since local storage latency is negligible). Applications can also usually be delivered with (local EBS) copy-on-write semantics on top of read-only S3 buckets.

If mountpoint were able to prefetch (in approximate order) the blocks that are likely to be requested in the future, startup latency would no longer be a problem (assuming that startup phase fits into the cache! Future iterations could also enable a floating cache window and expire entries it knows will become irrelevant ...). In many scenarios the available network and S3 bandwidth will actually exceed the maximum throughput of e.g. gp3 EBS.

This behaviour will likely apply to other workloads as well, but I have data mostly / only for the large-application case.

Update: I missed the question about repeated reads. Yeah, some reads are repeated. But the key driver behind this feature request is replacing the (slow) sequential reads with (fast) parallel prefetches of the blocks users already know a workload will request. In that sense it's not so much about caching per se but about leveraging the cache for prefetching.