Open yawn opened 8 months ago
Thanks for the request, @yawn. Can you please share more information about these workloads? For example, does your application perform repeated reads? We would like to understand more about the benefits of this feature.
The use case is essentially delivering large applications / executables to geo-distributed servers using equivalently geo-distributed S3 buckets (to avoid dealing with bandwidth delay product issues). Applications tend to have somewhat deterministic access patterns with large sequential (step by step) clusters of individual block reads (since local storage latency is negligible). Applications can also usually be delivered with (local EBS) copy-on-write semantics on top of read-only S3 buckets.
If `mountpoint` were able to prefetch (in approximate order) blocks that are likely to be requested in the future, latency at startup would no longer be a problem (if that startup phase fits into a cache! future iterations could also enable a floating cache window and expire entries it knows will be irrelevant ...). In many scenarios the available network and S3 bandwidth will actually exceed the maximum throughput of e.g. gp3 EBS.
This behaviour will likely apply to other workloads as well, but I have data mostly / only for the large-applications case.
Update: I missed the request about repeated reads. Yeah, some requests are repeated. But the key driver behind this feature request is exchanging the (slow) sequential reads with (fast) parallel pre-fetches of the blocks users already know a workload will request. In that sense it's not so much about caching per se but about leveraging the cache for prefetching.
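The swap described above - replacing slow, on-demand sequential reads with parallel pre-fetches of blocks known in advance - could be sketched roughly like this (Python; `fetch_block`, the paths, and the block sizes are all illustrative stand-ins for ranged S3 GETs, not anything in mountpoint itself):

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024

# Hypothetical block fetch; in practice this would be a ranged S3 GET.
def fetch_block(key, offset, length):
    return b"\0" * length  # stand-in for the downloaded bytes

# Blocks the workload is known to request at startup, in access order.
known_blocks = [
    ("app/bin/server", 0 * MIB, 4 * MIB),
    ("app/bin/server", 4 * MIB, 4 * MIB),
    ("app/lib/libfoo.so", 0 * MIB, 4 * MIB),
]

def prefetch(blocks, parallelism=8):
    """Fetch all known blocks concurrently instead of waiting for the
    application to request them one at a time."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(fetch_block, *b) for b in blocks]
        return sum(len(f.result()) for f in futures)

total = prefetch(known_blocks)
```

The point is simply that all known reads are in flight at once, so total startup latency approaches the slowest single fetch rather than the sum of all of them.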
Tell us more about this new feature.
Certain classes of workloads (such as applications) combine well-known initial / general access patterns for files with a relative sensitivity to latency. Such workloads would benefit a lot from additional enhancements to the new cache mode.

Very coarse-grained, this could look like the following (note that I'm really only considering the application use case here, so read-only, or maybe read-only with local copy-on-write semantics using an additional overlay filesystem).
`mountpoint` would require a new cache-profile-creation mode which records a timeline of the read access patterns (either flushed at regular intervals to the filesystem or available as a dump, e.g. via signal). This timeline should contain files and their individual blocks, retrieved in order. It should also contain the sum (in MiB) of the unique blocks retrieved since the start of `mountpoint`. Ideally these profiles would be in a format conducive to a workflow that allows manipulating them.
This could be done by either adding the necessary tooling into `mountpoint` itself or by being clever with e.g. `csvkit` (maybe - I did not think too deeply about the format yet).

To use these profiles, `mountpoint` would require a new matching cache-profile-replay mode that would then try to aggressively backfill the cache directly after start, in the correct order. This would potentially require a cache modification to prevent redundant fetches that have already hit the cache or are in progress (I have not checked whether the existing implementation already handles that). Making the cache sharable between multiple `mountpoint` processes might make this feature even more useful.

Update (27.11.): added a few minor clarifications.
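To make the record/replay pair concrete, here is a rough Python sketch. Everything here is an assumption for illustration: the 4 MiB block size, the CSV dump format and its field names, and the `fetch` callable standing in for the real S3 read; the futures map is one possible shape for the in-flight deduplication mentioned above.

```python
import csv
import io
import threading
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # illustrative block size, not mountpoint's

class ProfileRecorder:
    """cache-profile-creation sketch: records a timeline of (path, block)
    reads plus the running total of unique MiB fetched, dumped as CSV."""

    def __init__(self):
        self.seen = set()       # unique (path, block) pairs seen so far
        self.unique_bytes = 0
        self.rows = []

    def record_read(self, path, offset):
        block = offset // BLOCK_SIZE
        if (path, block) not in self.seen:
            self.seen.add((path, block))
            self.unique_bytes += BLOCK_SIZE
        self.rows.append((time.monotonic(), path, block,
                          self.unique_bytes // (1024 * 1024)))

    def dump(self):
        buf = io.StringIO()
        w = csv.writer(buf)
        w.writerow(["timestamp", "path", "block", "unique_mib"])
        w.writerows(self.rows)
        return buf.getvalue()

class ProfileReplayer:
    """cache-profile-replay sketch: backfills blocks in recorded order,
    skipping fetches that are already cached or in flight."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable (path, block) -> bytes
        self.lock = threading.Lock()
        self.futures = {}       # (path, block) -> in-flight/completed Future
        self.fetches = 0        # how many real fetches actually happened

    def _do_fetch(self, path, block):
        with self.lock:
            self.fetches += 1
        return self.fetch(path, block)

    def replay(self, profile, parallelism=8):
        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            futs = []
            for path, block in profile:
                key = (path, block)
                with self.lock:
                    if key not in self.futures:  # dedup: cached or in flight
                        self.futures[key] = pool.submit(
                            self._do_fetch, path, block)
                    futs.append(self.futures[key])
            for f in futs:      # wait for the backfill to finish
                f.result()

# Record a toy timeline, then replay it against a stubbed fetch.
rec = ProfileRecorder()
rec.record_read("app/bin/server", 0)
rec.record_read("app/bin/server", 0)                  # repeated read
rec.record_read("app/lib/libfoo.so", 2 * BLOCK_SIZE)

replayer = ProfileReplayer(lambda path, block: b"\0")
replayer.replay([(p, b) for _, p, b, _ in rec.rows])
```

Note that the repeated read appears twice in the timeline (so replay order is faithful) but only triggers one real fetch during replay, which is exactly the deduplication the replay mode would need.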