puma314 opened this issue 2 months ago
Could this be a tower layer?
Seeing https://docs.rs/tower/latest/tower/ready_cache/cache/struct.ReadyCache.html - cc @mattsse does this work?
I'm not fully sure I understand how `tower` works, but noting that we'd want to save stuff to a file so it's persisted across instantiations (rather than just having the cache in memory, for example).
We won't add caching at the `Transport` layer via `tower`, because caching (unlike rate limiting or retrying) needs to be aware of the RPC semantics and potentially the provider heartbeat task, so that it can invalidate caches on new blocks and reorgs. This means we need it to be a provider `alloy_provider::Layer` producing `CachingProvider<P, T, N>`, rather than a `tower::Layer` producing `CachingTransport<T>`.
This is blocked by #736 (which is pretty straightforward to resolve)
Is the use case here making a high volume of requests against specific deep historical states? It sounds like you actually don't want to cache to a file. You want an in-memory cache that is persisted to a file when your program stops? I'm in general not in favor of caching to/from a file directly, as responses get invalidated so regularly, fs access degrades perf, and the target user for alloy doesn't have an archive node and doesn't make queries against the deep state. Would it be enough to have the cache internals be (de)serializable and a way to instantiate the cache with data in it?
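If the (de)serializable-internals direction sounds right, here's a minimal sketch of what that could look like: an in-memory map keyed by (method, serialized params) whose contents can be exported and re-imported. All names (`RpcCache`, `export`, `import`) are illustrative, not alloy APIs, and a real implementation would use serde rather than a hand-rolled line format.

```rust
use std::collections::HashMap;

/// Illustrative in-memory RPC cache keyed by (method, serialized params).
/// Not an alloy type; a real version would be bounded (e.g. an LRU).
#[derive(Default)]
struct RpcCache {
    entries: HashMap<(String, String), String>,
}

impl RpcCache {
    fn insert(&mut self, method: &str, params: &str, response: &str) {
        self.entries
            .insert((method.to_string(), params.to_string()), response.to_string());
    }

    fn get(&self, method: &str, params: &str) -> Option<String> {
        self.entries
            .get(&(method.to_string(), params.to_string()))
            .cloned()
    }

    /// Export to a simple tab-separated line format the caller can persist.
    fn export(&self) -> String {
        self.entries
            .iter()
            .map(|((m, p), r)| format!("{m}\t{p}\t{r}"))
            .collect::<Vec<_>>()
            .join("\n")
    }

    /// Rebuild a cache from previously exported data.
    fn import(data: &str) -> Self {
        let entries = data
            .lines()
            .filter_map(|line| {
                let mut parts = line.splitn(3, '\t');
                Some((
                    (parts.next()?.to_string(), parts.next()?.to_string()),
                    parts.next()?.to_string(),
                ))
            })
            .collect();
        Self { entries }
    }
}
```

The point of the round-trip methods is exactly the suggestion above: the cache itself never touches the fs, and the user decides when (and whether) to persist the exported data.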
This means we need it to be a provider `alloy_provider::Layer` producing `CachingProvider<P, T, N>`, rather than a `tower::Layer` producing `CachingTransport<T>`.
Good point, supportive.
It sounds like you actually don't want to cache to a file. You want an in-memory cache that is persisted to a file when your program stops?
@puma314 basically this means:
Yup that sounds great. @prestwich our use-case is that we are querying `getProof` and `getStorage` on blocks potentially hours, etc. in the past (so blocks that are well past the reorg window). We are using this for generating a ZKP, so we wouldn't want to generate a ZKP of a block that could be re-orged, if that makes sense.
@gakonst's proposed suggestion looks great to me as a potential devex.
when you Ctrl+C, the cache's `Drop` impl gets called, persisting everything to disk
serialization and fs ops are fallible and can't be reliably used in a `Drop` impl, so I wouldn't recommend this approach
More broadly tho, a file system-backed cache of finalized responses is not broadly applicable and requires us to make decisions about the user's fs. I am not in favor of including it in the main alloy crates. A memory cache that can be loaded from fs at runtime and serialized to fs on demand is applicable to a lot of users, and could be in the main provider crate. Would that fit your need?
Assuming you're running your own infra, the need may also be better served by accessing reth db or staticfiles directly? If running alongside reth, retrieving proofs and then storing them to the file system is duplicating data that's already in the file system, no?
serialization and fs ops are fallible and can't be reliably used in a `Drop` impl, so I wouldn't recommend this approach. More broadly tho, a file system-backed cache of finalized responses is not broadly applicable and requires us to make decisions about the user's fs. I am not in favor of including it in the main alloy crates.
I've used this method before multiple times for debugging (e.g. in MEV Inspect) and it's generally been fine, so I personally don't worry about the fallibility, but OK with doing this as a separate crate.
A memory cache that can be loaded from fs at runtime and serialized to fs on demand is applicable to a lot of users, and could be in the main provider crate. Would that fit your need?
How should the cache be populated in this case? Still via a `ProviderLayer` where each method populates an LRU of the data on cache miss? And is it the responsibility of the user to flush the cache to disk?
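The populate-on-miss path being asked about could be sketched like this: check the cache first, and only on a miss call through to the underlying RPC and record the result. The function and key format are hypothetical; a real `CachingProvider` method would wrap the inner provider's call and likely use a bounded LRU rather than an unbounded map.

```rust
use std::collections::HashMap;

/// Hypothetical cache-miss path: return a cached response if present,
/// otherwise invoke the underlying RPC (`fetch`) and store the result.
fn cached_call(
    cache: &mut HashMap<String, String>,
    key: &str,
    fetch: impl FnOnce() -> String,
) -> String {
    if let Some(hit) = cache.get(key) {
        return hit.clone();
    }
    let response = fetch();
    cache.insert(key.to_string(), response.clone());
    response
}
```

Whether the user flushes to disk, or the layer exposes an explicit `save`-style method, is then a separate decision from how the cache fills.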
Assuming you're running your own infra, the need may also be better served by accessing reth db or staticfiles directly? If running alongside reth, retrieving proofs and then storing them to the file system is duplicating data that's already in the file system, no?
Proofs aren't part of the Reth DB; they get generated on the fly, so I don't think this would work.
A memory cache that can be loaded from fs and saved to fs would work for me. I'm not running my own infra in this case--the point is that for basically any chain we can get all the storage slots & proofs for running a block in a zkVM, without the need to have a local node running that is synced for that chain. It's a lot lower friction if we can just plug in an RPC vs. having to sync a reth instance. (Also I'm not sure if reth has `getProof` implemented yet.)

```rust
let mut cache = MemoryCache::load("file.txt");
let provider = ReqwestProvider::new(...).with_cache(cache);
// do stuff with provider
cache.save("file.txt")
```
seems totally fine to me.
SG re: the API above! Confirming that if you do stuff with the provider that hits the actual backend and not the cache, the new `file.txt` should 1) include all the requests which were not cached before, and 2) all the previous contents of the cache?
`eth_getProof` is implemented in Reth, but not the historical variant for arbitrary lookback, due to limitations of the Erigon DB design which we inherit.
I've used this method before multiple times for debugging (e.g in MEV Inspect) and it's generally been fine, so I personally don't worry about the fallibility, but OK with doing this as a separate crate.
Panics in `Drop` impls can cause aborts, so you can do it, but it's not a decision we want to make on behalf of all users, as we don't know what conditions they're running in.
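The alternative being argued for here can be sketched as persistence through an explicit, fallible call: the caller chooses when to save and can handle I/O errors, whereas a `Drop` impl has no way to return them. Types and method names below are illustrative, not alloy APIs.

```rust
use std::{fs, io};

/// Illustrative stand-in for an in-memory cache whose serialized form
/// is persisted only when the caller explicitly asks for it.
struct MemoryCache {
    serialized: String,
}

impl MemoryCache {
    /// Explicit, fallible persistence. Unlike a Drop impl, the caller
    /// sees the io::Result and decides how to handle failures.
    fn save(&self, path: &str) -> io::Result<()> {
        fs::write(path, &self.serialized)
    }
}
```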
A memory cache that can be loaded from fs and saved to fs would work for me. [...]

```rust
let mut cache = MemoryCache::load("file.txt");
let provider = ReqwestProvider::new(...).with_cache(cache);
// do stuff with provider
cache.save("file.txt")
```

seems totally fine to me.
instantiation should run through the builder API, so the sketch here is something like:
```rust
/// Cache object
struct Cache { ... }

/// Caching configuration object
struct CachingLayer {
    cache: Option<Cache>,
    // other fields?
}

/// Provider with cache
struct CachingProvider<P, N, T> {
    inner: P,
    cache: Cache,
}

let provider = builder.layer(CachingLayer::from_file("file.txt")?).http(url);
```
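To make the `from_file` part of the sketch concrete, here's one possible shape under the same assumptions: return `io::Result<Self>`, start with an empty cache if the file doesn't exist yet, and otherwise parse a simple saved format. The types and the tab-separated format are stand-ins, not alloy definitions.

```rust
use std::collections::HashMap;
use std::{fs, io};

// Illustrative stand-ins for the types sketched above; not alloy definitions.
struct Cache {
    entries: HashMap<String, String>,
}

struct CachingLayer {
    cache: Option<Cache>,
}

impl CachingLayer {
    /// Load previously saved cache data, starting empty (cache: None)
    /// if the file does not exist yet; other I/O errors propagate.
    fn from_file(path: &str) -> io::Result<Self> {
        let cache = match fs::read_to_string(path) {
            Ok(data) => {
                let entries = data
                    .lines()
                    .filter_map(|line| {
                        let (key, value) = line.split_once('\t')?;
                        Some((key.to_string(), value.to_string()))
                    })
                    .collect();
                Some(Cache { entries })
            }
            Err(e) if e.kind() == io::ErrorKind::NotFound => None,
            Err(e) => return Err(e),
        };
        Ok(Self { cache })
    }
}
```

Treating a missing file as "start empty" rather than an error keeps the first run of a program from needing any setup.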
do you have a ballpark for number of proofs/etc you intend to cache?
I think we would need low 100s of proofs per block, since it's all accounts/state that was touched during a block.
so I think actionable steps for implementing this are:
Component
provider, pubsub
Describe the feature you would like
For use-cases like SP1-Reth or Kona, we often want to execute a (historical) block, but we don't have the entire state in memory, and we execute this block with a `ProviderDb` that fetches accounts, storage, etc. using an RPC. Fetching from the network is slow and often takes minutes for all of the accesses required for an entire block.

Often we re-run these blocks to debug things or tune performance, etc., and each time the feedback loop on iteration is very slow because it requires waiting for all the network requests each time. It would be nice to add a very simple caching layer on top of `ReqwestProvider` that can cache the results of RPC calls to a file (or some other easy-to-set-up format) and then first check the cache before sending a network request.

This would speed up iteration time for use-cases like Kona and SP1-Reth tremendously. An interface like this might make sense:
In our case, we are usually querying old blocks (not near the tip of the chain), so re-org awareness is not important for our use-case. We just want a really simple caching layer.
Additional context
No response