Closed konstin closed 1 year ago
I may need further clarification to fully understand the request, but it sounds like you would like the ability to modify a particular response from a remote endpoint before it's stored in the cache?
Yes, that, or even better, the ability to do the actual caching myself and only have this library tell me when or when not to use the cache. I'm not entirely sure what that would look like; in previous cases I just stored the etag myself, but I'd prefer a library that properly implements HTTP caching semantics.
@konstin would https://crates.io/crates/http-cache-semantics help you?
It does sound like using the semantics crate directly might be what you want; it's what this crate uses as well to determine staleness/cache-ability.
Though I still might explore this a bit as it sounds interesting!
would https://crates.io/crates/http-cache-semantics help you?
Kinda! In a way I'm looking for something that integrates http-cache-semantics with reqwest and serde, but only gives me some kind of CacheInfo: Deserialize + Serialize
and leaves the actual caching to me. I'm not sure about the actual design though, and I'm happy to hear suggestions!
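To make the CacheInfo idea a bit more concrete, here is a minimal sketch of such a record. Everything in it is an assumption for illustration: the `CacheInfo` fields and the `from_headers` and `conditional_headers` helpers are hypothetical and not part of http-cache, http-cache-semantics, or reqwest. It only tracks validators (ETag and Last-Modified) and does not implement freshness rules such as max-age, which is the part http-cache-semantics covers. The serde derives are left out to keep the sketch dependency-free.

```rust
/// Hypothetical validator record kept next to user-managed cached data.
/// A real version would likely also derive serde's Serialize/Deserialize.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CacheInfo {
    /// Value of the `ETag` response header, if the server sent one.
    pub etag: Option<String>,
    /// Value of the `Last-Modified` response header, if present.
    pub last_modified: Option<String>,
}

impl CacheInfo {
    /// Builds the record from response headers (names matched case-insensitively).
    pub fn from_headers(headers: &[(&str, &str)]) -> Self {
        let find = |name: &str| {
            headers
                .iter()
                .find(|(k, _)| k.eq_ignore_ascii_case(name))
                .map(|(_, v)| v.to_string())
        };
        CacheInfo {
            etag: find("etag"),
            last_modified: find("last-modified"),
        }
    }

    /// Headers to attach to a revalidation request for the cached data.
    pub fn conditional_headers(&self) -> Vec<(&'static str, String)> {
        let mut out = Vec::new();
        if let Some(etag) = &self.etag {
            out.push(("if-none-match", etag.clone()));
        }
        if let Some(date) = &self.last_modified {
            out.push(("if-modified-since", date.clone()));
        }
        out
    }
}
```

The caller would persist this record together with its own data and send the conditional headers on the next request; a 304 response then means the stored data can be reused.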
Hmm, I wonder then if you might find implementing a custom cache manager helpful?
The trait is defined as:
#[async_trait::async_trait]
pub trait CacheManager: Send + Sync + 'static {
    /// Attempts to pull a cached response and related policy from cache.
    async fn get(
        &self,
        cache_key: &str,
    ) -> Result<Option<(HttpResponse, CachePolicy)>>;
    /// Attempts to cache a response and related policy.
    async fn put(
        &self,
        cache_key: String,
        res: HttpResponse,
        policy: CachePolicy,
    ) -> Result<HttpResponse>;
    /// Attempts to remove a record from cache.
    async fn delete(&self, cache_key: &str) -> Result<()>;
}
There is also the cacache implementation as an example.
So on put you would receive the cache key, the response, and the cache policy from the semantics crate. Though if you need to make additional requests based on the logic in the cache manager functions you define, things could get tricky: that logic would likely need to live somewhere else, such as the client middleware implementation, or in some new functionality added to HttpCacheOptions, like some previous changes that were made.
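As a rough illustration of the shape of such a manager, here is a simplified, synchronous, in-memory sketch that shrinks the body before storing it. It does not implement the real `CacheManager` trait: `StoredResponse` and `StoredPolicy` are hypothetical stand-ins for `HttpResponse` and `CachePolicy`, and `TransformingManager` is a made-up name. A real implementation would be async and return the crate's `Result` types.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the crate's `HttpResponse`.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredResponse {
    pub status: u16,
    pub body: Vec<u8>,
}

/// Hypothetical stand-in for `CachePolicy` from http-cache-semantics.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredPolicy {
    pub etag: Option<String>,
}

/// In-memory store that applies `transform` to the body on `put`.
pub struct TransformingManager {
    transform: fn(&[u8]) -> Vec<u8>,
    entries: Mutex<HashMap<String, (StoredResponse, StoredPolicy)>>,
}

impl TransformingManager {
    pub fn new(transform: fn(&[u8]) -> Vec<u8>) -> Self {
        Self { transform, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the stored (already transformed) response and its policy.
    pub fn get(&self, cache_key: &str) -> Option<(StoredResponse, StoredPolicy)> {
        self.entries.lock().unwrap().get(cache_key).cloned()
    }

    /// Stores a transformed copy and hands the original response back.
    pub fn put(
        &self,
        cache_key: String,
        res: StoredResponse,
        policy: StoredPolicy,
    ) -> StoredResponse {
        let stored = StoredResponse {
            status: res.status,
            body: (self.transform)(&res.body),
        };
        self.entries.lock().unwrap().insert(cache_key, (stored, policy));
        res
    }

    /// Removes the record for `cache_key`, if any.
    pub fn delete(&self, cache_key: &str) {
        self.entries.lock().unwrap().remove(cache_key);
    }
}
```

One consequence of this design is that a cache hit yields the transformed body while a miss yields the original one, so the caller has to be able to handle both shapes.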
I now went with implementing a custom cached reqwest client on top of http-cache-semantics: https://gist.github.com/konstin/54b983e7f0f4f77d38b4151e6a9f295c . Using the http-cache reqwest impl as a reference was really helpful!
I can use this client like
let transform_response = |response: Response| async move {
    Ok(Metadata::parse(response.bytes().await?.as_ref())?)
};
self.cached_client
    .get_transformed_cached(url, &cache_file, transform_response)
    .await
and
let client = self.client_raw.clone();
let url_ = url.clone();
let read_metadata_from_initial_response = |response: Response| async {
    let mut reader =
        AsyncHttpRangeReader::new_head_response(client, url_, response).await?;
    trace!("Getting metadata for {filename} by range request");
    let text = metadata_from_remote_zip(filename, &mut reader).await?;
    let metadata = Metadata::parse(text.as_bytes())?;
    Ok(metadata)
};
let result = self
    .cached_client
    .get_transformed_cached(
        url.clone(),
        &cache_file,
        read_metadata_from_initial_response,
    )
    .await;
Thank you for your helpful comments!
For some of the requests we make with reqwest, I'd like to store not the response itself but some transformed (much smaller) data derived from it. Is that possible with this library? I'm imagining something like:
For a different request type we make an initial range request and then several range requests to extract some data, where I need effectively the same functionality: make an initial HEAD request and either return the cached response or extract some header and continue with some (uncached) GET range requests.
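The flow described here (revalidate with an initial request, then either reuse the cached data or derive it again) could be sketched roughly as below. This is a synchronous, dependency-free sketch under assumptions, not the code from any library or gist: `Initial`, `Entry`, `CacheError`, and `get_transformed` are hypothetical names, the initial request is abstracted as a `send` closure, and only ETag revalidation is modeled rather than full HTTP caching semantics.

```rust
/// Hypothetical outcome of the initial (conditional) request.
pub enum Initial<R> {
    /// Server answered 304 Not Modified: the cached data is still valid.
    NotModified,
    /// Server sent a fresh response plus an optional ETag validator.
    Fresh { response: R, etag: Option<String> },
}

/// Hypothetical cache entry: a validator plus the small derived data.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry<T> {
    pub etag: Option<String>,
    pub data: T,
}

#[derive(Debug, PartialEq)]
pub enum CacheError<E> {
    /// A 304 arrived although there was no cached entry to reuse.
    UnexpectedNotModified,
    /// Error from the request or from the transform step.
    Inner(E),
}

/// Sends the initial request with the known ETag (if any). On a 304 the
/// cached entry is returned as is; otherwise `transform` runs on the fresh
/// response (it may issue further, uncached requests) and a new entry is built.
pub fn get_transformed<R, T, E>(
    cached: Option<Entry<T>>,
    send: impl FnOnce(Option<&str>) -> Result<Initial<R>, E>,
    transform: impl FnOnce(R) -> Result<T, E>,
) -> Result<Entry<T>, CacheError<E>> {
    let known_etag = cached.as_ref().and_then(|e| e.etag.as_deref());
    let initial = send(known_etag).map_err(CacheError::Inner)?;
    match (initial, cached) {
        (Initial::NotModified, Some(entry)) => Ok(entry),
        (Initial::NotModified, None) => Err(CacheError::UnexpectedNotModified),
        (Initial::Fresh { response, etag }, _) => {
            let data = transform(response).map_err(CacheError::Inner)?;
            Ok(Entry { etag, data })
        }
    }
}
```

The caller decides where `Entry` is persisted (a file, a database, memory), which keeps the storage fully under its control while the function only decides whether the cached data may be reused.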