Cache without storing the response

konstin commented 1 year ago

For some of the requests we make with reqwest, I'd like to store not the response itself but some transformed (much smaller) data derived from it. Is that possible with this library? I'm imagining something like:

if let Some((transformed, http_cache_info)) = cache.get(cache_key) {
    // Returns None if the cache is fresh
    if let Some((fresh_response, http_cache_info)) = reqwest_maybe_get_cached(request, cache_key).await? {
        // Cache is outdated
        let transformed = parse(fresh_response.error_for_status()?.text().await?)?;
        cache.store((transformed, http_cache_info))
    } else {
        // The transformed response is fresh
        transformed
    }
} else {
    // No cache hit
    let (fresh_response, http_cache_info) = reqwest_get_cached(request, cache_key).await?;
    let transformed = parse(fresh_response.error_for_status()?.text().await?)?;
    cache.store((transformed, http_cache_info))
}

For a different request type, we make an initial request followed by several range requests to extract some data, and I need effectively the same functionality there: make an initial HEAD request, then either return the cached response or extract some header and continue with some (uncached) GET range requests.
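The three paths in the pseudocode above (no entry, fresh entry, stale entry) can be sketched with the standard library alone. This is a minimal illustration, not the http-cache API: `CacheInfo`, `TransformedCache`, `Source`, and `get_or_fetch` are hypothetical names, the fetch step is a plain closure standing in for a reqwest call, and freshness is reduced to a single max-age check rather than full HTTP caching semantics.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the metadata an HTTP caching layer would keep.
/// Real HTTP freshness involves far more than a single max-age value.
#[derive(Clone, Debug)]
struct CacheInfo {
    stored_at: Instant,
    max_age: Duration,
}

impl CacheInfo {
    fn is_fresh(&self, now: Instant) -> bool {
        now.duration_since(self.stored_at) < self.max_age
    }
}

/// Reports which of the three paths was taken.
#[derive(Debug, PartialEq)]
enum Source {
    Miss,
    Refetched,
    Fresh,
}

/// Stores only the transformed value plus its cache metadata,
/// never the response body itself.
struct TransformedCache<T> {
    entries: HashMap<String, (T, CacheInfo)>,
}

impl<T: Clone> TransformedCache<T> {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// `fetch` returns the body and its lifetime; `parse` shrinks the body.
    fn get_or_fetch<F, P, E>(
        &mut self,
        key: &str,
        now: Instant,
        fetch: F,
        parse: P,
    ) -> Result<(T, Source), E>
    where
        F: FnOnce() -> Result<(String, Duration), E>,
        P: FnOnce(&str) -> Result<T, E>,
    {
        let source = match self.entries.get(key) {
            // Fresh entry: return the transformed value without fetching.
            Some((value, info)) if info.is_fresh(now) => {
                return Ok((value.clone(), Source::Fresh));
            }
            // Stale entry: fall through and fetch again.
            Some(_) => Source::Refetched,
            // No entry at all.
            None => Source::Miss,
        };
        let (body, max_age) = fetch()?;
        let value = parse(&body)?;
        let info = CacheInfo { stored_at: now, max_age };
        self.entries.insert(key.to_owned(), (value.clone(), info));
        Ok((value, source))
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real client would also revalidate stale entries with a conditional request instead of always refetching.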

06chaynes commented 1 year ago

I may need further clarification to fully understand the request, but it sounds like you would like the ability to modify a particular response from a remote endpoint before it's stored in the cache?

konstin commented 1 year ago

Yes, that, or even better, the ability to do the actual caching myself and only have this library tell me when to use the cache and when not to. I'm not entirely sure what that would look like. In previous cases I just stored the ETag myself, but I'd prefer a library that properly implements HTTP caching semantics.
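For context, the "store the ETag myself" approach mentioned above can be sketched roughly as follows. This is a simplified, std-only model under stated assumptions: `Reply`, `Stored`, and `revalidate` are hypothetical names, and the `send` closure stands in for a real request that would carry an `If-None-Match` header. It covers only ETag revalidation and ignores the rest of HTTP caching (Cache-Control, Vary, Age, and so on), which is the gap a semantics library is meant to fill.

```rust
/// Hypothetical, simplified model of a server's answer to a conditional GET.
enum Reply {
    /// 304 Not Modified: the stored ETag still matches.
    NotModified,
    /// 200 OK with a new body and, optionally, a new ETag.
    Full { body: String, etag: Option<String> },
}

/// The small record kept instead of the full response.
#[derive(Clone, Debug, PartialEq)]
struct Stored<T> {
    value: T,
    etag: Option<String>,
}

/// `send` receives the value for `If-None-Match` (if any) and performs the request.
/// `parse` turns a full body into the small transformed value.
fn revalidate<T>(
    cached: Option<Stored<T>>,
    send: impl FnOnce(Option<&str>) -> Result<Reply, String>,
    parse: impl FnOnce(&str) -> Result<T, String>,
) -> Result<Stored<T>, String> {
    // Only send a validator when one was stored.
    let if_none_match = cached.as_ref().and_then(|c| c.etag.as_deref());
    match send(if_none_match)? {
        // The server confirmed the stored data: reuse the transformed value.
        Reply::NotModified => {
            cached.ok_or_else(|| "304 received without a cached entry".to_string())
        }
        // New content: transform it and keep only the result plus the ETag.
        Reply::Full { body, etag } => Ok(Stored { value: parse(&body)?, etag }),
    }
}
```

Note that this always contacts the server; deciding when a stored entry may be reused without any request is exactly what freshness rules add on top.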

baszalmstra commented 1 year ago

@konstin would https://crates.io/crates/http-cache-semantics help you?

06chaynes commented 1 year ago

It does sound like using the semantics crate directly might be what you want; it's what this crate uses as well to determine staleness and cacheability.

Though I still might explore this a bit as it sounds interesting!

konstin commented 1 year ago

> would https://crates.io/crates/http-cache-semantics help you?

Kinda! In a way I'm looking for something that integrates http-cache-semantics with reqwest and serde, but only gives me some kind of CacheInfo: Deserialize + Serialize and leaves the actual caching to me. I'm not sure about the actual design, though, and would be happy about suggestions!

06chaynes commented 1 year ago

Hmm, I wonder then if you might find implementing a custom cache manager helpful?

The trait is defined as:

#[async_trait::async_trait]
pub trait CacheManager: Send + Sync + 'static {
    /// Attempts to pull a cached response and related policy from cache.
    async fn get(
        &self,
        cache_key: &str,
    ) -> Result<Option<(HttpResponse, CachePolicy)>>;
    /// Attempts to cache a response and related policy.
    async fn put(
        &self,
        cache_key: String,
        res: HttpResponse,
        policy: CachePolicy,
    ) -> Result<HttpResponse>;
    /// Attempts to remove a record from cache.
    async fn delete(&self, cache_key: &str) -> Result<()>;
}

The cacache implementation is also available as an example.

So on put you would receive the cache key, the response, and the cache policy from the semantics crate. If the cache manager needs to trigger additional requests, though, things could get tricky: that logic would likely need to live somewhere else, such as in the client middleware implementation, or in new functionality added to HttpCacheOptions, similar to some previous changes.
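To illustrate the idea, here is a minimal sketch of a manager that shrinks the body on put. It is not written against the real crate: `HttpResponse`, `CachePolicy`, and `CacheResult` are local stand-ins defined in the block, the trait is a synchronous analogue of the one quoted above, and `TransformingManager` is a hypothetical name. With the real crate the methods would be async and use the crate's own types.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Local stand-ins, NOT the http-cache / http-cache-semantics types.
#[derive(Clone, Debug, PartialEq)]
struct HttpResponse {
    status: u16,
    body: Vec<u8>,
}

#[derive(Clone, Debug, PartialEq)]
struct CachePolicy {
    max_age_secs: u64,
}

type CacheResult<T> = std::result::Result<T, String>;

/// Synchronous analogue of the trait quoted above (an assumption made
/// to keep the sketch free of external crates).
trait CacheManager: Send + Sync + 'static {
    fn get(&self, cache_key: &str) -> CacheResult<Option<(HttpResponse, CachePolicy)>>;
    fn put(
        &self,
        cache_key: String,
        res: HttpResponse,
        policy: CachePolicy,
    ) -> CacheResult<HttpResponse>;
    fn delete(&self, cache_key: &str) -> CacheResult<()>;
}

/// Stores a transformed (smaller) body instead of the original one.
struct TransformingManager {
    transform: fn(&[u8]) -> Vec<u8>,
    entries: Mutex<HashMap<String, (HttpResponse, CachePolicy)>>,
}

impl TransformingManager {
    fn new(transform: fn(&[u8]) -> Vec<u8>) -> Self {
        Self { transform, entries: Mutex::new(HashMap::new()) }
    }
}

impl CacheManager for TransformingManager {
    fn get(&self, cache_key: &str) -> CacheResult<Option<(HttpResponse, CachePolicy)>> {
        let entries = self.entries.lock().map_err(|e| e.to_string())?;
        // On a hit, the caller sees the transformed body, not the original.
        Ok(entries.get(cache_key).cloned())
    }

    fn put(
        &self,
        cache_key: String,
        res: HttpResponse,
        policy: CachePolicy,
    ) -> CacheResult<HttpResponse> {
        // Keep only the transformed body in the cache.
        let stored = HttpResponse { status: res.status, body: (self.transform)(&res.body) };
        self.entries
            .lock()
            .map_err(|e| e.to_string())?
            .insert(cache_key, (stored, policy));
        // Hand the untouched response back to the caller.
        Ok(res)
    }

    fn delete(&self, cache_key: &str) -> CacheResult<()> {
        self.entries.lock().map_err(|e| e.to_string())?.remove(cache_key);
        Ok(())
    }
}
```

One consequence of this design is that callers must handle two body shapes: the original on a miss and the transformed one on a hit. That asymmetry is part of why keeping the caching logic outside the manager can be simpler.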

konstin commented 1 year ago

I have now gone with implementing a custom cached reqwest client on top of http-cache-semantics: https://gist.github.com/konstin/54b983e7f0f4f77d38b4151e6a9f295c . Using the http-cache reqwest impl as a reference was really helpful!

I can use this client like

let transform_response = |response: Response| async move {
    Ok(Metadata::parse(response.bytes().await?.as_ref())?)
};
self.cached_client
    .get_transformed_cached(url, &cache_file, transform_response)
   .await

and

let client = self.client_raw.clone();
let url_ = url.clone();
let read_metadata_from_initial_response = |response: Response| async {
    let mut reader =
        AsyncHttpRangeReader::new_head_response(client, url_, response).await?;
    trace!("Getting metadata for {filename} by range request");
    let text = metadata_from_remote_zip(filename, &mut reader).await?;
    let metadata = Metadata::parse(text.as_bytes())?;
    Ok(metadata)
};

let result = self
    .cached_client
    .get_transformed_cached(
        url.clone(),
        &cache_file,
        read_metadata_from_initial_response,
    )
    .await;

Thank you for your helpful comments!

06chaynes / http-cache

Cache without storing the response #57