clearlydefined / service

The service side of clearlydefined.io
MIT License

purge cloudflare edge cache from the service #418

Open dabutvin opened 5 years ago

dabutvin commented 5 years ago

When a definition or curation is invalidated, we clear the Redis cache and recompute as necessary. We should also call out to clear the edge cache so the updates are visible right away.

see https://api.cloudflare.com/#zone-purge-files-by-url

We should be able to purge by URL, keep the benefits of edge caching, and still have instant updates.
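A minimal sketch of what that call could look like from the service, assuming a Node runtime with global fetch; the zone id, API token, and the per-request chunk size are assumptions to verify against the Cloudflare docs linked above:

```typescript
// Sketch only: zone id and token would come from the service's config/secrets.
const CLOUDFLARE_ZONE_ID = process.env.CLOUDFLARE_ZONE_ID
const CLOUDFLARE_API_TOKEN = process.env.CLOUDFLARE_API_TOKEN

// Purge specific URLs from the Cloudflare edge cache.
// Cloudflare caps how many URLs a single purge call may contain, so chunk the list.
export async function purgeUrls(urls: string[]): Promise<void> {
  const chunkSize = 30 // assumed per-request limit; check the linked API docs
  for (let i = 0; i < urls.length; i += chunkSize) {
    const files = urls.slice(i, i + chunkSize)
    const response = await fetch(
      `https://api.cloudflare.com/client/v4/zones/${CLOUDFLARE_ZONE_ID}/purge_cache`,
      {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${CLOUDFLARE_API_TOKEN}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ files })
      }
    )
    if (!response.ok) throw new Error(`Cloudflare purge failed: ${response.status}`)
  }
}
```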

ignacionr commented 5 years ago

My proposal here is that, instead of recreating the URLs to be purged, we use Cloudflare's purge-by-tag feature, which would let us tag cached content with the coordinate hashes it relates to. Then we'd be able to flush all cached items relating to the definitions we've deemed invalid (sketched below).

In short, we get:

The burden:
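A rough sketch of what the tag-based flow could look like. The Cache-Tag response header and the tags request body follow Cloudflare's purge-by-tag documentation; the hashing scheme, function names, and example coordinates are only illustrative:

```typescript
import { createHash } from 'crypto'

// Derive a stable tag from the coordinates a response covers, emit it as a
// Cache-Tag header when serving, and later purge by that tag instead of
// reconstructing URLs.
function coordinatesToTag(coordinates: string): string {
  return createHash('sha1').update(coordinates).digest('hex')
}

// When serving a definition (Express-style handler, illustrative):
//   res.set('Cache-Tag', coordinatesToTag('npm/npmjs/-/lodash/4.17.21'))

// When a definition is invalidated, flush everything carrying its tag.
async function purgeByTags(tags: string[]): Promise<void> {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${process.env.CLOUDFLARE_ZONE_ID}/purge_cache`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.CLOUDFLARE_API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ tags })
    }
  )
  if (!response.ok) throw new Error(`Cloudflare purge-by-tag failed: ${response.status}`)
}
```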

jeffmcaffer commented 5 years ago

Can you give an example of a tagging scheme? There is no real rhyme or reason to which definitions are recomputed when, so we need to flush them randomly and on demand.

I do have a concern about how often we might be calling the flush API with a URL approach. We will recompute definitions ~5 times on first harvest (once for each tool). While we don't know the steady-state rate, there are bound to be thousands of new packages and GitHub releases per day. Can the flushes be batched up into, say, 5-minute chunks?

ignacionr commented 5 years ago

Got it. Yes, we can batch them, leveraging Redis for that. We could additionally fall back to a non-selective approach when the batched updates prove numerous.

jeffmcaffer commented 5 years ago

Not even sure we have to use Redis. The flushing is not for functional correctness but for timely availability. We could use an in-process list that gets flushed every N seconds or X entries. If the service crashes, those cache entries may not be flushed, but they will expire reasonably soon (2 hours?).

I'm not against Redis (love it!) but also don't want to introduce more complexity than is needed.
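A minimal sketch of that in-process approach, assuming something like the purgeUrls function from the earlier sketch as the flush callback; the class name and thresholds are illustrative:

```typescript
// Collect URLs in memory and flush either every N seconds or once X entries
// accumulate. If the process crashes, pending entries are never purged and
// simply expire on their normal edge TTL.
export class PurgeBatcher {
  private pending = new Set<string>()

  constructor(
    private flush: (urls: string[]) => Promise<void>, // e.g. the purgeUrls sketch above
    private maxEntries = 500, // "X entries" — illustrative threshold
    intervalMs = 5 * 60 * 1000 // "N seconds" — the 5-minute chunks discussed above
  ) {
    setInterval(() => this.drain(), intervalMs).unref() // don't keep the process alive for this
  }

  add(url: string) {
    this.pending.add(url)
    if (this.pending.size >= this.maxEntries) this.drain()
  }

  private async drain() {
    if (this.pending.size === 0) return
    const urls = [...this.pending]
    this.pending.clear()
    try {
      await this.flush(urls)
    } catch (error) {
      // Best effort: a failed purge just means those entries expire on their own.
      console.error('Edge cache purge failed', error)
    }
  }
}
```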

ignacionr commented 5 years ago

Sure can do.

geneh commented 5 years ago

Cloudflare's purge-by-tag feature is only available to Enterprise customers. We're using a standard tier, so it is not going to work: https://blog.cloudflare.com/introducing-a-powerful-way-to-purge-cache-on-cloudflare-purge-by-cache-tag/

Cache Tag Availability
Purge by Cache-Tag is enabled automatically for all Enterprise plan websites. All a developer has to do to get started is add the Cache-Tag HTTP response header to items on their website. If you are not yet an Enterprise customer, get in touch with our team here.

The current timeout is set to the minimum available period of two hours, which is very reasonable given the circumstances. If customers start complaining, we'll reopen this issue and revisit the design.

geneh commented 5 years ago

Reopening since this issue has not been resolved. Let's purge URLs based on the API above. The API calls should be batched in 5-minute chunks as suggested above. In-memory batching should be good enough to keep things simple.
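A hypothetical wiring of the two earlier sketches, showing how the decision above could fit together; the example URL is illustrative, not the service's actual route table:

```typescript
// One shared batcher that flushes queued URLs through the Cloudflare purge
// call in 5-minute chunks.
const batcher = new PurgeBatcher(purgeUrls)

// Wherever a definition or curation is invalidated, queue its public URL(s).
batcher.add('https://api.clearlydefined.io/definitions/npm/npmjs/-/lodash/4.17.21')
```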

ignacionr commented 4 years ago

Rework is in progress on the queuing strategy after finding scalability issues. We're working out a way to avoid a problematic number of URLs and Cloudflare throttling (tests are provided for both angles).