Ipstenu / varnish-http-purge

Proxy Cache Purge
Apache License 2.0

Rest API archives / partial regex purge #96

Open lukasbesch opened 2 years ago

lukasbesch commented 2 years ago

We are looking into Varnish + this plugin to optimize our caching strategy, especially regex-based invalidation. In addition to the website itself, there is a mobile app which heavily uses the WordPress REST API. Users can filter, sort, or search using the /wp-json/wp/v2/posts endpoint. So if a post is created or updated, we need to purge not only the actual permalink and the id-specific endpoint, but also the archive with all possible query parameters and combinations:

URLs to purge:

1. `https://site.com/the-post-name`
2. `https://site.com/wp-json/wp/v2/posts/:id`, including:
   a. `https://site.com/wp-json/wp/v2/posts/:id?`
   b. `https://site.com/wp-json/wp/v2/posts/:id/`
   c. `https://site.com/wp-json/wp/v2/posts/:id/?`
   d. `https://site.com/wp-json/wp/v2/posts/:id?_embed` and all other query parameters
   e. `https://site.com/wp-json/wp/v2/posts/:id/?_embed` and all other query parameters (`fields` etc.)

   but not:
   f. `https://site.com/wp-json/wp/v2/posts/:anotherId` (so other posts)
3. `https://site.com/wp-json/wp/v2/posts`, including:
   a. `https://site.com/wp-json/wp/v2/posts?`
   b. `https://site.com/wp-json/wp/v2/posts/`
   c. `https://site.com/wp-json/wp/v2/posts/?`
   d. `https://site.com/wp-json/wp/v2/posts?_embed&orderby=date` and all other query parameters
   e. `https://site.com/wp-json/wp/v2/posts/?_embed&orderby=date` and all other query parameters

_(I think some URLs are redundant because trailing slashes or empty query strings will be removed.)_

Of course, this should happen for every post type that is public and has a rest_base defined. I tested a regex to play around. The same applies to the taxonomy archives (when a term is created or updated).
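To make the list above concrete, here is a minimal sketch of two patterns that would cover it, checked with Python's `re` module. The helper names `single_post_pattern` and `archive_pattern` and the post ID 42 are hypothetical and only for illustration. The patterns are matched against the path plus query string, on the assumption that this is what a Varnish ban would compare against; this is not the regex I tested.

```python
import re


def single_post_pattern(rest_base: str, post_id: int) -> re.Pattern:
    """Match /wp-json/wp/v2/<rest_base>/<id>, with an optional trailing
    slash and an optional query string, but no other IDs."""
    return re.compile(
        rf"^/wp-json/wp/v2/{re.escape(rest_base)}/{post_id}/?(\?.*)?$"
    )


def archive_pattern(rest_base: str) -> re.Pattern:
    """Match the archive /wp-json/wp/v2/<rest_base>, with an optional
    trailing slash and an optional query string, but no single items."""
    return re.compile(rf"^/wp-json/wp/v2/{re.escape(rest_base)}/?(\?.*)?$")


# Hypothetical post ID 42 on the default "posts" rest_base.
single = single_post_pattern("posts", 42)
archive = archive_pattern("posts")

for path in [
    "/wp-json/wp/v2/posts/42",
    "/wp-json/wp/v2/posts/42/?_embed",
    "/wp-json/wp/v2/posts/421",  # another post: must not match `single`
    "/wp-json/wp/v2/posts?_embed&orderby=date",
]:
    print(path, bool(single.match(path)), bool(archive.match(path)))
```

Anchoring the ID with an optional slash, an optional query string, and `$` is what keeps post 42 from also purging post 421.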

As far as I understand, this plugin does not clear the post archive (REST API) but only the single post. Currently, regex purging is only used for a full purge. But it would be a good fit here because, for example, search terms are unpredictable.

One solution is to hook into vhp_purge_urls and add the required URLs there. But we need some regex to match everything. Maybe we can use the vhp-regex query parameter and assign the regex as its value (this might require breaking changes to the plugin and a customized VCL file). That way, if a PURGE request contains a vhp-regex query parameter (or, even better, a header?), the regex is used instead of the request's URL.

Is this something more people are interested in, or did someone approach this task?

I see some comments that somebody thought about it before :)

Ipstenu commented 2 years ago

Breaking changes that require VCL changes are something really to be avoided. There's no way to communicate to the right people, since the plugin users aren't always the varnish admins :( That's why it's a comment and something I mess with but haven't yet stepped fully into.

It would need to be a pure WP solution to identify what should be flushed.

lukasbesch commented 2 years ago

@Ipstenu I understand that not everybody is able to change their VCL. My approach in #97 is to use the same URL as before, but with two additional headers: X-Purge-Method: ban-regex and X-Ban-Regex: the-regex. If the VCL supports these headers, the cache is purged using the given regex; otherwise the request URL is used as before. Does that work for you?

Example call for a REST-API index:

curl \
  -X PURGE \
  -H "X-Purge-Method: ban-regex" \
  -H "X-Ban-Regex: ^/wp-json/wp/v2/posts($|/$|\?.*|/\?.*)" \
  -D - \
  "https://www.site.com/wp-json/wp/v2/posts/"

This method could possibly be used for other endpoints too (/wp-json/wp/v2/search, taxonomies but maybe also non-API URLs).
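As a rough illustration of the fallback behaviour, here is a simplified Python model of the decision a VCL might make. It is not actual VCL and not the code in #97. The function `handle_purge`, the `vcl_supports_regex` flag, and the sample cache contents are hypothetical; only the header names and the regex come from the example call above.

```python
import re


def handle_purge(cached_paths, request_path, headers, vcl_supports_regex):
    """Return the cached paths that survive a PURGE request.

    Simplified model: a VCL that knows the headers bans by regex;
    any other VCL ignores them and purges the request URL only.
    """
    use_regex = (
        vcl_supports_regex
        and headers.get("X-Purge-Method") == "ban-regex"
        and "X-Ban-Regex" in headers
    )
    if use_regex:
        pattern = re.compile(headers["X-Ban-Regex"])
        return {p for p in cached_paths if not pattern.search(p)}
    # Fallback: purge only the exact request URL, as before.
    return {p for p in cached_paths if p != request_path}


# Hypothetical cache contents (path plus query string).
cached = {
    "/wp-json/wp/v2/posts",
    "/wp-json/wp/v2/posts/",
    "/wp-json/wp/v2/posts?_embed&orderby=date",
    "/wp-json/wp/v2/posts/?search=varnish",
    "/wp-json/wp/v2/posts/42",
    "/the-post-name",
}
headers = {
    "X-Purge-Method": "ban-regex",
    "X-Ban-Regex": r"^/wp-json/wp/v2/posts($|/$|\?.*|/\?.*)",
}

with_regex = handle_purge(cached, "/wp-json/wp/v2/posts/", headers, True)
without_regex = handle_purge(cached, "/wp-json/wp/v2/posts/", headers, False)
print(sorted(with_regex))
print(sorted(without_regex))
```

In this model the regex-aware path drops every archive variant but keeps the single post and the permalink, while the fallback path drops only the one URL that was requested.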

> It would need to be a pure WP solution to identify what should be flushed.

This can be really unpredictable (in terms of query arguments). We are using custom taxonomies and users can filter by them, so the number of query arguments and their combinations is huge, or even infinite (e.g. for the search parameter). But of course we do not want to purge the entire cache every time.