Ipstenu / varnish-http-purge

Proxy Cache Purge
Apache License 2.0
46 stars 48 forks source link

PURGE requests should be non-blocking & Issues with redirection/302/WP HTTP API #56

Open jerclarke opened 6 years ago

jerclarke commented 6 years ago

This was hell to debug. I discovered it because on my local system, all the PURGE calls were going through as actual GET pageloads, and my macbook was getting overwhelmed loading dozens of pages all at once each time I saved a published post (took over a minute).

TL;DR: If the current web server redirects PURGE requests as 302 then they become GET and load the actual URLs on the site, which makes saving a post extremely slow.

TL;DR2: The HTTP API allows requests to be non-blocking and thus not hold up the pageload, why not use 'blocking' => 0 this for this plugin?

It was slow because all the PURGEs were being done as GET

This part is mostly for information purposes, in case someone else stumbles onto a similar problem, but it also explains my situation and why I think the 'blocking' => 0 solution described below is important.

It has taken me literally years to figure out that this is what's happening, which is embarassing but related to how deep the problem lies. I had to dig into the logs and add logging deep inside the HTTP API methods to figure it out. The user experience was just that saving was excruciatingly slow on my local site.

After investigation the following became clear:

Important notes:

SO: In terms of this HTTP API confusion with redirection anyone with similar problems has a couple of options:

Bad solution: In purge_url, pass 'redirection' = 0 to wp_remote_request()

i.e.
    $response = wp_remote_request( $purgeme, array(
            'method'  => 'PURGE',
            'headers' => $headers,
            'redirection' => 0,
        ) );

This solved my problem with local dev because it means that when my Apache in MAMP gets the purge request and redirects it to HTTPS, it just silently stops.

The problem with this is that if the redirection is a 301, it DOES work! We discovered that our actual live server was relying on this for proper functioning. To me, this implies that this plugin shouldn't disable redirection, because there may be many users relying on the effectiveness of redirected PURGE calls, even if in the grander scheme they aren't reliable (and really, everyone should work to make sure their purges are happening without redirection).

That said, I wanted to point this out because for some cases it could be a good workaround, and it seemed like a good option to me at first.

Better solution? Use 'blocking' => 0 instead

By default all WP HTTP API requests are "blocking" and the whole system waits for the resulting $response object to be generated and returned. Obviously sometimes this is vital, but in the case of these PURGE requests I think the 'blocking' => 0 mode would be a really good fit:

Have you considered 'blocking' => 0 before? Maybe there's a reason not to use it that I haven't thought of?

The main reason I can see is to preserve the current functioning of the filter:

    do_action( 'after_purge_url', $url, $purgeme, $response, $headers );

If the non-blocking mode was used, there wouldn't be a $response object to include in that filter (or, if left as-is, it would always be a blank default response object).

That said, it strikes me that the $response object, though useful here, is something we can probably live without as long as the documentation makes it clear that the $response is always empty because we (logically) use the non-blocking mode for the request.

We can still get the response with the http_api_debug action in WP_Http->request()

do_action( 'http_api_debug', $response, 'response', 'Requests', $r, $url );

So IDK if this is something you'd want to build into the plugin, but IMHO if making it easy to debug the purging is a priority, and otherwise 'blocking' => 0 makes sense to you, then you can easily document how to debug these purges and/or add it as a plugin feature using the action above:

This would work just as well as the existing hook, and now that I think of it, you could keep after_purge_url intact if you wanted, by creating your own hook on http_api_debug and setting up the variables so that after_purge_url is backwards-compatible.

So after all that, the question is: Why not use 'blocking' => 0 as the default mode for the plugin?

On my end, despite all this digging, it seems it's still pretty slow on local, even if I set it to 'blocking' => 0, because MAMP still loads every page and it takes 18 seconds reliably before I get back to the refreshed post editor. If I want to keep this plugin active on my dev environment I think i'll need to set up something that quickly and silently returns 200 for PURGE requests to mimic Varnish.

Ipstenu commented 6 years ago

Why not use 'blocking' => 0 as the default mode for the plugin?

Quick answer... Believe it or not, it's never come up. I test on a local box without Varnish to make sure the non purge things work (like the scanner and the debugger and admin stuff). Then I push it to a dev box with Varnish to test Varnish.

But this is a legit thing and it's fascinating to me that it hasn't come up before, so I'm going to spend some serious time reading this a couple more times and thinking about what to do with it.

My gut likes the idea of a VHP_DEBUG_PURGES define, but I'm not sure which is more useful to the majority since ... well. Most people just install and walk away. They don't really do the loopy things like trigger page A's flush when page B updates (like .. I do).

This needs serious thinking. I like the problem and the possible solutions you've found, Jer!

danielbachhuber commented 6 years ago

So after all that, the question is: Why not use 'blocking' => 0 as the default mode for the plugin?

One thing worth noting: 'blocking' => 0 only works on a limited set of systems. I only recall this anecdotally and would need to do more research to identify the specific contexts.

jerclarke commented 6 years ago

Awesome thanks @Ipstenu ! So glad it wasn't just a confusing mess 😅

I think the logging stuff would be a big help to a lot of people, and if nothing else would be good to have for yourself it sounds like. Overall I think that yeah, a lot of people aren't managing their flushes carefully, but really most people should be 🤷🏻‍♀️

@danielbachhuber Yes, it seems on my local MAMP setup, blocking = > 0 has no meaningful effect on the timing, and seems to still make the post save wait.

I'm still in the process of testing it on our live server ([Ubuntu]Nginx->Varnish->Nginx)

gerdneuman commented 6 years ago

@danielbachhuber I guess even if blocking => false does not work on all systems (I cannot find a reference for it, too, so are you sure?) it would probably not hurt to adding it after I read https://codex.wordpress.org/HTTP_API#Other_Arguments and https://developer.wordpress.org/reference/classes/WP_Http/request/ (search for blocking in it). As @jerclarke said: The response is not used anyway.

@jerclarke any results from testing on the linux live server? Have you ever measure doing a time curl -X PURGE <url> to see how long the request need in the first place? I guess php would be similar and on localhost it is probably far less then if network latency is involved.

danielbachhuber commented 6 years ago

I guess even if blocking => false does not work on all systems (I cannot find a reference for it, too, so are you sure?)

This was something Barry told me when I attempted to implement usage analytics for WP-CLI https://github.com/wp-cli/wp-cli/issues/3886

I don't know the underlying implementation details off the top of my head. If I were to look into it, I'd start with https://github.com/rmccue/Requests/issues/53 and understanding how cURL and fsockopen implement blocking => false (whether they actually spawn separate processes, or it remains blocking simply because PHP is single-threaded).

it would probably not hurt to adding it after I read

It would be nice to have documented, I agree. We'd need to make sure the documentation is actually correct though.

gerdneuman commented 6 years ago

@danielbachhuber Thanks, I see.

Via a Stackoverflow answer, I've came across Guzzle which is supposed to support async http requests aka Futures / Promises according to its FAQ. So I guess this is something that might be worth looking into instead of the wp_remote_request.

EDIT:

Well, seems like it's not that simple :-/ after reading: https://github.com/guzzle/guzzle/issues/1127 https://github.com/guzzle/guzzle/issues/1425 https://github.com/guzzle/guzzle/issues/1429#issuecomment-197152914

gerdneuman commented 6 years ago

FWIW, I added blocking => false now on our server (which is Gandi.net's Simple Hosting platform). They use latest Debian GNU/Linux with PHP 7. And the request is still… blocking :-(

So despite some discussions in https://github.com/Ipstenu/varnish-http-purge/issues/5#issuecomment-414261460 and #66 it looks like it's definitely not easy to solve. Also even if it was I'd like to mention (as nobody has yet) that many hosters have restrictive settings for the number of parallel php processes aka max_children php setting. For instance, for Gandi.net size M it is 4, for L it is 8. So if you'd PURGE'ed 8 URLs on parallel then you'd DOSed your server.