Open bekarice opened 7 years ago
updated OP with notes to check that server hostname is properly configured
Hosts that block or rate-limit POST requests to the admin-ajax endpoint. This will cause the job to remain queued, as the maybe_handle() method in the async request class doesn't fire right away or at all.
@bekarice wait... do they only limit POST
requests? Could we get away by simply using wp_safe_remote_get()
instead of wp_safe_remote_post()
? Or am I missing something? They both support non-blocking cURL requests. I just tested this locally and the async part worked flawlessly.
Also, just found this article covering a few alternatives we could look into: https://segment.com/blog/how-to-make-async-requests-in-php/
With ZD 544840, it turned out the server couldn't find its own hostname, so wp remote post using cURL couldn't look up its own DNS and therefore execute that request. We could do some lower level checks to try to ensure the hostname is known, as there have been a couple instances that were similar that I think the same misconfiguration could have been happening
I'm curious how this happens? When we wp_remote_post()
, we'll simply use admin_url()
to create the endpoint to request. This, in turn uses get_option( 'siteurl' )
to get the base url from the database. The siteurl
option is set during install, using wp_guess_url()
. So if my assumptions are correct, then the site should not really function at all for that customer, since WP so heavily relies on the siteurl
option being correctly set. However, it might be that I misunderstood the whole issue.
Use direct SQL queries for setting / getting job status because some hosts don't properly clear the option cache when requested. This will bypass caching.
@bekarice do you think we simply want to have a simple SQL query or should we create our own versions of add/update/get_option functions that also trigger all the hooks and filters that are normally run?
EDIT: thinking more about this - I'm not sure how the Object Cache in GoDaddy/WPE works (do they supply their own version of WP_Object_Cache?), but would it be feasible to invalidate/flush the cache before add/update/get_option
? For example:
wp_cache_delete( "{$this->identifier}_job_{$id}" );
$results = get_option( "{$this->identifier}_job_{$id}" );
Or am I completely off track here?
AFAIK GoDaddy/WPE (and probably other hosts) provide their own version of the object cache so they can use other persistence layers like Redis, memcached, etc. When we were troubleshooting the tickets with these issues, I think we tried clearing the cache but it didn't have any effect.
For the purposes of a job processing system, you really don't want any caching involved at all. I think we'd want simple SQL queries without any hooks or filters.
re: the rate-limited POST requests, I don't know that there is a good solution. A GET
requested might work better, but I suspect it's more likely to be cached, which is bad. I think what we would benefit from here is a health check that confirms the server can make POST requests to it's own admin-ajax.php
endpoint, so that we have an easy way when doing support to know what's going wrong.
re: the rate-limited POST requests, I don't know that there is a good solution. A GET requested might work better, but I suspect it's more likely to be cached, which is bad. I think what we would benefit from here is a health check that confirms the server can make POST requests to it's own admin-ajax.php endpoint, so that we have an easy way when doing support to know what's going wrong.
Hm, you're probably right about GET
requests possibly being cached. But could we use cache-busting? If the URL is different each time, then the it should not be cached, right? For example, we can simply append a md5 of the job id and current timestamp to the url, something like this:
http://example.com/wp-admin/admin-ajax.php?action=run_background_process&stamp=jdshh8272318jsadas847
Given that query-strings are not cached on some CDNs/proxies, it should work in our favour. What do you think?
Great point, we could definitely add a timestamp to the URLs which should work. I'm still not sure if it's the HTTP method that's the real issue with these sites or other weirdness (like the one where the server's DNS was busted). It stands to reason that if a hosting provider is blocking or rate limiting POST requests to the admin-ajax endpoint, they'd probably do that for any type? Before we make this kind of change, I'd like to be able to test on a server that has this specific issue, so we can see if changing the method makes any difference. @bekarice any recent support tickets that we could use here?
@maxrice @ragulka ZD 544490 has picked back up again actually, that's with WPE so that's probably the best candidate since that seems to be the most problematic with rate limiting.
@bekarice I've patched CSV Export with direct SQL queries and GET requests in https://github.com/skyverge/wc-plugins/pull/2110 - can this be tested on the customer's site to see if it fixes the issue for them?
@ragulka getting FTP creds for that site -- WPE says object caching is off, but there's some duplication happening. will loop you into that thread once I have it.
Following up on this with ZD 568347 -- this host blocks so-called "loopback connections" where the server is attempting to connect to itself, responding to both GET and POST requests with HTTP 403. Obviously our async process never starts, and exports are left in the queued state with only the column headers present.
Amusingly (and I guess expected), if you copy the generated URL that the dispatch method tries to connect to, and visit that in the browser, that kicks off all the background jobs.
I'd say that we need
1) a big red admin notice that indicates to the admin that their hosting provider is blocking loopback connections. Ideally we'll check this on plugin install/upgrade, and provide a system status tool for good measure. 2) A potential workaround where we make client-side Ajax requests to the dispatch URL in order to get the exports to process. Since they won't appear to be coming from the server, there shouldn't be any issues with these lame hosting providers in that case. The obviously caveat here is that the user won't be able to leave the page when the background job is in-progress, or else processing will stop.
I'm not a huge fan of option 2, but I guess it's at least a workaround to use as a fallback if we detect that the host won't accept this connection. We'd just want to keep this in the job notice + admin notice so people know it will function differently for them, and you'd have to have a js alert likely if they try to leave the page.
Would we need to fail gracefully on the job somehow if they do (ie delete it or just stop progress on the current row)?
I think with Max's points and this
With ZD 544840, it turned out the server couldn't find its own hostname, so wp remote post using cURL couldn't look up its own DNS and therefore execute that request. We could do some lower level checks to try to ensure the hostname is known, as there have been a couple instances that were similar that I think the same misconfiguration could have been happening
we could likely close this out.
cc @mikejolley who may be interested in a couple of the above changes (I think $wpdb
vs update_option()
/ get_option()
in #209 has helped at least, but still a bit early to tell if it's solved the majority of caching issues).
I think a fallback js version with an alert when trying to leave the page is a fine workaround. Although even doing so we should really encourage merchants to contact their hosts to enable loopback connections.
one limitation to consider: auto-exports probably couldn't use a js / client side request system since it would require the browser to fire requests, so this really would only be a workaround for manual exports. with that in mind, do we think it would still be worthwhile to pursue, or just require people to use a reasonable host?
In our utilities classes, we commonly run into a couple issues:
maybe_handle()
method in the async request class doesn't fire right away or at all.To resolve these, we should:
Related issues
Related threads