ampproject / amp-wp

Enable AMP on your WordPress site, the WordPress way.
https://wordpress.org/plugins/amp/
GNU General Public License v2.0
1.79k stars 383 forks source link

Use WP Cron to periodically check site URLs for AMP validation #1756

Closed westonruter closed 3 years ago

westonruter commented 5 years ago

Breaking out from #1007 which focused specifically on using WP-CLI to do validation of an entire site.

At the moment AMP validation is only performed in response to editing a post, or activating a plugin. There should be passive validation checks performed so that the user can be informed of issues that arise from adding widgets or making other site changes that do not result in validation being performed. WP Cron can be used to initiate revalidation requests for random URLs of enabled templates. This will make the "At a glance" Dashboard widget much more useful on admin as well, as it will actually highlight new issues that the user isn't already immediately aware of:

image

Relates to #1755 which is for adding a UI in the admin for bulk validation.

swissspidy commented 5 years ago

Integrating with cron shouldn't be too hard I think. Just need to figure out which pages to scan. Instead of (or in addition to) random URLs, I'd love to see periodical checks of the most popular posts.

swissspidy commented 5 years ago

I just did the following:

  1. Install and activate the plugin
  2. Go to settings and select native mode (because the current theme works well with it)
  3. Wait for page to reload. Receive error message:

Native mode activated! However, there was an error when checking the AMP validity for your site. cURL error 28: Operation timed out after 15001 milliseconds with 0 bytes received.

The error message is probably because of my Lando setup, but it's certainly a bit odd to wait that long for the page to reload because the plugin checks the validity of the site right after changing the settings. This would be a great opportunity to use cron as well. It could simply schedule an event 1min in the future or so.

westonruter commented 5 years ago

Yes, I've faced difficulty with timeouts in Lando for the validation requests as well. In particular, it seems the PHP-CSS-Parser performs extremely slowly in the Docker environment. Any intensive PHP computations seem to be very slow.

The reason for checking validity synchronously during the settings save is so that any success message can be served back in PHP. It would be better indeed to do it asynchronously after saving and show a spinner on the resulting page once the results are back. For that matter, the whole screen needs to be redone to use the REST API as opposed to the classic full-page-reload from the legacy admin screens.

swissspidy commented 5 years ago

There are some small UX things on the settings screen I'd love to improve anyway, so this sounds like a good excuse to do that 😀

westonruter commented 4 years ago

This is going to be of critical importance once we allow Developer Tools to be turned off. When that happens, validation will no longer be done synchronously when a post is updated. We need for validation errors to still be gathered in an automated way and reported to the administrator in the life of a site, across posts being published and plugins being updated. So here's what I propose:

  1. On a daily basis, update the validation results for the most recently-published posts in each post type, the homepage, and the other templates (using similar logic to what WP-CLI is doing).
  2. Schedule a cron event every time a published post is updated to re-check for validity.
  3. Introduce AMP Validation Site Health test which is shown as a warning whenever there are validation errors that are unreviewed. The Site Health test can link off to the Validated URLs screen even if the current user has Developer Tools turned off.
westonruter commented 4 years ago

The site health test message can not only flag the count of URLs that have unreviewed validation errors, but also the list of plugins that are causing them. Then the CTA can be to go to the Plugin Suppression section on the admin screen.

amedina commented 4 years ago

Another CTA could be Contact System Admin or similar flow to request support.

On Tue, Jun 30, 2020 at 2:56 PM Weston Ruter notifications@github.com wrote:

The site health test message can not only flag the count of URLs that have unreviewed validation errors, but also the list of plugins that are causing them. Then the CTA can be to go to the Plugin Suppression section on the admin screen.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ampproject/amp-wp/issues/1756#issuecomment-652067205, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD3R2JIB67ENKO22NMMIE3RZJNRBANCNFSM4GLLKNCQ .

westonruter commented 4 years ago

Yeah, we could do a query for the other administrators and list those who have Developer Tools turned on as good candidates to reach out to.

westonruter commented 4 years ago

Depends on #4779.

westonruter commented 4 years ago

Note: the same underlying mechanisms here should also be site scanning (#4795 and #4719)

johnwatkins0 commented 4 years ago

re: The discussion we had earlier today about the site scan REST endpoint. I was thinking that if all validation uses the same underlying mechanism, we could potentially solve some problems by working on the cron along with the site scanning for the wizard, etc. in tandem.

Maybe we can do something along these lines:

  1. The cron task kicks off on plugin activation and runs with a limit of, say, five URLs per content type (could be fewer).
  2. Save the result for each URL as it's validated in the cron task.
  3. In the REST endpoint, when requesting results for a URL, signal that we prefer the saved result over a fresh validation. With the REST endpoint and the cron action sharing the same underlying code, it's likely that all of the requested URLs will have already been saved (depending on how quickly the user reaches the wizard after activating the plugin).
  4. In the REST endpoint we can batch the query in some way (I have to work through this) so that if all the queried URLs have saved data, then only one request will be needed, but if there are URLs without saved data, they'll be sent back one at a time (for a progress bar or similar UI).

A lot of the underlying code for this already exists, and I've already begun moving the pieces around in #5228 (WIP).

westonruter commented 4 years ago
  1. The cron task kicks off on plugin activation and runs with a limit of, say, five URLs per content type (could be fewer).

Yes, this could reduce the perceived time involved to perform the site scan step, if they are happening in the background. We'd need to make sure that WP Cron gets spawned immediately after activation and not wait around for another request, as if the user is activating the plugin for the first time, they may cause WP Cron to spawn when they click the banner to access the onboarding wizard, and the scan results will not yet be available.

  1. In the REST endpoint, when requesting results for a URL, signal that we prefer the saved result over a fresh validation. With the REST endpoint and the cron action sharing the same underlying code, it's likely that all of the requested URLs will have already been saved (depending on how quickly the user reaches the wizard after activating the plugin).

The way to determine if a given URL is fresh or not is via AMP_Validated_URL_Post_Type::get_post_staleness(). So yeah, we could avoid re-scanning a URL if it's results are fresh. This would be useful when on the settings screen in particular and we want to show an overview of the last results, in addition to providing a way to re-scan. One complication is that gathering posts for checking may change each time a scan is done, if we use the most recent post for example. So a way to mitigate that could be to always get the oldest ID (or smallest object ID when datetimes are not available).

4. In the REST endpoint we can batch the query in some way (I have to work through this) so that if all the queried URLs have saved data, then only one request will be needed, but if there are URLs without saved data, they'll be sent back one at a time (for a progress bar or similar UI).

This relates to what @schlessera mentioned in regards to retaining the loopback requests. If you go to the Validated URLs post table screen and select multiple validated URLs, you can do a bulk re-check of those URLs. This will be a batch operation. A problem with doing this, however, is that there can be timeouts.