GoogleChromeLabs / pwa-wp

WordPress feature plugin to bring Progressive Web Apps (PWA) to Core
https://wordpress.org/plugins/pwa/
GNU General Public License v2.0

Implement precaching #25

Closed westonruter closed 5 years ago

westonruter commented 6 years ago

As a site admin, I should have a way to precache important assets to give my users a faster perceived experience of using my WordPress site.

~AC1: Propose a user interface to select which assets should be pre-cached.~ AC2: As site admins might need to declare precached asset support in different ways, also propose a theme function and/or wp-cli script to select which assets should be precached.

Note: additional ACs likely to be added.


As noted by @jeffposnick in https://github.com/xwp/pwa-wp/issues/5#issuecomment-403082892:

Something I've not seen mentioned yet has to do with precaching—here's an explainer about how Workbox handles precaching. Precaching gives you the advantage of ensuring that, whenever your service worker is installed, a core set of URLs will be cached and kept up to date, even if they correspond to assets that haven't been used yet. (In other words, you get to cache important things in advance.)

This differs from using routing + caching strategies to implement runtime caching, in which assets and HTML are only cached after a user visits a page that uses them for the first time.

Precaching offers a number of advantages, both in terms of "priming" the cache, as well as efficient updates (a network request is only required when something in the precache manifest actually changes). The downside of precaching is that it requires some integration with a build process, as Workbox needs to create hash "fingerprints" of local assets on the filesystem in order to create its precache manifest. That might not be viable in a generic WordPress solution, but I wanted to throw it out there if there are hooks into a local build process that could be taken advantage of.

Precaching is indeed something we haven't talked about yet, but I think that WordPress has some good tooling we can use here. For example, external scripts and stylesheets get registered with the dependency system, where each URL gets a stable handle that does not change. Each dependency gets a version as well, which is used to cache-bust the URL, but in the case of precaching we could strip the version (ver) parameter when generating the response since the service worker could handle the cache busting itself.
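To make the idea of stripping `ver` concrete, here is a minimal TypeScript sketch. It assumes a hypothetical `RegisteredAsset` shape (handle, URL, version) and maps it to the `{url, revision}` entry format that Workbox precache manifests use. The names `RegisteredAsset` and `toPrecacheEntry` are illustrative only and are not part of the plugin or of WordPress.

```typescript
// Hypothetical shape for a registered script or style (not the plugin's data model).
interface RegisteredAsset {
  handle: string;
  src: string; // absolute URL, possibly carrying a ?ver= cache-busting parameter
  ver: string | null;
}

// Entry format used by Workbox precache manifests.
interface PrecacheEntry {
  url: string;
  revision: string | null;
}

// Move the version out of the URL and into the revision field, so the
// service worker handles cache busting instead of the query string.
function toPrecacheEntry(asset: RegisteredAsset): PrecacheEntry {
  const url = new URL(asset.src);
  // Prefer the registered version; fall back to a ver already present on the URL.
  const revision = asset.ver ?? url.searchParams.get("ver");
  url.searchParams.delete("ver");
  return { url: url.toString(), revision };
}
```

With this approach a version bump changes only the `revision`, so the precached URL stays stable across releases.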

westonruter commented 6 years ago

To elaborate, the precaching should be done for all registered scripts and styles, not just the ones that are enqueued. We'd also need to include the offline page (#23) among the precached URLs.

One thing that comes to mind regarding precaching the offline page is making sure that all assets linked to from the offline page are also included in the precache manifest. @jeffposnick Other than just loading the offline page in the browser to do runtime caching, is there any facility for Workbox to identify such assets to add them for precaching? I assume not, and it would seem more practical in this case to load the offline page in a hidden iframe on the client instead. The key point here is that themes and plugins will be adding an arbitrary number of scripts and styles in addition to the images that a user may be adding in the CMS to the offline page's content, so there isn't a static list of files we can rely on as part of a build process.

westonruter commented 6 years ago

See also https://github.com/WordPress-Coding-Standards/WordPress-Coding-Standards/issues/1439 as there is WordPress.WP.EnqueuedResourceParameters sniff which could be amended to warn users when they conditionally enqueue scripts & styles (at wp_enqueue_scripts action) without registering them first (e.g. at wp_default_scripts and wp_default_styles actions).

westonruter commented 6 years ago

Existing Solutions

Here is a look at existing WordPress plugins that implement offline support, based on the doc created as part of #2:

Offline Content

Uses runtime caching and only deletes stale caches upon installation (ref).

Offline Shell

Does do precaching, but it requires the user to select the list of assets to cache:

image

The assets listed are just taken by grabbing all files in the active theme. This is not ideal because:

  1. It requires manual user interaction.
  2. Many of the files don't make sense to cache since they'll never be requested directly (e.g. markdown files, PHP files).
  3. The theme may enqueue core scripts and styles not in the list.
  4. It does not account for assets that plugins will enqueue.

Super Progressive Web Apps

As far as I can see, only the requests for the start page and the offline page are cached: https://github.com/SuperPWA/Super-Progressive-Web-Apps/blob/a06f52e1b0ddb74057ffb6d22545a9ae285ffe12/public/sw.php#L109-L125

The assets that these pages load (e.g. scripts, styles, images) are not cached for offline.

Progressive WordPress

In the same way as SuperPWA, only the requests for the home page and offline page are cached upon installation.

The assets that the offline page depends on do not appear to be cached upon installation.

LH Web Application

Offline page is presented as a field to provide a page ID:

image

The URL of the offline page is then precached, but not the assets it depends on.

Minimum Configuration WordPress PWA

Uses Workbox. The admin offers a precaching screen which allows you to cache arbitrary URLs, including a facility to scan static asset URLs to pre-populate the manifest:

image

There is an admin screen for selecting an offline page:

image

Only the offline page URL is precached. No assets are precached.


Other plugins?

nico-martin commented 6 years ago

As @westonruter mentioned "Progressive WordPress" won't cache any assets on installation. But since it caches everything "network first", the chance is pretty high the required resources for the offline page are stored as well after the first session.

However, I think we need to find a more stable solution for our use case. Even if I really like how "Offline Shell" does it, I think that's too much configuration for most admins.

Scan the offline page

I had a similar problem with another plugin: Advanced WPPerformance has an HTTP/2 Server Push option, which adds preload headers on the fly (script_loader_src, style_loader_src). This won't work if you are using server-side caching. So I created a function that scans the front page and adds all assets to the .htaccess as Header Links: https://github.com/SayHelloGmbH/Advanced-WPPerformance/blob/master/Classes/class-http2push.php#L202

This function will run if you save the settings and as a WP-Cron (to detect any changes). We could try something like that for the offline page assets as well, store them in an option and add those sources to the installation.

But we could still run into problems if the site changes (cache refresh, updates) or the scan fails.

Scan inside the ServiceWorker

I'm not sure if that works, but maybe we could scan the offline page during the installation and cache all JS/CSS/image files right there inside the service worker. That way we could make sure all files are cached together with the offline page.

westonruter commented 6 years ago

@nico-martin:

But since it caches everything "network first", the chance is pretty high the required resources for the offline page are stored as well after the first session.

Yeah, it's a pretty good guess that the assets used on the offline page, in particular the scripts and stylesheets, would also be loaded on the homepage. So if those assets are cached with the network first strategy then it should be OK. What isn't accounted for yet, however, are the assets that are unique to the offline page. We're talking about images and other media that are used on the page.

For example, GitHub has their cool imagery that is served on the 500 error page:

image

Scanning the page for assets would only work for assets that are at the top-level. It wouldn't account for, say, background images that are referenced in stylesheet assets.

Maybe a good middle ground would be to just look at the offline page post in the DB and locate any images and other assets that are linked to in the content, as well as the featured image that is assigned. These assets could then be easily included in the precache manifest, with the scripts and stylesheet assets being cached thereafter at runtime. There could be additional assets that are linked to from the admin screen that wouldn't get cached here, but I think we'd be covering the 80% scenario, and a site owner should then be able to manually include other assets via an API to account for the remaining 20%.
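The middle-ground approach above can be sketched roughly as follows. This is a TypeScript illustration only; the plugin itself would do this in PHP against the post stored in the DB. The function name `collectOfflinePageAssets` and its parameters are hypothetical, and the regular expression is a deliberately naive stand-in for a real HTML parser.

```typescript
// Collect image URLs referenced in a post's HTML content, plus an optional
// featured image URL. Duplicates are dropped; first-seen order is kept.
function collectOfflinePageAssets(content: string, featuredImageUrl?: string): string[] {
  const urls = new Set<string>();
  if (featuredImageUrl) {
    urls.add(featuredImageUrl);
  }
  // Naive match for a quoted src attribute on <img> tags (assumption: content
  // is well-formed enough for this; a real implementation would parse the HTML).
  const srcPattern = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  let match: RegExpExecArray | null;
  while ((match = srcPattern.exec(content)) !== null) {
    urls.add(match[2]);
  }
  return Array.from(urls);
}
```

Note that this only sees assets in the post content itself, which matches the limitation described above: background images in stylesheets and assets added by the theme are not discovered.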

How does that sound?

However, I think we need to find a more stable solution for our use case. Even if I really like how "Offline Shell" does it, I think that's too much configuration for most admins.

I agree with you 100%. The WordPress philosophy of “decisions not options” does not align with having this UI for users to manually select the assets to include in the precache manifest. It should work out of the box for the majority of use cases without there needing to be any configuration.

nico-martin commented 6 years ago

Sounds good to me. So you are implementing a network first caching for assets anyway? One other thing: What about pagebuilders and shortcodes? They might contain images which are not in the DB content. I know, with Gutenberg most of them will hopefully disappear. If we scanned the actual output (page request) instead of the DB content, we could even lower those 20% to maybe 1% (CSS images). Of course we need to be careful of server-side performance issues. But that could work, even if it's a bit over-engineered 😄

jeffposnick commented 6 years ago

Hey All—Chiming in from the Workbox side of things.

Dynamically determining which subresources were loaded by a page is something that we're actively working on. It's not so much in the context of precaching, though: Workbox assumes that the precache manifest of URLs is determined ahead of time, during a build process. But we want to determine assets used at runtime in order to "backfill" runtime caches with requests that were made on the initial page load, prior to the service worker taking control. That would then assume that you'd use a runtime caching strategy (cache-first, stale-while-revalidate, etc.) inside the service worker for handling subsequent requests for those subresources.

There's more info on our plans for this in https://github.com/GoogleChrome/workbox/issues/368

CC: @philipwalton and @prateekbh who are both looking into implementing this in Workbox.

westonruter commented 6 years ago

Dynamically determining which subresources were loaded by a page is something that we're actively working on.

@jeffposnick Thanks, that looks interesting. It makes sense that there wouldn't be a straightforward way to discover the subresources required by a given URL without just going ahead and loading the document. It's great to know that Workbox is looking to add to the cache the assets loaded on the initial page load. That seems like it would potentially reduce the need for precaching a list of URLs obtained by a build process (which again isn't really feasible in this CMS context) other than the URL to the offline page itself. Again, this assumes most of the assets used on the offline page would be common with the page on which the service worker was first installed.

So you are implementing a network first caching for assets anyway?

@nico-martin That's yet to be determined, but I think it would make sense as a default. It would need to be configurable so that a theme/plugin could opt-in to staleWhileRevalidate or cacheFirst.
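As a rough illustration of what such configurability could look like, here is a TypeScript sketch with network first as the default and per-route opt-ins. The `RouteOverride` type and `resolveStrategy` function are hypothetical names made up for this example; they are not an existing plugin or Workbox API, and the strategy names are plain strings.

```typescript
type Strategy = "networkFirst" | "staleWhileRevalidate" | "cacheFirst";

// Hypothetical opt-in record that a theme or plugin could register.
interface RouteOverride {
  pattern: RegExp; // should not use the "g" flag, since test() would then be stateful
  strategy: Strategy;
}

const DEFAULT_STRATEGY: Strategy = "networkFirst";

// The first matching override wins; everything else gets the default.
function resolveStrategy(url: string, overrides: RouteOverride[]): Strategy {
  const hit = overrides.find((override) => override.pattern.test(url));
  return hit ? hit.strategy : DEFAULT_STRATEGY;
}

// Example opt-ins: fonts rarely change, uploads can tolerate being briefly stale.
const exampleOverrides: RouteOverride[] = [
  { pattern: /\.(?:woff2?|ttf)$/, strategy: "cacheFirst" },
  { pattern: /\/wp-content\/uploads\//, strategy: "staleWhileRevalidate" },
];
```

Keeping the default in one place means a site with no overrides still gets sensible behavior without any configuration.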

One other thing: What about pagebuilders and shortcodes? They might contain images which are not in the DB content. I know, with Gutenberg most of them will hopefully disappear. If we scanned the actual output (page request) instead of the DB content, we could even lower those 20% to maybe 1% (CSS images). Of course we need to be careful of server-side performance issues. But that could work, even if it's a bit over-engineered 😄

Yeah, my concern is the over-engineering vs the value gained. There is the concern of knowing when to re-scrape the offline page output to determine when new assets need to be cached (e.g. from header images, sidebar widgets, etc). On the other hand, if we just rely on the page post type and are only concerned with the featured image and assets in its content then we can just rely on the save_post action to reliably re-discover any changes since they are contained within the post/page. The remaining 20% scenario could be satisfied by providing a theme/plugin API to add URLs to the list of assets that are precached.
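A theme/plugin API for the remaining 20% could be as small as a registry that collects extra URLs and de-duplicates them before the manifest is emitted. The following TypeScript sketch is only an assumption about the shape of such an API; `PrecacheRegistry`, `add`, and `manifest` are hypothetical names, and the actual plugin would expose this in PHP.

```typescript
// Manifest entry in the {url, revision} form used by Workbox precaching.
interface ManifestEntry {
  url: string;
  revision: string | null;
}

// Hypothetical registry that themes and plugins could add URLs to.
class PrecacheRegistry {
  private entries = new Map<string, ManifestEntry>();

  // Registering the same URL again replaces its revision, so a re-scan
  // (for example after the offline page is saved) simply overwrites old data.
  add(url: string, revision: string | null = null): void {
    this.entries.set(url, { url, revision });
  }

  manifest(): ManifestEntry[] {
    return Array.from(this.entries.values());
  }
}
```

Keying by URL keeps the manifest free of duplicates even when the core discovery and a plugin both register the same asset.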

roborourke commented 6 years ago

Don't know if it helps - I tried something similar to this a few years ago with appcache, which uses the asset queue and scans the active theme directory:

https://wordpress.org/plugins/appcachify/

Scanning the active theme & plugins for non-js/css assets covers some of the cases like assets referenced in CSS. Of course it depends on how an offline page is put together, anything dynamic/editable would be missed.

Also the scanning operation can potentially take a huge amount of time if plugins are included so unless it can be run in the background it's a non-starter. Perhaps another option to consider though.

westonruter commented 6 years ago

@roborourke Thank you for that. I think we should consider that for the future. For the time being, however, we can consider (what I assume is) the 80% case where the assets used on the homepage will also be used on the offline page, with the exception of the featured image and media in the offline page's content, which we should precache along with the offline page itself. For the remaining 20% I think it would be better for now to defer to theme and plugin authors to explicitly precache the routes they know are needed, as I don't think that, even with parsing the response, we'd be able to get 100% of the URLs that are referenced by the offline page. For one thing, there are the assets dynamically injected by JS. The other thing is that when a theme or plugin changes, this will surely mean changes to the assets returned on the offline page, so we'd need to continually scan the offline page for assets. But since we couldn't get 100% coverage anyway, I think it is better to just leave that out for now.

postphotos commented 6 years ago

Hi @westonruter - thanks for generating this ticket and making progress! I've added a user story and two basic ACs here.

Based on the discussion, it sounds like coming up with a way to manage caches might also need to be added as an additional AC, as well as coming up with a logical way to manage sub-assets.

westonruter commented 6 years ago

I've taken a first pass at precaching based on the needs of serving an offline page in #48. This will need to be further iterated on, but at the moment it is precaching any scripts and styles that are enqueued, as well as the custom background image, custom header image, and custom logo image. And it is precaching the offline page URL itself and its featured image. It then uses runtime caching to accumulate any image and font assets that are linked to in the theme.

westonruter commented 6 years ago

There's one more thing I want to do, and that is related to specifying the query vars: https://github.com/GoogleChrome/workbox/issues/1613#issuecomment-419267482

https://github.com/xwp/pwa-wp/blob/6cc04cde01f8881fd2b542ed84137dccc641e740/wp-includes/js/service-worker-precaching.js#L9-L13

postphotos commented 5 years ago

We've made enough progress here so I'm closing for now, though this comment is a future enhancement to make caching easier.