GoogleChrome / workbox

📦 Workbox: JavaScript libraries for Progressive Web Apps
https://developers.google.com/web/tools/workbox/
MIT License
Cache specific URLs #2908

Closed zeshhaan closed 3 years ago

zeshhaan commented 3 years ago

Hello! I am addressing a special case which is as follows:

The site has more than 42k pages, which are frequently updated. I am looking for a way to convert this site into a PWA such that only the home page (https://www.example.com/) and its assets are cached with a StaleWhileRevalidate strategy. The rest of the pages (any URL with a pathname) are served as NetworkOnly, so when the user is offline, they are shown an offline.html page.

I have checked the documentation, and the closest thing I have come across is the "Offline page only" recipe, which serves the offline page while the user is offline:

const {registerRoute, NavigationRoute} = workbox.routing;
const {NetworkOnly} = workbox.strategies;
const { navigationPreload } = workbox;

const CACHE_NAME = 'offline-html';
// This assumes /offline.html is a URL for your self-contained
// (no external images or styles) offline page.
const FALLBACK_HTML_URL = '/offline.html';
// Populate the cache with the home page and offline HTML page when the
// service worker is installed.
self.addEventListener('install', async (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME)
      .then((cache) => cache.addAll(['/offline.html', '/index.html']))
  );
});

navigationPreload.enable();

const networkOnly = new NetworkOnly();
const navigationHandler = async (params) => {
  try {
    // Attempt a network request.
    return await networkOnly.handle(params);
  } catch (error) {
    // If it fails, return the cached HTML.
    return caches.match(FALLBACK_HTML_URL, {
      cacheName: CACHE_NAME,
    });
  }
};

// Register this strategy to handle all navigations.
registerRoute(
  new NavigationRoute(navigationHandler)
);

I'm trying to extend this code to make an exception for the home page URL, so that it is cached along with the offline page, but I haven't had any luck yet. The index.html is in the cache, but I couldn't find a way to add an exception that uses a StaleWhileRevalidate strategy.

Any pointers would be highly appreciated.

jeffposnick commented 3 years ago

How about this?

const {registerRoute, setDefaultHandler} = workbox.routing;
const {NetworkOnly, StaleWhileRevalidate} = workbox.strategies;
const {navigationPreload} = workbox;

// I think you can just cache the offline content ahead of time.
// The runtime s-w-r strategy should take care of getting / in the cache.
const CACHE_NAME = 'offline-html';
const FALLBACK_HTML_URL = '/offline.html';
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME)
      .then((cache) => cache.add(FALLBACK_HTML_URL))
  );
});

// Enable navigation preload.
navigationPreload.enable();

// Register a NetworkOnly route, which will take advantage of navigation preload
// whenever it's a navigation request for any URL other than /
// (You can add a check for `/index.html` if you also want to exclude that.)
registerRoute(
  ({request, url}) => request.mode === 'navigate' && url.pathname !== '/',
  new NetworkOnly({
    plugins: [{
      // See https://developers.google.com/web/tools/workbox/guides/using-plugins
      handlerDidError: () => caches.match(FALLBACK_HTML_URL),
    }],
  })
);

// Optional: you didn't mention it, but you might want an additional route here
// to cache large assets, like images, with a policy other than the default one.

// For all other requests, i.e. assets/subresources or navigations to /,
// use StaleWhileRevalidate.
setDefaultHandler(new StaleWhileRevalidate());

zeshhaan commented 3 years ago

Thanks so much! I have got it working partially, except I found 2 issues.

This is my (very slightly) modified code from your original reply. I had planned to pass a match function as the first argument to setDefaultHandler so that only the home page is cached, but I don't know how to scope its sub-resources. Can you share any tips on handling this case? Thanks again!

const { registerRoute, setDefaultHandler } = workbox.routing;
const { NetworkOnly, StaleWhileRevalidate } = workbox.strategies;
const { navigationPreload } = workbox;

const CACHE_NAME = "offline-html";
const FALLBACK_HTML_URL = "/fallback.html";
const HOME_URL = "/s/v2/homev1";
//this is the temporary home page for now and not the root directory
self.addEventListener("install", (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.add(FALLBACK_HTML_URL))
  );
});

navigationPreload.enable();

registerRoute(
  ({ request, url }) => request.mode === "navigate" && url.pathname !== HOME_URL,
  new NetworkOnly({
    plugins: [
      {
        handlerDidError: () => caches.match(FALLBACK_HTML_URL),
      },
    ],
  })
);

setDefaultHandler(new StaleWhileRevalidate());

jeffposnick commented 3 years ago

setDefaultHandler() registers a handler that will be used by default when there is no matching route, so it doesn't support passing in a matchCallback. Think of it as a last-chance fallback strategy for anything that isn't matched via one of your registerRoute()s.

It sounds like you want to add one or more additional calls to registerRoute(), each with its own combination of match criteria and strategy. If all of your requests are explicitly handled by a registered route, you could remove the setDefaultHandler() call entirely. I just added it because it sounded like it would work based on your initial description.
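As a rough sketch of that idea (not a tested drop-in), the default handler could be replaced by one explicit route per case. The names `isHomeNavigation` and `isOtherNavigation` are made up for illustration, and `HOME_URL` and `/fallback.html` are borrowed from the snippet above. The `typeof workbox` guard is only there so the match functions can be exercised outside a service worker.

```javascript
// Assumed home-page path, taken from the snippet earlier in this thread.
const HOME_URL = '/s/v2/homev1';

// Hypothetical helper: matches only navigations to the home page.
const isHomeNavigation = ({request, url}) =>
  request.mode === 'navigate' && url.pathname === HOME_URL;

// Hypothetical helper: matches every other navigation.
const isOtherNavigation = ({request, url}) =>
  request.mode === 'navigate' && url.pathname !== HOME_URL;

// Only register routes when the workbox global exists, i.e. inside a
// service worker that has already loaded workbox-sw via importScripts().
if (typeof workbox !== 'undefined') {
  const {registerRoute} = workbox.routing;
  const {NetworkOnly, StaleWhileRevalidate} = workbox.strategies;

  // Home page: serve from the cache and refresh it in the background.
  registerRoute(isHomeNavigation, new StaleWhileRevalidate());

  // All other pages: network only, with the offline page as a fallback.
  registerRoute(
    isOtherNavigation,
    new NetworkOnly({
      plugins: [{handlerDidError: () => caches.match('/fallback.html')}],
    })
  );
  // No setDefaultHandler() here: requests that match no route are left
  // to the browser's normal network handling.
}
```

With only these two routes, subresources are not cached at all, so a further route would still be needed for the home page's assets.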

In any case, I'm glad that you have something that should work for you!

zeshhaan commented 3 years ago

Gotcha!

Here is what i'm struggling with:

Here is what i have tried so far:

Here is what would be helpful to know:

So in brief, I would like to use Workbox to cache a specific route as a whole, instead of caching by resource type (markup, images, styles, scripts, etc.). Is that something that is possible with Workbox?

Something like:
Routes to cache: home, fallback
Strategy for home route: stale-while-revalidate
Strategy for fallback route: cache-only
Other routes: network-only

where caching sub-resources is understood and set as default.

jeffposnick commented 3 years ago

The standard way of saying "hey, this subresource request is associated with this page" is via the Referer header. If you only care about caching subresource requests that are associated with a given page, something like this should allow you to create a route that will only match those requests:

// Both the Referer header and request.url will be set to a full URL,
// so make sure we're matching against that.
const MY_URL_STRING = (new URL('/index.html', location)).toString();

registerRoute(
  // Match a request for a URL, or subresource request originating from that URL.
  ({request}) => request.url === MY_URL_STRING ||
                 request.headers.get('Referer') === MY_URL_STRING,
  // Put whatever strategy, plugins, etc. you want here.
  new StrategyName()
);

Hopefully that puts you on the right track!

jeffposnick commented 3 years ago

(Oh, and if any of these are cross-origin requests, then this information is important, as by default, the full URL may not be available via the Referer: header: https://developers.google.com/web/updates/2020/07/referrer-policy-new-chrome-default)
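A slightly more forgiving variant of the matcher above is sketched below, under a few assumptions that should be checked in the target browsers. It reads the standard `request.referrer` property first and falls back to the Referer header, and it compares origin and pathname instead of the full string, so a query string on the referring page does not break the match. `PAGE_PATH`, `referrerOf`, and `belongsToPage` are hypothetical names, not Workbox APIs.

```javascript
// Hypothetical page whose subresources should be cached.
const PAGE_PATH = '/index.html';

// Returns the referrer as a string, preferring the standard
// Request.referrer property and falling back to the Referer header.
const referrerOf = (request) =>
  request.referrer ||
  (request.headers && request.headers.get('Referer')) ||
  '';

// True for the page itself, or for a request whose referrer is that page.
// pageOrigin is passed in so the function does not depend on `location`.
const belongsToPage = (request, pageOrigin) => {
  const pageUrl = new URL(PAGE_PATH, pageOrigin);
  const isSamePage = (candidate) => {
    try {
      const parsed = new URL(candidate);
      return parsed.origin === pageUrl.origin &&
             parsed.pathname === pageUrl.pathname;
    } catch (error) {
      return false; // Empty or unparsable referrer.
    }
  };
  return isSamePage(request.url) || isSamePage(referrerOf(request));
};

// Only register the route when the workbox global exists (service worker).
if (typeof workbox !== 'undefined') {
  workbox.routing.registerRoute(
    ({request}) => belongsToPage(request, location.origin),
    new workbox.strategies.StaleWhileRevalidate()
  );
}
```

When the referrer policy reduces the referrer to the bare origin for cross-origin requests, the pathname is just `/`, so requests to a CDN will not match unless `PAGE_PATH` is `/`. In that case they would match for every page on the site, since an origin-only referrer cannot tell pages apart.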

zeshhaan commented 3 years ago

Thanks, Referer was something new to learn. The resources were all served from an external CDN. However, there are styles, scripts, and some images that are served from the same origin, and still only the markup is getting cached.

I have spun my local testing folder off into a repo. You can see the code at this link if you would like to test it: https://github.com/zeshhaan/jekyll-workbox/blob/master/sw.js

As mentioned in this article (web.dev/referrer-best-practices), I added a site-wide meta tag to explicitly set the referrer policy: <meta name="referrer" content="no-referrer-when-downgrade" />. But still, only the markup is getting cached. Am I missing anything here?