GoogleChrome / workbox

📦 Workbox: JavaScript libraries for Progressive Web Apps
https://developers.google.com/web/tools/workbox/
MIT License

Option to precache multiple requests simultaneously #2880

Open budarin opened 3 years ago

budarin commented 3 years ago

If I always have to wait for the service worker to finish installing before the page is displayed, why not tell the service worker that assets can be loaded not sequentially, but as quickly as possible, in parallel?

On a slow connection, we have to wait a very long time for a sequential download, even over HTTP/2.

My case (slow 3G connection): 13 page assets load in parallel in 5 seconds, and then the service worker loads 22 assets sequentially over HTTP/2 for about a minute (13 of them are the same ones that loaded in 5 seconds for the page, and they account for most of the total size).

jeffposnick commented 3 years ago

You can read some background on this new behavior introduced in Workbox v6 at https://github.com/GoogleChrome/workbox/issues/2528

Basically, we went from not imposing a limit at all—which would sometimes incur net::ERR_INSUFFICIENT_RESOURCES errors, and more often, would lead to critical resources loaded by the main page loading slowly when the service worker was registered too early—to a model in which precaching requests are made one at a time.

I still believe that's the "safer" thing to do overall, and my general view of precaching requests is that they should be considered low-priority things that happen in the background. But there are use cases in which folks really do want more aggressive precaching to take place, which you and a few other folks have now asked for.

I'm open to making this configurable in a future release, with the default still being one at a time.

budarin commented 3 years ago

Our application depends entirely on the service worker, so we have to wait for it to finish installing.

If the service worker constructor had an option that let us configure parallel loading when necessary, it would save our application from a long first start, as well as from restarts on iOS (which is notorious for periodically deleting the application's storage entirely).

jeffposnick commented 3 years ago

By the way, could you explain a bit more why your web application can't do anything until after the service worker has fully installed? Service workers are normally intended to be progressive enhancement, and (if possible) it's usually best to treat them as such instead of as a hard requirement.

budarin commented 3 years ago

Our application's protection against CSRF attacks is based on a double-submit cookie: a new token is issued to the client for each session, in a response header and a session cookie.

Because Google has made so-called "session" cookies (cookies without an expiration date) effectively eternal in its "Automatic session recovery" mode, the whole CSRF-protection scheme becomes vulnerable: the cookies never expire, and an attacker who roughly knows the token-generation algorithm has a much better chance of guessing a token.

RFC 6265: If a cookie has neither the Max-Age nor the Expires attribute, the user agent will retain the cookie until "the current session is over" (as defined by the user agent).

(Caution! Beware of sites where you log in with the "Don't remember me" option selected: with the browser's "Automatically restore session" option enabled, you will stay logged in forever, because closing the browser will not delete the session authorization cookie.)

To work around this violation of the RFC by Google, we wrote a route handler that intercepts all POST requests and puts a token into a CSRF header. The service worker receives the token at the start of its session, and the server can now rotate the token not only per session but per request, without problems with the Back button, without problems for the site being open in multiple tabs, and with the client completely freed from the protection code.

Therefore, our application cannot work correctly until the service worker is installed and running.
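The flow described in this comment can be sketched roughly as follows. This is a hypothetical illustration, not the author's actual code: the header name X-CSRF-Token, the sessionToken placeholder, and the addCsrfHeader helper are all assumptions made for the example.

```javascript
// Hypothetical sketch of the double-submit-cookie flow described above:
// the service worker holds a per-session token and attaches it to every
// outgoing POST as a CSRF header. Names here are illustrative.
const CSRF_HEADER = 'X-CSRF-Token';
let sessionToken = 'token-from-session-start'; // would be issued by the server

// Pure helper: clone a request with the CSRF header attached.
function addCsrfHeader(request, token) {
  const headers = new Headers(request.headers);
  headers.set(CSRF_HEADER, token);
  return new Request(request, { headers });
}

// In a real service worker, a fetch handler would apply it to POSTs.
// (Guarded so the helper above can also run outside a worker context.)
if (typeof self !== 'undefined' && typeof self.addEventListener === 'function') {
  self.addEventListener('fetch', (event) => {
    if (event.request.method === 'POST') {
      event.respondWith(fetch(addCsrfHeader(event.request, sessionToken)));
    }
  });
}
```

Because the header is added centrally in the service worker, page code never touches the token, which matches the "frees the client completely from the protection code" point above.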

budarin commented 3 years ago

While I was writing the answer, an idea came to me: could I skip precaching the resources during service worker installation and instead start precaching after install?

Can I do this with Workbox? It would solve my problem of a slow first start.

But the question of loading assets into the cache in parallel is still relevant for other cases :)

jeffposnick commented 3 years ago

"The Offline Cookbook" offers an overview of various ways a service worker can be used to cache and keep content up to date. Roughly speaking, each of the concepts in that cookbook map into something you can do with a combination of workbox-routing and workbox-strategies, or workbox-precaching.

It's definitely legitimate to create a service worker that doesn't precache during installation. Normally, you'd "cache as you go" via runtime caching, instead of caching things upfront. When taking this approach, you'd need to think about what kind of freshness guarantees you need for your HTML and subresources, and choose an appropriate strategy (NetworkFirst, StaleWhileRevalidate, etc.) to use for each runtime route. Things get more complicated if you have sets of assets that all need to be expired or refreshed together, atomically—that's the sort of thing that workbox-precaching is optimized for, but it requires more manual intervention when you're using dynamic runtime caching.

I would not describe any of this as "post-install precaching" though. It's more cache-as-you-go runtime caching. If you have a set of assets that all need to be added to a cache before your service worker activates, then using precaching (which has to happen during install) is really your best bet.
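To make the cache-as-you-go idea concrete, here is a dependency-free sketch of the stale-while-revalidate pattern mentioned above. It is not the workbox-strategies implementation; the cache object and fetch function are injected (an assumption made for illustration) so the logic can run anywhere that provides them.

```javascript
// Minimal stale-while-revalidate sketch (not the workbox-strategies code).
// `cache` is anything with match/put (the Cache API shape); `fetchFn` is
// injected so the logic is testable outside a service worker.
async function staleWhileRevalidate(request, cache, fetchFn) {
  const cached = await cache.match(request);
  // Kick off a network fetch unconditionally to refresh the cache.
  const networkPromise = fetchFn(request).then(async (response) => {
    await cache.put(request, response.clone());
    return response;
  });
  // Serve the cached copy immediately if present; otherwise wait for the network.
  return cached !== undefined ? cached : networkPromise;
}
```

The trade-off this illustrates is exactly the freshness question raised above: callers may receive a stale response, but the cache is refreshed in the background for the next request.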

budarin commented 3 years ago

Sorry, I wasn't very clear: what are the reasons not to do the same precaching after the service worker activates? Why can't the same algorithm that is used during install be run after activation?

Using createHandlerBoundToURL, for example, involves a PrecacheController and a precache cache, both of which will be missing if we follow your recommendation to use a runtime cache.

I believe there are a large number of applications for which getting the service worker activated and operational early, even during the initial install, is the primary concern. Why not make it possible for such applications to precache assets after the service worker activates, using Workbox?

jeffposnick commented 3 years ago

The upgrade flow for workbox-precaching—where you have an atomic set of previously cached URLs, and you migrate to a new set, with one or more of those cached URLs updated—relies heavily on the service worker lifecycle. Specifically, we need to cache new entries during install, perform cleanup of no longer needed entries during activate, and use the transition from post-install waiting to active to allow open applications to opt-in to updating to the latest set of precached assets.

If we decouple those operations from the service worker lifecycle, it would break a number of common use cases. For instance, single page apps that lazily load their subresources would end up loading the HTML from the "old" precache, and then may end up lazily-loading their subresources from the "new" precache. We would move from a model with well-defined cache consistency to one with inherent race conditions. (This discussion on why you shouldn't unconditionally call skipWaiting() covers the same material.)

I've been helping developers with service workers since they first launched in Chrome, and I can say with confidence that there is not a large number of applications that rely on the service worker taking control early. We strongly encourage developers to treat the presence of a service worker as optional when it comes to app functionality, and it's trivial for users to shift-reload a page, which triggers a navigation without the service worker being in control.

I really don't know enough about your web app's security constraints to speak to what you're doing for CSRF protection, but I can tell you it's not a common use case for service workers.

budarin commented 3 years ago

Time passes, and while the service worker used to be more of a decorative feature, today it is a working tool, and for many applications it is no longer optional but mandatory.

I know of many projects that require heightened confidentiality and security (document management with digital signatures, crypto tokens, etc.), and all of them actively use service workers. But since they are mostly used on desktops, the long initial installation is not as acute a problem for them as it is for me.

I understand that it is difficult to detach the precache module's logic from the lifecycle, but implementing my own monitoring and updating of assets across versions would largely duplicate the logic already built into the precache module.

jeffposnick commented 3 years ago

We don't have any plans to change the workbox-precaching behavior to detach it from the service worker install and activate events. Those service worker events and the service worker lifecycle in general were designed to be used in the way Workbox uses them.

Allowing developers to customize the number of simultaneous precaching requests that are in flight at any given time is still something we will consider for a future release, and I'll leave this issue open to track that.

budarin commented 3 years ago

Based on your experience, can you suggest an algorithm for keeping precache-style assets in the runtime cache in sync with service worker versions?

jeffposnick commented 3 years ago

It wouldn't be called precaching.

You can set up runtime caching routes to match whatever assets you want to be added to the cache, along the lines of what's described at https://developers.google.com/web/tools/workbox/guides/get-started#routing_and_caching_strategies
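The routing idea from the linked guide can be illustrated with a small dependency-free sketch. The names registerRouteSketch and strategyFor are made up for this example; they are not the workbox-routing API, which pairs a matcher with a strategy instance in much the same first-match-wins way.

```javascript
// Dependency-free sketch of the routing idea: the first registered route
// whose predicate matches a request decides the caching strategy.
const routes = [];

function registerRouteSketch(predicate, strategyName) {
  routes.push({ predicate, strategyName });
}

function strategyFor(pathname) {
  const route = routes.find((r) => r.predicate(pathname));
  // Requests with no matching route fall through to the network.
  return route ? route.strategyName : 'network-only';
}

// Example routes, loosely mirroring common setups from the guide:
registerRouteSketch((p) => /\.(?:png|svg|jpg)$/.test(p), 'cache-first');
registerRouteSketch((p) => /\.(?:js|css)$/.test(p), 'stale-while-revalidate');
```

In real Workbox code, the second argument would be a strategy instance such as new CacheFirst() rather than a string.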

lajuffermans commented 2 years ago

+1 for making this configurable; these days, simultaneous requests can improve asset loading a lot, to be honest.

budarin commented 2 years ago

Each +1 request is another potential point of failure and one more request that can increase the site's loading time.

Mister-Hope commented 2 years ago

Any update on this? Or could you add a simple explanation of how this can be changed in v6?

In my case, I want to provide offline docs for users, but every small asset is generated with a hashed name by default, and the current one-by-one fetch often takes more than 3 minutes to install a new service worker, sometimes longer than the user spends searching and reading before quitting. The only user-initiated request is likely the search API, so parallel requests are totally fine for me, e.g. 10 to 20 at a time.

It would be great if anyone could point out a workaround for achieving this, thanks.

Or to put it another way: I think it's fine for an offline PWA to want to update itself as soon as possible.

akhilgrover commented 2 years ago

Could the install code be updated as below so that there are 10 requests in parallel? const fetchQueueDepth = 10; could be exposed as a configurable parameter.

install(event) {

    // Wraps a promise so its settled state can be queried synchronously.
    function makeQueryablePromise(promise) {
        // Don't re-wrap a promise that has already been wrapped.
        if (typeof promise.isPending === 'function') return promise;
        // Set initial state.
        let isPending = true;
        let isRejected = false;
        let isFulfilled = false;

        // Observe the promise, saving the outcome in closure scope.
        const result = promise.then(
            function (v) {
                isFulfilled = true;
                isPending = false;
                return v;
            },
            function (e) {
                isRejected = true;
                isPending = false;
                throw e;
            }
        );
        result.isFulfilled = function () { return isFulfilled; };
        result.isPending = function () { return isPending; };
        result.isRejected = function () { return isRejected; };
        return result;
    }
    // waitUntil returns Promise<any>
    // eslint-disable-next-line @typescript-eslint/no-unsafe-return
    return waitUntil(event, async () => {
        const installReportPlugin = new PrecacheInstallReportPlugin();
        this.strategy.plugins.push(installReportPlugin);
        let promises = [];
        // Cache entries 10 at a time; could be exposed as a parameter.
        const fetchQueueDepth = 10;
        // See https://github.com/GoogleChrome/workbox/issues/2528
        for (const [url, cacheKey] of this._urlsToCacheKeys) {
            const integrity = this._cacheKeysToIntegrities.get(cacheKey);
            const cacheMode = this._urlsToCacheModes.get(url);
            const request = new Request(url, {
                integrity,
                cache: cacheMode,
                credentials: 'same-origin',
            });
            const result = Promise.all(this.strategy.handleAll({
                params: { cacheKey },
                request,
                event,
            }));
            promises.push(makeQueryablePromise(result));
            // Once the queue is full, wait until at least one request fulfills.
            if (promises.length >= fetchQueueDepth) {
                await Promise.any(promises);
            }
            // Drop settled promises; note that rejected entries are discarded
            // here, so a single failed request won't fail the install.
            promises = promises.filter(function (p) { return p.isPending(); });
        }
        // Wait for the last in-flight requests before reporting results.
        await Promise.all(promises);
        const { updatedURLs, notUpdatedURLs } = installReportPlugin;
        if (process.env.NODE_ENV !== 'production') {
            printInstallDetails(updatedURLs, notUpdatedURLs);
        }
        return { updatedURLs, notUpdatedURLs };
    });
}

budarin commented 2 years ago

At least over HTTP/2, you can safely download multiple assets simultaneously.

budarin commented 2 years ago

And if you also use a lower priority for these requests, there are no barriers at all to loading all assets in parallel at once.

csvan commented 2 years ago

I strongly second this as opt-in behaviour.

@jeffposnick's argument as to why it works as it does is sound, but there are still many edge cases where allowing parallel downloads is appropriate - it would be great if Workbox simply allowed toggling this as an option, with the default behaviour still being sequential downloads.

lassejaco commented 2 years ago

We're precaching a huge amount of very small requests for some dynamically generated pages (few kbs each) and this basically bricks our install experience. Just showing my love for this feature request.

AlexRMU commented 1 year ago

So far, I think it's possible to solve this using injectManifest and a similar file:

// workbox
import { cacheNames, clientsClaim } from "workbox-core";
import { registerRoute } from "workbox-routing";
import { precacheAndRoute } from "workbox-precaching";

precacheAndRoute(/* ... */); // basic and important files
registerRoute(/* ... */);

self.skipWaiting();
clientsClaim();

// files
import async from "async";
let files = ["http://google.com/", "http://google.com/"]; // many links to other files (not in the precache); can be taken from a string or a file
let concurrency = 10;
async function main() {
    let cache = await caches.open(cacheNames.runtime);
    await async.mapLimit(files, concurrency, (x) => cache.add(x));
}
main();

These files will be added to the cache, if they are not already there, during installation. If installation is interrupted, the remaining files will be added at the next startup.

budarin commented 1 year ago

So far, I think it's possible to solve this using injectManifest and a similar file:

Will there be any competition when downloading between your code and the Workbox code? At service worker startup, Workbox will also start downloading the same files!!

AlexRMU commented 1 year ago

Will there be any competition when downloading between your code and the Workbox code? At service worker startup, Workbox will also start downloading the same files!!

The important and necessary files are added to precacheAndRoute. All the others are added manually, and Workbox just serves them via a route.

renhiyama commented 1 year ago

It's been more than a year now; any updates on this? I'm making a VS Code alternative that works on mobile and is meant to work offline. Since there are over 1k file icons, downloading them serially takes more than 2 minutes, so the install button appears far too late 🥲. The icons range from 1-3 kB each, so downloading them in parallel shouldn't be a problem. Our users are asked to provide enough bandwidth for the first load/installation of the app so it can work offline (less than 10 MB, though 🙃). Hopefully you'll provide a built-in way to switch requests to parallel, since I'm using the next-pwa package, which depends on Workbox, and I'd prefer not to edit the sw.js file generated at build time to inject the code Alex posted above. Thanks in advance.

JayPe69 commented 1 year ago

Any updates? On my side, my Cypress tests crash because loading 20 precached entries takes 10 seconds.
After those 10 seconds, the SW reloads the page in the middle of my Cypress tests. Adding a 10-second wait to every test is not a solution.

So it would be very nice to parallelize the requests. Or maybe to have an option that avoids the SW reloading the page after precaching files.

csvan commented 1 year ago

@JayPe69 a forced reload is not standard SW behavior and, to my knowledge, is not induced by Workbox either.

JayPe69 commented 1 year ago

@JayPe69 a forced reload is not standard SW behavior and, to my knowledge, is not induced by Workbox either.

Oh, I saw this reload when I added the precache functionality, and I tried to work out whether or not it came from my own SW code.
I'll double-check.

EDIT 1:

I found it https://developer.chrome.com/docs/workbox/handling-service-worker-updates/

Once activated, the new service worker will take control of any existing clients, triggering the controlling event in workbox-window. When this happens, the current page reloads using the latest version of all the precached assets and any new routing logic found in the updated service worker.

EDIT 2:

The solution to avoid reloading when the SW is installed for the first time is to check the controlling event's isUpdate value. That way you reload only when a new SW is activated, not when it is installed for the first time. It solves my Cypress problem.
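The isUpdate check described above can be sketched like this. The decision logic is kept as a standalone function; the workbox-window wiring is shown in comments since it needs a browser context, and the function name shouldReloadOnControlling is made up for this example.

```javascript
// Sketch of the reload-only-on-update check described above.
// import { Workbox } from 'workbox-window';

// workbox-window sets event.isUpdate on the "controlling" event when an
// updated service worker (not a first install) takes control.
function shouldReloadOnControlling(event) {
  return event.isUpdate === true;
}

// Page-side wiring (browser only):
// const wb = new Workbox('/sw.js');
// wb.addEventListener('controlling', (event) => {
//   if (shouldReloadOnControlling(event)) window.location.reload();
// });
// wb.register();
```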

Thanks @csvan; my first check wasn't accurate enough, as the reload was clearing my log... (the option to preserve logs was disabled... :s)

However, it could be a nice option to add, letting the user choose how precached files are loaded.

KaelWD commented 1 year ago

I applied this with patch-package, now 500 assets are cached in 3 seconds instead of 4.5 minutes.

diff --git a/node_modules/workbox-precaching/PrecacheController.js b/node_modules/workbox-precaching/PrecacheController.js
index e00975e..7380049 100644
--- a/node_modules/workbox-precaching/PrecacheController.js
+++ b/node_modules/workbox-precaching/PrecacheController.js
@@ -5,6 +5,7 @@
   license that can be found in the LICENSE file or at
   https://opensource.org/licenses/MIT.
 */
+import eachLimit from 'async-es/eachLimit'
 import { assert } from 'workbox-core/_private/assert.js';
 import { cacheNames } from 'workbox-core/_private/cacheNames.js';
 import { logger } from 'workbox-core/_private/logger.js';
@@ -150,9 +151,8 @@ class PrecacheController {
         return waitUntil(event, async () => {
             const installReportPlugin = new PrecacheInstallReportPlugin();
             this.strategy.plugins.push(installReportPlugin);
-            // Cache entries one at a time.
             // See https://github.com/GoogleChrome/workbox/issues/2528
-            for (const [url, cacheKey] of this._urlsToCacheKeys) {
+            await eachLimit(this._urlsToCacheKeys, 10, async ([url, cacheKey]) => {
                 const integrity = this._cacheKeysToIntegrities.get(cacheKey);
                 const cacheMode = this._urlsToCacheModes.get(url);
                 const request = new Request(url, {
@@ -165,7 +165,7 @@ class PrecacheController {
                     request,
                     event,
                 }));
-            }
+            })
             const { updatedURLs, notUpdatedURLs } = installReportPlugin;
             if (process.env.NODE_ENV !== 'production') {
                 printInstallDetails(updatedURLs, notUpdatedURLs);

You'll have to add async-es to your own package.json.
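For anyone who wants the same effect without adding a dependency, the eachLimit call in the patch above can be approximated with a small worker-pool sketch. eachLimitSketch is an illustrative name, not an async-es or Workbox API.

```javascript
// Dependency-free sketch of eachLimit-style bounded concurrency: run
// `worker` over `items` with at most `limit` tasks in flight at once.
async function eachLimitSketch(items, limit, worker) {
  const queue = Array.from(items);
  // Start `limit` consumers that pull from the shared queue until empty.
  const consumers = Array.from({ length: Math.max(1, limit) }, async () => {
    while (queue.length > 0) {
      // shift() happens synchronously after the length check, so two
      // consumers can never claim the same item.
      await worker(queue.shift());
    }
  });
  await Promise.all(consumers);
}
```

Unlike the Promise.any-based queue posted earlier in this thread, a rejection from any worker here propagates out of Promise.all and fails the install, matching stock Workbox behavior.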

tomayac commented 6 months ago

Hi there,

Workbox is moving to a new engineering team within Google. As part of this move, we're declaring a partial bug bankruptcy to allow the new team to start fresh. We realize this isn't optimal, but realistically, this is the only way we see it working. For transparency, here are the criteria we applied:

Thanks, and we hope for your understanding! The Workbox team

tomayac commented 6 months ago

Reopening this issue, as it has an active PR associated with it. Sorry.