cloudflare / workerd

The JavaScript / Wasm runtime that powers Cloudflare Workers
https://blog.cloudflare.com/workerd-open-source-workers-runtime/
Apache License 2.0

🐛 BUG: Workers TTFB slows down based on unused dynamic imports (additional modules) #2372

Open juanferreras opened 7 months ago

juanferreras commented 7 months ago

Which Cloudflare product(s) does this pertain to?

Pages, Workers Runtime, Wrangler core

What version(s) of the tool(s) are you using?

wrangler 3.38.0

What version of Node are you using?

18.18.2

What operating system and version are you using?

macOS Sonoma 14.2.1

Describe the Bug

Observed behavior

Additional modules that are meant to be loaded through dynamic imports, and are therefore not bundled, still slow down Workers TTFB once they're uploaded, even if they are never actually imported. This can be replicated on both Workers and Pages (with a _worker.js/ dir), and whilst it affects both, it seems considerably worse on Pages.

Expected behavior

Without knowing any internal infrastructure limits, ideally dynamic imports would be lazily instantiated, parsed, and executed, so that they only affect the performance of the Worker as a whole when they are actually needed.

This is a pattern that most JS frameworks deployed to Cloudflare Pages rely on (see the official Next.js adapter next-on-pages, Nuxt, Remix, etc).

Steps to reproduce

You can clone both an example using a Worker with --no-bundle and an example using Pages with a _worker.js/ dir below. In both cases you can look at the previous commits, but simply changing which files you include inside the lazy folder is enough to affect the performance of the Worker once deployed (I couldn't measure any difference locally).

Please provide a link to a minimal reproduction

See the README.md for instructions and the size of each file. The setup is exactly the same in both projects: a simple Worker that dynamically imports .js files containing different volumes of data.

The path we use for the comparison NEVER dynamically imports the heavy dependencies. The only thing that changes between deployments is whether those files are included as additional modules.
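A minimal TypeScript sketch of the setup described above, under stated assumptions: the `createRouter` helper, the `LazyModule` shape, and the route names are hypothetical and are not taken from the linked repositories. In a deployed Worker each loader would presumably look like `() => import("./lazy/large.js")`; the loaders are injected here so the sketch runs without those files.

```typescript
// Hypothetical shape of a lazily loaded module (assumption, not from the repos).
type LazyModule = { default: string };
type Loader = () => Promise<LazyModule>;

// Builds a router from pathname -> loader. In a real Worker, a loader would be
// a dynamic import such as `() => import("./lazy/large.js")` (assumed path).
function createRouter(loaders: Map<string, Loader>) {
  return async function route(pathname: string): Promise<string> {
    const loader = loaders.get(pathname);
    if (loader === undefined) {
      // Baseline path: no dynamic import is triggered here.
      return "Hello from the baseline path";
    }
    // Only this branch pays for loading the heavy module.
    const mod = await loader();
    return `Loaded ${mod.default.length} characters`;
  };
}
```

The benchmarks in this report only exercise the baseline branch, so none of the loaders ever run. Any slowdown therefore comes from the extra modules being part of the upload, not from importing them.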

Please provide any relevant error logs

N/A

Benchmarks

Measured with WebPageTest (Virginia - EC2, Motorola G Power, 4G), running 9 tests for each deployment (first view only) and taking the median.
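For reference, taking the median of a set of runs can be sketched as below. This only illustrates the arithmetic; the `median` helper is hypothetical and is not part of WebPageTest.

```typescript
// Returns the median of a non-empty list of samples (e.g. TTFB values in ms).
function median(samples: readonly number[]): number {
  if (samples.length === 0) {
    throw new Error("median of an empty sample set is undefined");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: the middle element. Even count: mean of the two middle elements.
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

With 9 runs the result is simply the fifth value after sorting, which makes it less sensitive to a single slow outlier than the mean.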

Pages with _worker.js/ dir

The following 4 branches have been deployed.

  1. https://small.dynamic-import-cold-start-pages.pages.dev/ - WebPageTest Link
  2. https://medium.dynamic-import-cold-start-pages.pages.dev/ - WebPageTest Link
  3. https://large.dynamic-import-cold-start-pages.pages.dev/ - WebPageTest Link
  4. https://titanic.dynamic-import-cold-start-pages.pages.dev/ - WebPageTest Link

Comment: the impact is very noticeable even after adding just medium (i.e. well before reaching 1 MB of gzipped code). It keeps growing linearly (note that the titanic version still uses only 3 MB gzipped out of the 10 MB that paid plans allow in theory).

Worker with --no-bundle

The following 4 workers have been deployed.

  1. https://dynamic-import-cold-start-workers.juanmf.workers.dev/ - WebPageTest link
  2. https://dynamic-import-cold-start-workers-medium.juanmf.workers.dev/ - WebPageTest link
  3. https://dynamic-import-cold-start-workers-large.juanmf.workers.dev/ - WebPageTest link
  4. https://dynamic-import-cold-start-workers-titanic.juanmf.workers.dev/ - WebPageTest link

Comment: this seems much less directly affected. Adding medium changes nothing, and adding the larger ones does have an effect, but a smaller one than on Pages (e.g. roughly log(n) growth instead of linear).


Please do let me know if there are any errors in the repositories, and/or if there are ideas worth trying to see whether (a) we could get the Pages one to perform similarly to the Worker, and/or (b) we could do dynamic imports slightly differently to prevent the overall performance impact.

penalosa commented 4 months ago

Thanks for reporting this, and the detailed investigation! This is currently expected behaviour, as there's a slight additional startup cost associated with additional modules even if they're not imported. We have some upcoming work that may improve this situation. cc @jasnell

juanferreras commented 4 months ago

Thanks for your response, @penalosa! I've been following along with James' new module registry implementation, which seems to lazily resolve, compile, and evaluate modules on first import. It's exciting to think of how it'll unlock much larger apps to be deployed in Workers whilst still maintaining great end-user perf!
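To illustrate what "lazy on first import" means in general, here is a toy TypeScript sketch. It is NOT workerd's module registry and makes no claim about how that implementation works; the `LazyRegistry` class and its method names are hypothetical.

```typescript
// Toy model only: this is not workerd's module registry.
type Factory<T> = () => T;

class LazyRegistry<T> {
  private readonly factories = new Map<string, Factory<T>>();
  private readonly cache = new Map<string, T>();

  // Registering a module stores its factory without running it,
  // so unused modules cost nothing beyond the bookkeeping entry.
  register(specifier: string, factory: Factory<T>): void {
    this.factories.set(specifier, factory);
  }

  // The factory runs on the first load only; later loads hit the cache.
  load(specifier: string): T {
    if (this.cache.has(specifier)) {
      return this.cache.get(specifier) as T;
    }
    const factory = this.factories.get(specifier);
    if (factory === undefined) {
      throw new Error(`Unknown module: ${specifier}`);
    }
    const value = factory();
    this.cache.set(specifier, value);
    return value;
  }
}
```

In this model, the work for a module is deferred until something actually asks for it, which is the behaviour the benchmarks in this issue would benefit from.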

If both of you wouldn't mind keeping this issue open and eventually letting us know here whenever there's a compat flag or specific instructions to use the new jsg registry, I'm more than happy to update all benchmarks anytime.

As a side note, I was quite surprised to see the difference between Pages and Workers. There might be other factors negatively affecting how Pages modules get bundled/deployed compared to Workers, but I wasn't able to explain this from the artifacts/debugging of wrangler alone (or to confirm whether my comparison is inaccurate in some way).

jasnell commented 4 months ago

Finding the time to work on the new impl and complete it has been difficult, but I am hoping to get it done soon. Will keep this open and will try to give updates as progress is made.

atinux commented 4 weeks ago

I can confirm that we are suffering from this on NuxtHub Admin, a big full-stack Nuxt application running on Cloudflare Pages.

We leverage code-splitting and only load the needed chunks, so this API endpoint should be fast: https://admin.hub.nuxt.com/api/ping

The final size of the _worker.js/ directory:

Σ Total size: 4.85 MB (1.43 MB gzip)

Locally, fetching the /api/ping endpoint takes about 80ms when measured with hyperfine (I am emulating a Request that I send to my worker): [Screenshot]

But in production it takes more like ~300ms to answer (using https://tools.keycdn.com/performance?url=https://admin.hub.nuxt.com/api/ping).

Dario was kind enough to show me that the worker in CF takes around 200ms to load: [Screenshot]

Did you have a chance to work on this @jasnell ? 🙏