facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Performance - Reduce build time and memory usage - use alternative webpack JS loaders #4765

Open slorber opened 3 years ago

slorber commented 3 years ago

💥 Proposal

With Webpack 5 support, re-build times are now faster.

But we still need to improve the time of the first build, which is currently slow.

Some tools to explore:

It will be hard to decouple Docusaurus totally from Webpack at this point.

But we should at least provide a way for users to use an alternative (non-Babel) JS loader that could be faster and good enough. Docusaurus core should be able to provide a few alternate loaders that would work by default using the theme classic, by just switching a config flag.

If successful and faster, we could make one of those alternate loaders the default loader for new sites (when no custom Babel config is found in the project).

Existing PR by @SamChou19815 for esbuild: https://github.com/facebook/docusaurus/pull/4532

SubJunk commented 8 months ago

please let us know if that works for your use-case.

Using esbuild-loader or swc-loader as described here https://github.com/facebook/docusaurus/issues/4765#issuecomment-841135926 does not change my build time noticeably; it takes 12-14s per language regardless. For my small website https://github.com/UniversalMediaServer/knowledge-base a production build on Cloudflare takes 19m 26s total.

One thing I notice while watching the output logs is that most of the time is spent recompiling the dependencies over and over, once per language.

So for my use case (many languages but not many pages) it would greatly reduce build time if that dependency step could be reused after the first language is done. That would probably reduce my time by 90%+.

slorber commented 8 months ago

@SubJunk you seem to have a lot of languages, wouldn't it be better to have one deployment per locale and parallelize a bit more instead of deploying sequentially?

You can choose your deployment strategy according to your scalability needs.
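One way to read this suggestion is as a set of independent per-locale builds that a CI matrix can run in parallel. The sketch below only generates the commands. The locale codes and the `build/<locale>` output directories are hypothetical, and it assumes that the `--locale` and `--out-dir` options of `docusaurus build` are available in your version.

```javascript
// Sketch: derive one independent build command per locale so that a CI
// matrix (or several machines) can run them in parallel.
// Assumption: `docusaurus build` accepts --locale and --out-dir.
// The locale codes and output paths below are placeholders.
function localeBuildCommands(locales, outRoot = "build") {
  return locales.map((locale) => ({
    locale,
    command: `npx docusaurus build --locale ${locale} --out-dir ${outRoot}/${locale}`,
  }));
}

const jobs = localeBuildCommands(["en", "fr", "de"]);
for (const job of jobs) {
  console.log(job.command);
}
```

Each job then produces its own output directory. How those outputs are deployed or stitched together depends on the hosting setup.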

One thing I notice while watching the output logs is that most of the time is spent recompiling the dependencies over and over, once per language.

So for my use case (many languages but not many pages) it would greatly reduce build time if that dependency step could be reused after the first language is done. That would probably reduce my time by 90%+.

I plan to pre-transpile our official theme code so that we can skip it during site builds, but it probably requires creating some build tooling first to make it manageable.

Note that your mdx docs are also compiled with MDX, and this takes time, and here all locales will have a different input.

Webpack has a persistent caching system, and if you cache node_modules/.cache across CI runs it could speed up your build.

However, I wish part of that cache could be shared across multiple locales, but the last time I investigated, Webpack did not permit that. (https://github.com/webpack/webpack/issues/13034)

mapachurro commented 7 months ago

Hello all!

This is the Thread I've Been Looking For.

I'm in the process of trying to get our site live in production; unfortunately repo is still behind the company GitHub org, but: we've got seventeen locales, a total of ~1400 articles, on Docusaurus V3, and we've been hitting the GH Actions / Pages RAM issue hard.

I've tried a few of the fixes mentioned here and elsewhere, and it looks like the next step is webpack.

Is there, at this point, an off-the-shelf process for switching to webpack, or do I need to engineer a custom solution, as above?

Apologies if this is documented somewhere and I missed it.

PrivatePuffin commented 7 months ago

We over at TrueCharts have it set up for a few thousand pages, using GitHub Actions. (github.com/truecharts)

societymartingale commented 6 months ago

Switching to swc-loader reduced my Docusaurus build time by about 20%. This is for a site with 600 mdx files. Build time is 2.5 minutes on an Apple M3 pro and 4.5 minutes in a GitLab CI docker-based build.

damageboy commented 5 months ago

@societymartingale do you have an example of how this was done with Docusaurus?

johnnyreilly commented 5 months ago

https://johnnyreilly.com/faster-docusaurus-build-swc-loader

damageboy commented 5 months ago

@johnnyreilly I've just literally found it minutes before you posted, tried and unfortunately for me, it doesn't seem to end up with faster builds...

I'm building 3 different docs sites in one build, on a mac M3 pro. I've disabled for each of these:

        showLastUpdateAuthor: false,
        showLastUpdateTime: false,

To skip anything that isn't pure building.

On top of that, I've also discovered that on macOS it's pretty important to exclude the build / project folders from Spotlight if you don't want to spend most of the CPU time indexing them during the build...

Summary:

| Times | Without SWC | With SWC |
| --- | --- | --- |
| User | 231.47s | 243.10s |
| System | 12.44s | 12.12s |
| CPU | 223% | 229% |
| Total (wall) | 1:49.22 | 1:51.32 |

Paper trail

Without swc-loader:

❯ yarn clear; time yarn build
yarn run v1.22.22
$ docusaurus clear
[SUCCESS] Removed the Webpack persistent cache folder at "/Users/dmg/projects/docu3/node_modules/.cache".
[SUCCESS] Removed the generated folder at "/Users/dmg/projects/docu3/.docusaurus".
[SUCCESS] Removed the build output folder at "/Users/dmg/projects/docu3/build".
✨  Done in 1.13s.
yarn run v1.22.22
$ docusaurus build
[INFO] [en] Creating an optimized production build...

✔ Client

✔ Server
  Compiled successfully in 1.68m
✨  Done in 109.14s.
yarn build  231.47s user 12.44s system 223% cpu 1:49.22 total

With swc-loader:

❯ yarn clear; time yarn build
yarn run v1.22.22
$ docusaurus clear
[SUCCESS] Removed the Webpack persistent cache folder at "/Users/dmg/projects/docu3/node_modules/.cache".
[SUCCESS] Removed the generated folder at "/Users/dmg/projects/docu3/.docusaurus".
[SUCCESS] Removed the build output folder at "/Users/dmg/projects/docu3/build".
✨  Done in 1.09s.
yarn run v1.22.22
$ docusaurus build
[INFO] [en] Creating an optimized production build...

✔ Client

✔ Server
  Compiled successfully in 1.72m
[SUCCESS] Generated static files in "build".
[INFO] Use `npm run serve` command to test your build locally.
✨  Done in 111.24s.
yarn build  243.10s user 12.12s system 229% cpu 1:51.32 total
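To put the two runs side by side, the wall-clock totals from the `time` lines above can be converted to seconds and compared. This is a small illustrative calculation; the helper names are made up for the example.

```javascript
// Convert the "total" field printed by the shell's `time` (m:ss.ss) into seconds.
function wallClockSeconds(total) {
  const parts = total.split(":").map(Number);
  // Fold h:m:s or m:s into seconds, left to right.
  return parts.reduce((acc, part) => acc * 60 + part, 0);
}

// Relative change of `candidate` versus `baseline`, in percent.
function percentChange(baseline, candidate) {
  return ((candidate - baseline) / baseline) * 100;
}

const withoutSwc = wallClockSeconds("1:49.22");
const withSwc = wallClockSeconds("1:51.32");
const change = percentChange(withoutSwc, withSwc);
console.log(`${change.toFixed(1)}% change in wall-clock time`);
```

On these numbers the swc-loader run is about 2% slower in wall-clock time, which is within what a single run on a laptop could vary by.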

PrivatePuffin commented 5 months ago

You can also get a decent performance improvement by not processing .md files as .mdx

damageboy commented 5 months ago

You can also get a decent performance improvement by not processing .md files as .mdx

Tips on how to get this done?

Are you referring to this

slorber commented 5 months ago

You can also get a decent performance improvement by not processing .md files as .mdx

I'm also curious to know what you mean, maybe you are right but my intuition is that it does not have a significant impact since in both cases content is going to be compiled to React components.


Note, I'm actively working on perf optimizations for Docusaurus v3.2.

There are no breaking changes in the 3.x branch yet, so if you can run a canary version of Docusaurus, I'd be curious to know how much faster it is on your site. https://docusaurus.io/community/canary

The main remaining bottleneck is the Webpack compilation time; I'll look into that soon.

societymartingale commented 5 months ago

@damageboy, I added the following to package.json:

"@swc/core": "^1.4.6",
"swc-loader": "^0.2.6"

And added the following to docusaurus.config.js:

webpack: {
  jsLoader: (isServer) => ({
    loader: require.resolve("swc-loader"),
    options: {
      jsc: {
        parser: {
          syntax: "typescript",
          tsx: true,
        },
        target: "es2019",
        transform: {
          react: {
            runtime: "automatic",
          },
        },
      },
      module: {
        type: isServer ? "commonjs" : "es6",
      },
    },
  }),
},

As indicated above, I also disabled the last update author/time feature, as it wasn't scaling well. This saved another minute or two.

showLastUpdateAuthor: false,
showLastUpdateTime: false
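
For comparison, the esbuild-loader variant mentioned earlier in the thread plugs into the same `webpack.jsLoader` hook. This is a minimal sketch, not the configuration from the linked comment. The `target` values are assumptions, the `esbuildOptions` helper is a name invented here, and `esbuild-loader` must be installed in the project.

```javascript
// docusaurus.config.js (fragment)
// Options passed to esbuild-loader; the targets are illustrative assumptions.
function esbuildOptions(isServer) {
  return {
    loader: "tsx", // parse JS/TS with JSX
    format: isServer ? "cjs" : undefined, // the server bundle runs in Node
    target: isServer ? "node18" : "es2017",
  };
}

const webpack = {
  // Same hook as the swc-loader example: Docusaurus calls it once for the
  // client bundle and once for the server bundle.
  jsLoader: (isServer) => ({
    loader: require.resolve("esbuild-loader"),
    options: esbuildOptions(isServer),
  }),
};

// Merge `webpack` into the rest of the site config.
module.exports = { webpack };
```

Only one `jsLoader` can be active at a time, so comparing loaders means switching this block and rebuilding from a cleared cache.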

slorber commented 5 months ago

I also disabled the last update author/time feature, as it wasn't scaling well.

This major perf issue is fixed in canary (https://github.com/facebook/docusaurus/pull/9890) and will be released in v3.2

andrewgbell commented 5 months ago

3.2 for us has seen a major improvement in build times (thanks @slorber!). Running on GitHub Actions with a 4-core, 16 GB Linux runner, we have the following:

3.1 build time - 35 mins

3.2 build time - 23 mins

3.2 build time (esbuild) - 27 mins

3.2 build time (swc) - 21 mins

Steveantor commented 5 months ago

Why not ditch webpack for esbuild like vite? It's faster and better in every possible way.

slorber commented 5 months ago

@Steveantor it's easier said than done.

First of all, our plugin system has a configureWebpack hook, which means migrating away from Webpack would break most of our ecosystem: plugin authors would need to update and provide a version compatibility matrix to their users. We also must ensure that an upgrade path is possible for plugin authors under the new bundler, at least for all major plugins.

I'm likely to adopt an incremental migration path thanks to unplugin, which is a more portable abstraction for bundler plugins that would probably be good enough for most Docusaurus plugins that only need a loader.

Also, Vite only uses esbuild in some parts and uses Rollup for bundling (Rollup v4 is based on SWC). They are also working on Rolldown, a Rust port of Rollup.

Due to how Docusaurus works, and our plugin system, it does not look like a good idea to use different bundling in dev/build modes.

Afaik esbuild supports live reloading but has no official support for hot reloading. We don't want to refresh the browser when you edit JS or MD, as this would make your page lose its state.

There's also Rspack, which aims to be 100% retro-compatible with Webpack and, according to this Bun benchmark, is already considerably faster than Webpack.

https://bun.sh/blog/bun-bundler


Vercel is also actively working on Turbopack, which also aims to be "mostly" compatible with Webpack, but less so than Rspack.

So Rspack is for me the most suitable candidate in the short term, due to the constraints we have, and until other solutions become more mature. I'm likely to introduce "future flags" in Docusaurus and let you swap Webpack with Rspack. It might not work for all third-party plugins (yet), but it should improve over time as Rspack fills the gap.


Note that bundling is a major bottleneck (mostly for "cold builds" with an empty Webpack cache, less for rebuilds) but is not the only performance issue we have in Docusaurus. I fixed some in v3.2 but I have ideas to improve other parts as well.

Notably, I'm not sure the high memory consumption is related to the bundling phase; it may rather come from the SSG phase.

slorber commented 5 months ago

FYI in v3.2 I added some site build perf logging (https://github.com/facebook/docusaurus/pull/9975).

This is considered an internal API for now, but if you are curious to see what takes time on your site, you can run your site with DOCUSAURUS_PERF_LOGGER=true to see timing logs in the build output.

snake-py commented 4 months ago

So I am also running into this issue, but my site is really small. I only have about three pages right now. For me, the main issue seems to be CssMinimizerPlugin.

Is it possible to disable the minimizer?

slorber commented 4 months ago

You can try running with USE_SIMPLE_CSS_MINIFIER=true docusaurus build and see if it improves.

Romej commented 1 month ago

I would love to use Docusaurus with Rspack. We had many legacy CRA projects and we successfully migrated all of them to Rspack without much effort.

With Rspack 1.0 around the corner, I am hoping this will be an option soon.

clainchoupi commented 2 weeks ago

That may be a noob question, but how can I set the DOCUSAURUS_PERF_LOGGER variable? I wanted to try it locally, so I tweaked my "npm run" commands a little, but it does not work ^^

OzakIOne commented 2 weeks ago

That may be a noob question, but how can I set the DOCUSAURUS_PERF_LOGGER variable? I wanted to try it locally, so I tweaked my "npm run" commands a little, but it does not work ^^

@clainchoupi I think it should be something like:

Unix: DOCUSAURUS_PERF_LOGGER=true npm run start
PowerShell: $env:MY_VAR="value"; npm run your-command
CMD: set MY_VAR=value && npm run your-command

clainchoupi commented 1 week ago

I tried the following commands but none of them worked :P

PS D:\WORK\TECH\docusaurus-interne> --DOCUSAURUS_PERF_LOGGER=true npm run start
==> At line:1 char:3  --DOCUSAURUS_PERF_LOGGER=true npm run start   ~ Missing expression after unary operator '--'.

PS D:\WORK\TECH\docusaurus-interne> DOCUSAURUS_PERF_LOGGER=true npm run start
==> DOCUSAURUS_PERF_LOGGER=true : The term 'DOCUSAURUS_PERF_LOGGER=true' is not recognized as the name of a cmdlet, function, script file, or operable program

PS D:\WORK\TECH\docusaurus-interne> npm run docusaurus build --DOCUSAURUS_PERF_LOGGER true
==> Build OK, but the logs are not displayed

PS D:\WORK\TECH\docusaurus-interne> npm run docusaurus build DOCUSAURUS_PERF_LOGGER true
==> [ERROR] [Error: ENOENT: no such file or directory, lstat 'D:\WORK\TECH\docusaurus-interne\DOCUSAURUS_PERF_LOGGER'] {

PS D:\WORK\TECH\docusaurus-interne> npm run docusaurus build DOCUSAURUS_PERF_LOGGER=true
==> [ERROR] [Error: ENOENT: no such file or directory, lstat 'D:\WORK\TECH\docusaurus-interne\DOCUSAURUS_PERF_LOGGER'] {

I'll try with other parameters to see :)

slorber commented 1 week ago

@clainchoupi each OS may have different ways to set up env variables (and it also depends on the shell you use on Windows afaik). But it's out of scope for Docusaurus to explain how to set up env variables for your OS; you'll find plenty of tutorials online. The advice from @OzakIOne applies to Unix systems, not Windows (unless you use a compatible shell).

If you don't know how to do that, you can try to use cross-env; it should work in most contexts in a unified way.
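
Another shell-independent route is to set the variable from Node itself and launch the CLI as a child process. The following is a rough sketch of that idea, not something Docusaurus ships. The file name and the `withEnv` / `runDocusaurus` helpers are hypothetical, and it assumes `npx docusaurus` resolves in the project.

```javascript
// scripts/docusaurus-with-env.js (hypothetical helper, not part of Docusaurus)
const { spawnSync } = require("node:child_process");

// Return a copy of the environment with the extra variables set.
function withEnv(extra, base = process.env) {
  return { ...base, ...extra };
}

// Run a Docusaurus CLI command with the given variables set for that
// child process only; returns the exit status of the command.
function runDocusaurus(command, extra) {
  const result = spawnSync(`npx docusaurus ${command}`, {
    env: withEnv(extra),
    stdio: "inherit", // show the build output, including any perf logs
    shell: true, // lets Windows resolve npx.cmd
  });
  return result.status;
}

module.exports = { withEnv, runDocusaurus };
```

Calling `runDocusaurus("build", { DOCUSAURUS_PERF_LOGGER: "true" })` from a small Node script behaves the same in PowerShell, CMD, and Unix shells. With cross-env, the equivalent package.json script would be `cross-env DOCUSAURUS_PERF_LOGGER=true docusaurus build`.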

I'm going to hide these discussions as off-topic because it's more about env variables than perf. Please open a dedicated discussion if needed, or ask support on Discord.