facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Performance - Reduce build time and memory usage - use alternative webpack JS loaders #4765

Open slorber opened 3 years ago

slorber commented 3 years ago

πŸ’₯ Proposal

With Webpack 5 support, re-build times are now faster.

But we still need to improve the time of the first build, which is currently not great.

Some tools to explore:

It will be hard to decouple Docusaurus totally from Webpack at this point.

But we should at least provide a way for users to use an alternative (non-Babel) JS loader that could be faster and good enough. Docusaurus core should be able to provide a few alternate loaders that work out of the box with the classic theme, enabled by just switching a config flag.

If successful and faster, we could make one of those alternate loaders the default for new sites (when no custom Babel config is found in the project).

Existing PR by @SamChou19815 for esbuild: https://github.com/facebook/docusaurus/pull/4532
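
For a rough idea of the shape this could take, here is a hedged sketch based on the webpack.jsLoader escape hatch that later shipped in Docusaurus (treat the exact option names as an assumption for older versions; it assumes esbuild-loader and esbuild are installed as dev dependencies):

// docusaurus.config.js (sketch only; requires esbuild-loader and esbuild as dev dependencies)
module.exports = {
  // ...rest of the site config
  webpack: {
    // Swap the default Babel-based JS loader for esbuild-loader
    jsLoader: (isServer) => ({
      loader: require.resolve('esbuild-loader'),
      options: {
        loader: 'tsx',
        target: isServer ? 'node12' : 'es2017',
      },
    }),
  },
};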

slorber commented 2 years ago

Transforms may need to be ported to Rust, not 100% sure on that one though.

If we want i18n to work we'd rather port our plugin to Rust yes.

Technically, for now we only extract translations with the Babel plugin (no source change), so I think it may not even be necessary in the short term: we could keep extracting translations with Babel (slower, but who cares) and only transpile with Rust.

If you are talking about SWC: the current plugin system is still JS-based (swc.rs/docs/usage/plugins).

AFAIK NAPI-RS has low overhead when calling Rust from JS, but not the other way around. That's probably also why Vercel (which recently hired the NAPI-RS creator) is porting popular Babel plugins to Rust.

Josh-Cena commented 2 years ago

Technically, for now we only extract translations with the Babel plugin (no source change), so I think it may not even be necessary in the short term: we could keep extracting translations with Babel (slower, but who cares) and only transpile with Rust.

Yeah, that's the idea. The extractor is only run in development so it's fine to be in Babel, but for build-time transformations (if we ever implement that) we'd rather use Rust.

fawazahmed0 commented 2 years ago

Hi, I am trying to use GitHub Actions to build my large site. Currently GitHub Actions provides 6 hours, after which it times out. Is there any option to save a partial build of a Docusaurus site and resume it later, so that I could complete the whole build across multiple GitHub Actions runs?

Update: This should be possible by dockerizing your build process and then using the checkpoint option, but due to the huge RAM (or swap) usage this may not be possible, because checkpointing will dump RAM (or swap) and the GitHub runner will have no disk space left. You can use this to increase space (and swap), but I don't think it will work for my use case. Your mileage may vary.

slorber commented 2 years ago

Sorry, we don't have a way to resume a build.

We use Webpack, which has caching layers (that we can persist), but AFAIK it can only persist the cache at the end of a build, not incrementally. Unless someone shows how to persist the Webpack cache before the end of a build, I assume it's not possible.
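
For context, a minimal sketch of Webpack 5's persistent filesystem cache (not the exact Docusaurus internals): the serialized cache pack is only written when the compiler goes idle or shuts down (the IdleFileCachePlugin that shows up in build logs), which is why it can't be persisted mid-build.

// webpack.config.js (sketch; Docusaurus wires up something along these lines internally)
const path = require('path');

module.exports = {
  // ...
  cache: {
    type: 'filesystem',
    // Persisting this directory across CI runs is what makes rebuilds faster
    cacheDirectory: path.resolve(__dirname, 'node_modules/.cache/webpack'),
  },
};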

Josh-Cena commented 2 years ago

I've thought about something similar in the past, using some sort of "lazy build" that doesn't load everything at once into memory. I don't know if Webpack is able to do that.

alexander-akait commented 2 years ago

Maybe useful: https://webpack.js.org/configuration/experiments/#experimentslazycompilation
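
For anyone wanting to experiment with it, a minimal sketch (webpack 5; the option shape may vary between versions):

// webpack.config.js (sketch; mainly benefits the dev server rather than production builds)
module.exports = {
  // ...
  experiments: {
    lazyCompilation: {
      imports: true,  // lazily compile dynamic imports on demand
      entries: false, // keep entry points eagerly compiled
    },
  },
};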

Josh-Cena commented 2 years ago

Ah yes! I did see that. Definitely worth looking into in the near future...

slorber commented 2 years ago

@alexander-akait I already tried that but wasn't able to make it work successfully so far 🀪

(screenshot)

Also, is this really supposed to improve a static site's production build? Considering that in the end everything must be compiled, what's the point of lazy compilation?

This Storybook benchmark with/without lazyCompilation also shows that the win for a production build is not significant: https://storybook.js.org/blog/storybook-performance-from-webpack-to-vite/

(screenshot)

But it is useful for the dev env:

(screenshot)

We definitely want the improvements of lazyCompilation, but this issue is more about the production build, so I'm not sure this option is very relevant here?

alexander-akait commented 2 years ago

@slorber I see. Currently my recommendation is to try switching from Babel to SWC (and the SWC minifier) too; it should help a lot, and SWC is more mature now.

fawazahmed0 commented 2 years ago

If you get a "terminate called after throwing an instance of 'std::bad_alloc'" error during the build, you might also need to set vm.max_map_count to a higher value, for example: sysctl -w vm.max_map_count=655300

Reference

slorber commented 2 years ago

FYI the current canary and next release will include the suggestion from @adventure-yunfei

We now limit the concurrency when outputting static files at the end of the build process. This should put less load on the system's IO and reduce the memory footprint by default.

Note: this may not impact build time much, but it mostly avoids a potential OOM at the end of your build.


There's a "secret" env variable in case you want to tune this concurrency setting: process.env.DOCUSAURUS_SSR_CONCURRENCY

https://github.com/facebook/docusaurus/pull/7547/files#diff-058c5cef3799e9df200345c9d5b769bbc27838aea32aa4f8345d09858d416781R96

For now, this is undocumented: please give us feedback on how impactful it is on your site, and eventually, we'll document it or add it as a first-class config setting.

vladfrangu commented 2 years ago

Just tried that secret env variable (on version 0.0.0-5101) and even setting it to 2 I get an OOM error on our website (I can link the repository if needed). Is there anything I can do to debug why the memory usage spikes up so high once the server is supposedly compiled?

slorber commented 2 years ago

Just tried that secret env variable (on version 0.0.0-5101) and even setting it to 2 I get an OOM error on our website (I can link the repository if needed). Is there anything I can do to debug why the memory usage spikes up so high once the server is supposedly compiled?

@vladfrangu consuming more memory than the default available in Node.js (0.5 GB) does not seem unexpected to me for a large site. Now, if we can't reasonably build a large site with 2-10 GB of memory, that seems way more problematic.

The only thing you can do is profile your build and report your findings, to know which step exactly is taking more memory than you expect. I can't really teach you how to do this through a GitHub issue, I'm not an expert in this either.

See @fawazahmed0's comment here: https://github.com/facebook/docusaurus/issues/4765#issuecomment-910164698

Having a curve + an idea of what Webpack is working on is helpful.

PrivatePuffin commented 2 years ago

@alexander-akait Do you have examples of using SWC and the SWC minifier instead of Babel?

slorber commented 2 years ago

@alexander-akait Do you have examples of using SWC and the SWC minifier instead of Babel?

We are using SWC on the Docusaurus site now, so feel free to steal our config: https://github.com/facebook/docusaurus/pull/6944
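
For convenience, roughly what that looks like (a sketch of the linked config, so double-check the PR for the exact options; it assumes swc-loader and @swc/core are installed as dev dependencies):

// docusaurus.config.js (sketch)
module.exports = {
  // ...rest of the site config
  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve('swc-loader'),
      options: {
        jsc: {
          parser: {
            syntax: 'typescript',
            tsx: true,
          },
          target: 'es2017',
        },
        module: {
          type: isServer ? 'commonjs' : 'es6',
        },
      },
    }),
  },
};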

wenerme commented 1 year ago

Is it possible to use the MDX esbuild loader (https://mdxjs.com/packages/esbuild/) to speed up the MD build?

rxliuli commented 1 year ago

Repo: https://github.com/liuli-moe/to-the-stars

Just migrated from vite-vuepress, but I think Docusaurus still has a lot of room for performance improvement. Here's a basic build performance comparison.

docusaurus

$ time pnpm docs-build

> to-the-stars@1.0.0 docs-build C:\Users\rxliuli\Code\book\to-the-stars
> docusaurus build

[INFO] [zh-Hans] Creating an optimized production build...

βœ” Client

βœ” Server
  Compiled successfully in 39.38s

βœ” Client

● Server β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ cache (99%) shutdown IdleFileCachePlugin
 stored

[SUCCESS] Generated static files in "build".
[INFO] Use `npm run serve` command to test your build locally.

real    0m51.324s
user    0m0.030s
sys     0m0.075s

vite-vuepress

$ time pnpm docs-build

> to-the-stars@1.0.0 docs-build C:\Users\rxliuli\Code\book\test-tts
> pnpm docs-setup && vuepress build && rimraf .vuepress/dist/books/

> to-the-stars@1.0.0 docs-setup C:\Users\rxliuli\Code\book\test-tts
> tsx .vuepress/init.ts

β Ό Initializing and preparing dataindex start 82
index end 82
β ΄ Initializing and preparing dataindex write
βœ” Initializing and preparing data - done in 2.55s
βœ” Compiling with vite - done in 10.69s
βœ” Rendering 85 pages - done in 2.19s
βœ” Generating sitemap to sitemap.xml - done in 7ms
success VuePress build completed in 16.11s!

real    0m18.996s
user    0m0.030s
sys     0m0.045s

General information about the documentation site

$ cloc books/
      83 text files.
      83 unique files.
      15 files ignored.

github.com/AlDanial/cloc v 1.94  T=0.52 s (158.4 files/s, 112091.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                        83          29090              0          29645
-------------------------------------------------------------------------------
SUM:                            83          29090              0          29645
-------------------------------------------------------------------------------

Not sure about the exact reason, but my guesses include the following:

  1. Using Babel instead of native compilers like esbuild/SWC
  2. Using webpack instead of faster bundlers like Rollup?
  3. Using MDX instead of markdown-it (I like the mdast family of toolkits, but its npm packages are too fragmented for me to believe its performance is better)

Josh-Cena commented 1 year ago

@rxliuli MDX is what we base our product on and advertise. It is not replaceable. The JS loader is already customizable, and we use SWC on our own website, while esbuild also works (we used to use it and there are many successful integrations in the wild). Webpack could be replaced, but it's too much work, and we have a plugin lifecycle literally called configureWebpack that would break if the config it injects is no longer compatible.

wenerme commented 1 year ago

Wow, 51s is so fast. My CI (https://github.com/wenerme/wener/actions) runs for about 10-15 minutes using the esbuild loader. The loader only affects JS/TS, but all the docs are MD, and MD parsing seems slow.

rxliuli commented 1 year ago

Using esbuild-loader, it is still about twice as slow as vite-vuepress, but it has improved somewhat:

$ time pnpm docs-build

> to-the-stars@1.0.0 docs-build C:\Users\rxliuli\Code\book\to-the-stars
> docusaurus build

[INFO] [zh-Hans] Creating an optimized production build...

βœ” Client

βœ” Server
  Compiled successfully in 27.10s

βœ” Client

● Server β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ cache (99%) shutdown IdleFileCachePlugin
 stored

[SUCCESS] Generated static files in "build".
[INFO] Use `npm run serve` command to test your build locally.

real    0m37.529s
user    0m0.031s
sys     0m0.060s

arcanis commented 1 year ago

For what it's worth, I couldn't build my relatively average Docusaurus website on Netlify because of OOMs until I added --no-minify to yarn build.

alexander-akait commented 1 year ago

For CSS and HTML minification you can use https://www.npmjs.com/package/@swc/css (https://github.com/webpack-contrib/css-minimizer-webpack-plugin) and https://www.npmjs.com/package/@swc/html (docs https://github.com/webpack-contrib/html-minimizer-webpack-plugin#swchtml)

It is very fast, but young (though bugs get fixed very fast), so feel free to check it out.

In the near future we will provide the ability to replace css-loader and postcss-loader with swc-loader (except for non-standard PostCSS plugins; we will support all popular plugins).

Also planned: replacing MDX (and providing a markdown parser/transformers/codegen too), currently with no date.

Also, for SVG we are preparing swc_xml_parser and swc_svg_minifier (a replacement for SVGO).
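
Untested with Docusaurus as far as this thread goes, but here is a hedged sketch of how the SWC-backed CSS minifier could be wired in through a configureWebpack plugin (it assumes css-minimizer-webpack-plugin and @swc/css are installed; the swcMinify option comes from that plugin's documentation):

// A local Docusaurus plugin (sketch) that swaps the CSS minifier implementation for SWC's
const CssMinimizerPlugin = require('css-minimizer-webpack-plugin');

module.exports = function swcCssMinifyPlugin() {
  return {
    name: 'swc-css-minify-plugin',
    configureWebpack(config) {
      // Replace any existing CSS minimizer with one backed by @swc/css, leaving the
      // JS minimizer untouched. The constructor-name check avoids instanceof issues
      // when two copies of the plugin package end up in node_modules.
      const minimizers = (config.optimization?.minimizer ?? []).map((m) =>
        m?.constructor?.name === 'CssMinimizerPlugin'
          ? new CssMinimizerPlugin({ minify: CssMinimizerPlugin.swcMinify })
          : m
      );
      return {
        mergeStrategy: { 'optimization.minimizer': 'replace' },
        optimization: { minimizer: minimizers },
      };
    },
  };
};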

JakeChampion commented 1 year ago

There's a "secret" env variable in case you want to tune this concurrency setting: process.env.DOCUSAURUS_SSR_CONCURRENCY

#7547 (files)

For now, this is undocumented: please give us feedback on how impactful it is on your site, and eventually, we'll document it or add it as a first-class config setting.

Howdy ✌️ - I thought I'd give some feedback - I have a website with 9,785 pages and was getting out-of-memory issues when building. Using a combination of swc-loader and DOCUSAURUS_SSR_CONCURRENCY=5 has solved the problem and now it successfully builds ☺️

RDIL commented 1 year ago

Yeah, Babel is the root cause here; the problem is that the maintainers don't want to remove it because that would be a breaking change.

PrivatePuffin commented 1 year ago

For CSS and HTML minification you can use https://www.npmjs.com/package/@swc/css (https://github.com/webpack-contrib/css-minimizer-webpack-plugin) and https://www.npmjs.com/package/@swc/html (docs https://github.com/webpack-contrib/html-minimizer-webpack-plugin#swchtml)

It is very fast, but young (though bugs get fixed very fast), so feel free to check it out.

In the near future we will provide the ability to replace css-loader and postcss-loader with swc-loader (except for non-standard PostCSS plugins; we will support all popular plugins).

Also planned: replacing MDX (and providing a markdown parser/transformers/codegen too), currently with no date.

Also, for SVG we are preparing swc_xml_parser and swc_svg_minifier (a replacement for SVGO).

Any experience with this and docusaurus?

PrivatePuffin commented 1 year ago

There's a "secret" env variable in case you want to tune this concurrency setting: process.env.DOCUSAURUS_SSR_CONCURRENCY #7547 (files) For now, this is undocumented: please give us feedback on how impactful it is on your site, and eventually, we'll document it or add it as a first-class config setting.

Howdy ✌️ - I thought I'd give some feedback - I have a website with 9,785 pages and was getting out-of-memory issues when building. Using a combination of swc-loader and DOCUSAURUS_SSR_CONCURRENCY=5 has solved the problem and now it successfully builds ☺️

swc loader also halved(!) the compilation time for our thousands of pages :)

johnnyreilly commented 1 year ago

I wrote up swapping out babel-loader for swc-loader. I might add in the DOCUSAURUS_SSR_CONCURRENCY tip too - might get us some more feedback!

jhaals commented 1 year ago

Just wanted to echo what's been said here before about slow build speeds and excessive memory usage. We are using Docusaurus for https://backstage.io but are struggling a lot with slow build speeds of almost an hour. We have switched to swc-loader and use DOCUSAURUS_SSR_CONCURRENCY=5 as suggested by @johnnyreilly above, but we still have OOM issues even when allocating 7 GB of RAM for the build process. We would love to use the versioned docs feature but I don't see how that's possible at this point.

Our settings can be found here for anyone interested: https://github.com/backstage/backstage/blob/master/microsite-next/docusaurus.config.js#L66 https://github.com/backstage/backstage/blob/master/.github/workflows/deploy_microsite.yml

PrivatePuffin commented 1 year ago

@jhaals Same with our TrueCharts documentation system. We also have to cut content/pages because of the completely unusable build times, even with significant optimizations such as swc-loader and the SSR_CONCURRENCY setting.

rxliuli commented 1 year ago

@jhaals @Ornias1993 Just sharing: regarding static documentation generators, there is an order-of-magnitude gap between VitePress and VuePress/Docusaurus (in an actual test, the performance gap is 20x). Hoping this can prompt Docusaurus to make real performance improvements. Test results:

vitepress: 1m56.019s
vuepress: 14m18.764s
docusaurus: 36m39.857s

$ cloc docs/
     914 text files.
     914 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.94  T=2.60 s (351.5 files/s, 319491.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                       914         371604              0         459249
-------------------------------------------------------------------------------
SUM:                           914         371604              0         459249
-------------------------------------------------------------------------------

PrivatePuffin commented 1 year ago

100% agreed, this needs a major release with a performance refactor soon(tm). I also doubt this is really the performance facebook(inc) wants to showcase to the world.

It's in no one's interest to have docusaurus perform this badly on bigger sites.

slorber commented 1 year ago

I also doubt this is really the performance facebook(inc) wants to showcase to the world. It's in no one's interest to have docusaurus perform this badly on bigger sites.

I feel like you don't understand how open-source works here. Facebook tries first to solve its own internal use case and is generous enough to expose its tool to the world, for free. They don't owe you anything. You can fix the performance problems yourself by forking the project, or give us the bandwidth to do so by contributing financially to OpenCollective. Many big companies are using Docusaurus today, but none of them apart from Meta is contributing significantly to its development.

Please try to be part of the solution instead of complaining, and understand that Meta doesn't have an infinite budget to spend on Docusaurus alone just to please you.

slorber commented 1 year ago

Back to the performance issues.

I understand the frustration, but unfortunately I have limited bandwidth to work on it in the short term: I have to handle the regular maintenance of the project, ensure pending PRs do not become stale, and work on the MDX 2.0 + React 18 upgrade for Docusaurus v3 before starting any other perf/infra work.

The MDX 2.0 upgrade should bring some perf gains because the tool is faster (see their release blog post). We'll also remove the need to "retranspile" the MDX docs with Babel; this is an extra step we did historically, and it can definitely be removed.

Regarding the memory consumption, out-of-memory errors, and things like that, I don't really think they're related to MDX; it's probably something else in our system (the SSG part?) that we'll have to profile and optimize. If you are a good perf profiler, you can help investigate and figure out what the current bottlenecks of Docusaurus are.

I am 100% aware of the new wave of Rust tooling being released; I track these tools and their updates in my newsletter This Week In React. We'd like to explore various tools and adopt Rust at multiple levels.

Good candidates to modernize our infrastructure:

As you'll notice, many of these projects are quite new, not battle-tested yet, or would disrupt our plugin ecosystem in their current form. I think we can try to adopt them incrementally over time, but it's also important to let them mature a bit.

There's also the possibility of migrating Docusaurus to another infrastructure (like Next.js, Astro, Remix) and/or attempting to make the opinionated features we provide framework-agnostic.

I hope this gives you a better idea of the direction I want to take here. We'll figure out how to make Docusaurus fast in the long term, but unfortunately it's unlikely to become extremely fast in the very short term.

RDIL commented 1 year ago

@slorber Babel and Terser are the biggest bottlenecks by far. I've gotten it to build a 100,000-page site in less than 10 minutes with 3 GB of RAM, with Terser and Babel both switched out for esbuild.

PrivatePuffin commented 1 year ago

I feel like you don't understand how open-source works here. Facebook tries first to solve its own internal use case and is generous enough to expose its tool to the world, for free.

Please don't assume what I do or do not know.

They don't owe you anything. You can fix the performance problems yourself by forking the project, or give us the bandwidth to do so by contributing financially to OpenCollective. Many big companies are using Docusaurus today, but none of them apart from Meta is contributing significantly to its development.

In stark contrast to Meta, I do not have near-unlimited money to throw at a virtual wall, and I'm already maintaining one open-source project nearly full-time. Nor is the backend of Docusaurus within my general area of development expertise, so the number of hours it would require from me would not be beneficial.

Please try to be part of the solution instead of complaining, and understand that Meta doesn't have an infinite budget to spend on Docusaurus alone just to please you.

I never said I wanted to be pleased; I gave my opinion on how this looks to me as an outsider. I see a failing company with a docs product that underperforms, yet is presented as a good performer on the website.

What I wanted to make clear, and which I literally said, is that this is in the interest of no one: not us as users, not you as maintainers, nor Meta as a company.

Not every comment on an issue is inherently aimed at solving the issue. Is my opinion above constructive? No, definitely not. But it does show what this years-old, stagnated issue looks like from an outside perspective.

PrivatePuffin commented 1 year ago

@slorber Babel and Terser are the biggest bottlenecks by far. I've gotten it to build a 100,000-page site in less than 10 minutes with 3 GB of RAM, with Terser and Babel both switched out for esbuild.

Was that with Docusaurus, or a similar DIY setup? Any example code you care to share? :)

RDIL commented 1 year ago

With Docusaurus: https://github.com/facebook/docusaurus/discussions/3132#discussioncomment-3615781 for esbuild. I can't remember exactly what it's called, but there is an environment variable to disable Terser (NO_MINIFY=true, I think?).

PrivatePuffin commented 1 year ago

Regarding the memory consumption, out-of-memory errors, and things like that, I don't really think they're related to MDX; it's probably something else in our system (the SSG part?) that we'll have to profile and optimize. If you are a good perf profiler, you can help investigate and figure out what the current bottlenecks of Docusaurus are.

IMHO this is the biggest issue, besides generation times. What I noticed personally is that it's primarily bigger pages that cause the most RAM consumption.

The difference between 3000 small and 3000 big MD pages is insanely big. We're talking about "10 GB can do it" versus "will eat up 40 GB with ease and OOM halfway through". But still, within 10 GB I could WITH EASE already hold 10 copies of the data in RAM.

It almost looks like a memory leak, to be honest.

It's also interesting to see errors about an internal Babel dependency throwing "bigger than 500kb" warnings on bigger MD files that are definitely still under 200kb. So somewhere in the stack it looks like pages are being blown up and not cleaned up correctly after processing.

RDIL commented 1 year ago

It's also interesting to see errors about an internal Babel dependency throwing "bigger than 500kb" warnings on bigger MD files that are definitely still under 200kb. So somewhere in the stack it looks like pages are being blown up and not cleaned up correctly after processing.

That's MDX.

PrivatePuffin commented 1 year ago

It's also interesting to see errors about an internal Babel dependency throwing "bigger than 500kb" warnings on bigger MD files that are definitely still under 200kb. So somewhere in the stack it looks like pages are being blown up and not cleaned up correctly after processing.

That's MDX.

Ah yes, the transpilation referred to above. Any sign of whether that's also where the memory leak (I think we can safely call it that at this point) is coming from?

RDIL commented 1 year ago

It's functionally leaking because the bundling process is going through thousands of files and trying to understand what code uses what modules, and there are hundreds of modules to manage, etc etc etc

Each of those JS files needs to be parsed into an AST, optimized, and tree-shaken; it's just an operation that is inefficient to do in JS itself, which is why Babel is not that great.

armano2 commented 1 year ago

I have been investigating this a little. Generally, the webpack build is definitely a big part (that includes all plugins), but I was surprised to see that it actually takes a significant amount of time to generate the webpack config.

On my machine this takes roughly 1 minute from the start of the buildLocale method to the execution of compile on webpack.

My spec: CPU: 12th Gen Intel(R) Core(TM) i7-12700H; Memory: 32.0 GB, 4800 MHz DDR5; SSD: Samsung SSD 980 PRO 1TB

As a test case I have been using this project's website.

The generated webpack config seems substantial, as it contains 6k lines (https://gist.github.com/armano2/a7cf275c8763ab33c112ec0dc0269295)


This may seem insignificant, but such a big webpack config, plus the time it takes to generate it, most likely gets even slower/bigger for larger projects.

PrivatePuffin commented 1 year ago

It's functionally leaking because the bundling process is going through thousands of files and trying to understand what code uses what modules, and there are hundreds of modules to manage, etc etc etc

Each of those JS files needs to be parsed into an AST, optimized, and tree-shaken; it's just an operation that is inefficient to do in JS itself, which is why Babel is not that great.

Understandable, but even so, when GC cannot shrink RAM consumption within the limits, it should overflow to (preferably high-IOPS) disk.

Tests with swap indicate that this works relatively well from a performance perspective.

armano2 commented 1 year ago

Understandable, but even so, when GC cannot shrink RAM consumption within the limits, it should overflow to (preferably high-IOPS) disk.

Tests with swap indicate that this works relatively well from a performance perspective.

Actually there are more issues: the build process requires finding all files and reading them twice:

  1. Loop through files and collect metadata; this is used to register all routes and feed linkify
  2. Webpack has to loop through all files and build them with the metadata provided

By metadata I'm talking about front matter (the --- block) and file names.

In theory we could move this into a webpack plugin and collect it before we process the files;

webpack has a lot of hooks that we can tap into from plugins,

and with that we should reduce the number of IOPS substantially.
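
As an illustration only, such a plugin could look roughly like this (hypothetical names; it assumes gray-matter for front matter parsing, and that the loaders would then read metadata from the shared map instead of re-reading each file):

// Hypothetical sketch: collect front matter once, before webpack processes the files
const fs = require('fs/promises');
const matter = require('gray-matter');

class CollectFrontMatterPlugin {
  constructor(files, store) {
    this.files = files; // absolute paths of all markdown files
    this.store = store; // shared Map<filePath, frontMatter> consumed by the loaders
  }

  apply(compiler) {
    compiler.hooks.beforeCompile.tapPromise('CollectFrontMatterPlugin', async () => {
      await Promise.all(
        this.files.map(async (file) => {
          const content = await fs.readFile(file, 'utf8');
          this.store.set(file, matter(content).data);
        })
      );
    });
  }
}

module.exports = CollectFrontMatterPlugin;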

PrivatePuffin commented 1 year ago

Understandable, but even so, when GC cannot shrink RAM consumption within the limits, it should overflow to (preferably high-IOPS) disk. Tests with swap indicate that this works relatively well from a performance perspective.

Actually there are more issues: the build process requires finding all files and reading them twice:

  1. Loop through files and collect metadata; this is used to register all routes and feed linkify
  2. Webpack has to loop through all files and build them with the metadata provided

By metadata I'm talking about front matter (the --- block) and file names.

In theory we could move this into a webpack plugin and collect it before we process the files;

webpack has a lot of hooks that we can tap into from plugins,

and with that we should reduce the number of IOPS substantially.

That's not what I was saying; I was saying that it should overflow to disk if there isn't enough RAM.

We can already fake this with swap, which works fine.

sibelius commented 1 year ago

What about caching?

Can we cache the latest build output and only rebuild what changed?

slorber commented 1 year ago

@sibelius you can cache node_modules/.cache across builds, which should speed up the Docusaurus rebuilds thanks to Webpack 5 caching

Caching build and .docusaurus won't lead to any significant benefit, but I'll look into it later to see if there are low-hanging-fruit optimizations we could apply.

mattrunyon commented 1 year ago

Switching to swc-loader greatly helped our build times. We tried caching in Github Actions, but it was actually problematic for us because the cache size was 3.5GB and quickly got evicted. We probably could have set up our actions better to only create caches from the main branch builds, and only read caches for deploy preview builds on PRs.

It would be nice to be able to specify compression for the cache as well as a no-cache option. Webpack supports gzip and brotli for cache compression. While using brotli compression added some time to my local clean build (8:22 to 8:46 user time), it also cut the cache from 3.5GB to 1.5GB.

For people not using the cache in CI, disabling cache would be nice. This reduced my local build from 8:22 to 7:07 user time.

Minifying with esbuild is also a huge boost. 13:13 to 8:37. Only 15 seconds more than no minification at all for my build

Seems like there are some easy opt-in optimizations that should result in equivalent builds, but faster.

slorber commented 1 year ago

Thanks for reporting these low-hanging fruits @mattrunyon

FYI, you should be able to disable the cache or customize cache.compression by creating a plugin with the configureWebpack lifecycle hook.

mattrunyon commented 1 year ago

For anybody who wants a plugin to switch minifiers or disable/compress cache, here's an example. This requires you to install esbuild as a dev dependency at least (npm install -D esbuild)

const TerserPlugin = require('terser-webpack-plugin');

// Save this as a local plugin module (e.g. ./plugins/webpack-docusaurus-plugin.js);
// Docusaurus expects the module to export a plugin function that returns the plugin instance.
module.exports = function webpackDocusaurusPlugin(context, options) {
  return {
    name: 'webpack-docusaurus-plugin',
    configureWebpack(config, isServer, utils) {
      // Disable cache in CI since it gets evicted too quickly from GitHub Actions limits
      const isCI = process.env.CI;
      const cacheOptions = isCI ? { cache: false } : {};

      // Or compress the cache w/ gzip or brotli
      // const cacheOptions = isCI ? { cache: { compression: 'brotli' } } : {};

      // Replace terser with esbuild minify, but only if terser would have been used
      // This still respects the --no-minify flag
      const minimizer = new TerserPlugin({
        minify: TerserPlugin.esbuildMinify,
      });
      const minimizers = config.optimization.minimizer?.map((m) =>
        m instanceof TerserPlugin ? minimizer : m
      );

      return {
        mergeStrategy: { 'optimization.minimizer': 'replace' },
        optimization: {
          minimizer: minimizers,
        },
        ...cacheOptions,
      };
    },
  };
};
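
If you save the snippet above as a local plugin module (the ./plugins/webpack-docusaurus-plugin.js path is just an example), one way to register it in docusaurus.config.js is:

module.exports = {
  // ...rest of docusaurus.config.js
  plugins: ['./plugins/webpack-docusaurus-plugin.js'],
};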