webpack core concepts and optimization guide

creeperyang commented 1 year ago

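At its core, webpack is a static module bundler for JavaScript applications. Starting from the entry point(s), it processes your application, builds a dependency graph, and combines the modules into one or more bundles.

Out of the box webpack only understands JS and JSON files; it relies on loaders and plugins to support more complex applications.

(1) Core concepts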

loader vs plugin

Loaders are transformations that are applied to the source code of a module. Plugins are the backbone of webpack. Webpack itself is built on the same plugin system that you use in your webpack configuration!

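A loader is a function that transforms a module's source code, letting you preprocess files as they are loaded.

Plugins are the backbone of webpack, and webpack itself is built on the same plugin system. In essence, a plugin taps into webpack's lifecycle events to do what loaders cannot.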
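
To make the distinction concrete, here is a minimal sketch of both shapes. The names `bannerLoader` and `BuildDonePlugin` are hypothetical and exist only for illustration; the loader signature (source in, source out) and the `apply(compiler)` / `compiler.hooks.done.tap(...)` calls follow webpack's documented interfaces.

```javascript
// A loader is a function: it receives the module source and returns the transformed source.
// (Hypothetical loader: prepends a banner comment to every module.)
function bannerLoader(source) {
  return '/* built with webpack */\n' + source;
}

// A plugin is an object with an `apply(compiler)` method that taps lifecycle hooks.
// (Hypothetical plugin: logs a message once a build has finished.)
class BuildDonePlugin {
  apply(compiler) {
    compiler.hooks.done.tap('BuildDonePlugin', () => {
      console.log('build finished');
    });
  }
}

module.exports = { bannerLoader, BuildDonePlugin };
```

In a real project the loader would live in its own file and be referenced from `module.rules`, while the plugin would be instantiated in the `plugins` array.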

Loader order

module.exports = {
  module: {
    rules: [
      {
        use: ['a-loader', 'b-loader', 'c-loader'],
      },
    ],
  },
};

Loaders run in an onion-like order: the pitch phase runs left to right, then the loaders themselves (the normal phase) run right to left.

|- a-loader `pitch`
  |- b-loader `pitch`
    |- c-loader `pitch`
      |- requested module is picked up as a dependency
    |- c-loader normal execution
  |- b-loader normal execution
|- a-loader normal execution
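
The order above can be reproduced with a small simulation. This is a simplified model, not webpack's actual loader runner: `runLoaders` and `makeLoader` are hypothetical helpers, and the sketch ignores the fact that a real pitch function can return a value to skip the remaining loaders.

```javascript
// Simplified model of the loader pipeline (not webpack's real implementation).
function runLoaders(loaders, source, log) {
  // Pitch phase: left to right.
  for (const loader of loaders) {
    if (loader.pitch) loader.pitch(log);
  }
  // Normal phase: right to left, each loader receives the previous result.
  let result = source;
  for (const loader of [...loaders].reverse()) {
    result = loader.normal(result, log);
  }
  return result;
}

// Hypothetical loaders that only record when they run.
const makeLoader = (name) => ({
  pitch: (log) => log.push(`${name} pitch`),
  normal: (src, log) => {
    log.push(`${name} normal`);
    return `${name}(${src})`;
  },
});

const log = [];
const output = runLoaders(['a', 'b', 'c'].map(makeLoader), 'src', log);
console.log(log.join(' -> '));
// a pitch -> b pitch -> c pitch -> c normal -> b normal -> a normal
console.log(output); // a(b(c(src)))
```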

module vs chunk

Every file used in your project is a Module

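In modular programming, a complete program is split into modules that each do a specific job. In webpack, every file is a module.

During compilation, modules are combined into chunks, and chunks combine into chunk groups.

There are two kinds of chunk:

initial chunk: the main chunk for an entry point, containing all the modules of that entry point and their dependencies;
non-initial chunk: a chunk that may be lazy-loaded, produced by lazy loading (dynamic import) or by SplitChunksPlugin.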

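(2) Optimizing build efficiency

Broadly, you make a build faster either by reducing the amount of work or by doing the work faster. The relevant measures are:

Parallel builds: multi-threaded / multi-process bundling.
Faster tooling: use esbuild or swc.
Less work: caching, plus fewer lookup steps and a narrower lookup scope.

Persistent caching to speed up rebuilds

If changing one module should not rerun the whole build (possibly hundreds of modules), use a persistent cache to avoid the unnecessary work.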

Cache the generated webpack modules and chunks to improve build speed.

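Caching is easy to set up through the cache option:

{
    cache: {
        type: 'filesystem', // cache to 'memory' or 'filesystem'
        // Extra dependency files: when their content changes, the whole cache is invalidated and a full build runs.
        // Usually set to the project's config files.
        buildDependencies: {
            config: [path.join(__dirname, 'webpack.dll_config.js')],
        },
        // Where cache files are stored (an absolute path); defaults to node_modules/.cache/webpack
        cacheDirectory: path.resolve(__dirname, 'node_modules/.cache/webpack'),
        maxAge: 5184000000, // milliseconds (60 days)
    },
}

cache.type: memory is the default in development mode, which is useful in watch mode; to persist the cache so that the next build after an interruption can reuse it, set it to filesystem;
cache.cacheDirectory: where cache files are stored, defaults to node_modules/.cache/webpack;
cache.maxAge: how long, in milliseconds, unused cache entries may stay in the filesystem cache; defaults to 5184000000;
cache.buildDependencies: extra dependency files; when their content changes, the whole cache is invalidated and a full build runs. Usually set to the project's config files.

With the cache configured, two test builds took 2047 ms and 417 ms, a clear improvement.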

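About DLLs

A DLL (dynamic link library) is essentially a cache too: code that rarely changes is extracted into a shared library and then used directly.

Typically:

Build the DLL: a separate webpack config (DllPlugin) produces [name].dll.js and [name].manifest.json;
Reference the DLL: another webpack config (DllReferencePlugin, AddAssetHtmlWebpackPlugin) pulls in the files above.

Since webpack@4, build performance has been good enough that the DLL approach has largely fallen out of use (DllPlugin itself still ships with webpack).

Reduce lookup paths and compile scope (less time searching, fewer files to compile)

Besides caching, you can cut the workload by reducing lookup steps and scope.

1. Narrow the scope with Rule options such as exclude / include / issuer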

{
    module: {
        rules: [
            {
                test: /\.jsx?$/,
                exclude: [/node_modules/],
            },
            {
                test: /\.css$/,
                include: [
                    path.resolve(__dirname, 'app/styles'),
                    path.join(__dirname, 'vendor/styles/'),
                ],
            },
        ],
    },
}
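
How these conditions combine can be sketched in a few lines. This is a simplified model, not webpack's own matcher: `matches` and `ruleApplies` are hypothetical helpers that only cover RegExp, string-prefix and array conditions.

```javascript
// A condition is a RegExp, a path prefix string, or an array where any entry may match.
function matches(condition, resource) {
  if (Array.isArray(condition)) return condition.some((c) => matches(c, resource));
  if (condition instanceof RegExp) return condition.test(resource);
  return resource.startsWith(condition);
}

// A rule applies when `test` and `include` match and `exclude` does not.
function ruleApplies(rule, resource) {
  if (rule.test && !matches(rule.test, resource)) return false;
  if (rule.include && !matches(rule.include, resource)) return false;
  if (rule.exclude && matches(rule.exclude, resource)) return false;
  return true;
}

const jsRule = { test: /\.jsx?$/, exclude: [/node_modules/] };
console.log(ruleApplies(jsRule, '/app/src/index.js'));                // true
console.log(ruleApplies(jsRule, '/app/node_modules/react/index.js')); // false
```

Files that no rule applies to skip the loader pipeline entirely, which is where the time saving comes from.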

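2. noParse to skip parsing

Use noParse to stop webpack from parsing specific files. Skipping large libraries this way can save a lot of time. The ignored files must not contain import, require, define or any other importing mechanism.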

module.exports = {
  module: {
    noParse: /jquery|lodash/,
  },
};

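3. Configure resolve to narrow lookups

Keep webpack's lookup scope as small as possible.

const path = require('path');

module.exports = {
  resolve: {
    // package.json fields tried in order; trim where possible
    mainFields: ['browser', 'module', 'main'],
    // extensions tried for imports without an extension; trim where possible
    extensions: ['.js', '.json', '.wasm'],
    modules: [path.resolve(__dirname, 'src'), 'node_modules'],
  },
};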

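Improve compile performance (for example by skipping unnecessary steps)

1. Disable output optimizations during development

minimize: minification
concatenateModules: module concatenation
tree shaking (usedExports: false)
splitChunks: chunk splitting

Turning off minification during development is strongly recommended; the other items are a matter of preference.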

2. Pick an appropriate source map setting

For development, for example: eval / eval-source-map / (none);
for production: source-map.

{
    devtool: __DEV__ ? 'eval' : 'source-map'
}

3. Narrow the set of watched files

module.exports = {
  watchOptions: {
    aggregateTimeout: 600,
    ignored: '**/node_modules',
  },
};

4. `experiments.lazyCompilation`: compile only when needed

{
// define a custom backend
backend?: ((
  compiler: Compiler,
  callback: (err?: Error, api?: BackendApi) => void
  ) => void)
  | ((compiler: Compiler) => Promise<BackendApi>)
  | {
    /**
     * A custom client.
    */
    client?: string;

    /**
     * Specify where to listen to from the server.
     */
    listen?: number | ListenOptions | ((server: typeof Server) => void);

    /**
     * Specify the protocol the client should use to connect to the server.
     */
    protocol?: "http" | "https";

    /**
     * Specify how to create the server handling the EventSource requests.
     */
    server?: ServerOptionsImport | ServerOptionsHttps | (() => typeof Server);

},
entries?: boolean,
imports?: boolean,
test?: string | RegExp | ((module: Module) => boolean)
}

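Multi-threaded / multi-process builds

thread-loader: runs loaders in a pool of worker processes; the officially maintained option (HappyPack is no longer updated).
parallel-webpack: runs several webpack build instances in parallel processes; suited to multiple entry points.
Loaders and plugins with built-in multi-process support, for example the multi-process mode of TerserWebpackPlugin.

// Multi-process mode of TerserWebpackPlugin
const TerserPlugin = require('terser-webpack-plugin');

module.exports = {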
  optimization: {
    minimize: true,
    minimizer: [
      new TerserPlugin({
        parallel: true,
      }),
    ],
  },
};

Use the faster swc / esbuild for minification

swc: a JS compiler written in Rust;
esbuild: a JS bundler/minifier written in Go (also used by vite).

Both gain their performance from being written in Rust / Go.

const TerserPlugin = require('terser-webpack-plugin');

module.exports = {
  optimization: {
    minimize: true,
    minimizer: [
      // Option A: minify with swc
      new TerserPlugin({
        minify: TerserPlugin.swcMinify,
        // `terserOptions` options will be passed to `swc` (`@swc/core`)
        // Link to options - https://swc.rs/docs/config-js-minify
        terserOptions: {},
      }),

      // Option B: minify with esbuild (use instead of option A, not together with it;
      // setting `minify` twice in one object would keep only the last value)
      // new TerserPlugin({
      //   minify: TerserPlugin.esbuildMinify,
      //   // `terserOptions` options will be passed to `esbuild`
      //   // Link to options - https://esbuild.github.io/api/#minify
      //   // Note: the `minify` options is true by default (and override other `minify*` options), so if you want to disable the `minifyIdentifiers` option (or other `minify*` options) please use:
      //   // terserOptions: {
      //   //   minify: false,
      //   //   minifyWhitespace: true,
      //   //   minifyIdentifiers: false,
      //   //   minifySyntax: true,
      //   // },
      // }),
    ],
  },
};

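(3) Optimizing the output

webpack caching (long-term caching)

Understanding hash / chunkhash / contenthash

First, the hash-related concepts in webpack: https://webpack.js.org/configuration/output/#template-strings

hash / fullhash: compilation-level, the hash of the whole compilation. Think of it as project-wide: almost any change alters it.
chunkhash: chunk-level, the hash of one chunk. Different chunks do not affect each other.
contenthash: derived from content. In output.filename it is the hash of the chunk restricted to one content type (for example only its JS or only its CSS); at module level the same placeholder is the hash of the module's content.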
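
The difference in granularity can be illustrated with Node's crypto module. This is an illustration only: webpack's real hashes cover more inputs than plain file content, and the `hash` and `hashes` helpers are hypothetical.

```javascript
const crypto = require('crypto');

// Short content hash, for illustration only.
const hash = (text) => crypto.createHash('sha256').update(text).digest('hex').slice(0, 8);

function hashes(chunks) {
  // Compilation-wide hash: depends on everything.
  const fullhash = hash(Object.values(chunks).join('\n'));
  // Content hash: depends only on the content of each output.
  const contenthash = {};
  for (const [name, content] of Object.entries(chunks)) {
    contenthash[name] = hash(content);
  }
  return { fullhash, contenthash };
}

const before = hashes({ main: 'console.log(1)', vendors: 'lodash code' });
const after = hashes({ main: 'console.log(2)', vendors: 'lodash code' });

console.log(before.fullhash !== after.fullhash);                       // true
console.log(before.contenthash.main !== after.contenthash.main);       // true
console.log(before.contenthash.vendors === after.contenthash.vendors); // true
```

Only the file whose content changed gets a new name, so the browser can keep using its cached copy of everything else.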

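Caching step 1: Output Filenames

Use contenthash in file names so that a changed file is never served from a stale cache (which would leave users without the latest content).

const path = require("path");

module.exports = {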
  entry: "./src/index.js",
  output: {
    filename: "[name].[contenthash].js",
    path: path.resolve(__dirname, "dist"),
    clean: true
  }
};

Caching step 2: Extracting Boilerplate

Split the runtime code into its own file

module.exports = {
  optimization: {
    runtimeChunk: "single" // equivalent to
    /**
     * runtimeChunk: {
     *     name: 'runtime',
     * },
     */
  }
};

Setting optimization.runtimeChunk to "single" creates one runtime shared by all generated chunks. With false (the default), the runtime is embedded in each entry chunk.

Split shared libraries into their own chunk

const path = require('path');

module.exports = {
  entry: './src/index.js',
  output: {
    filename: '[name].[contenthash].js',
    path: path.resolve(__dirname, 'dist'),
    clean: true,
  },
  optimization: {
    runtimeChunk: 'single',
    splitChunks: {
      // Bundle all shared packages from node_modules into "vendors.[contenthash].js"
      cacheGroups: {
        vendor: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors',
          chunks: 'all',
        },
      },
    },
  },
};

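These two steps keep the cached vendor bundle and runtime valid when only application code changes.

Caching step 3: Module Identifiers

Consider this case: a new file './src/print.js' is added and imported as a dependency by './src/index.js' (main). After rebuilding: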

The main bundle changed because of its new content.
The vendor bundle changed because its module.id was changed.
And, the runtime bundle changed because it now contains a reference to a new module.

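The file names (hashes) of main, vendors and runtime all changed. In principle, vendors should not.

Setting optimization.moduleIds = "deterministic" solves this kind of problem:

natural: numeric ids based on order of use. Adding or removing a module shifts the ids.
deterministic: numeric ids derived by hashing module names (3 digits by default). They do not depend on order, so adding or removing a module no longer changes the (natural) ids of modules in other chunks.

optimization.chunkIds = "deterministic" is likewise the default in production mode.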
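
The effect of the two strategies can be sketched with a simplified model. `naturalIds` and `deterministicIds` are hypothetical helpers, not webpack's algorithm: the real deterministic ids also handle collisions, which this sketch does not.

```javascript
const crypto = require('crypto');

// natural: ids follow the order in which modules are encountered.
const naturalIds = (modules) =>
  Object.fromEntries(modules.map((name, index) => [name, index]));

// deterministic (simplified): a 3-digit id derived from the module name only.
const deterministicIds = (modules) =>
  Object.fromEntries(
    modules.map((name) => {
      const hex = crypto.createHash('sha256').update(name).digest('hex').slice(0, 8);
      return [name, parseInt(hex, 16) % 1000];
    })
  );

const lodash = './node_modules/lodash/lodash.js';
const modulesBefore = ['./src/index.js', lodash];
const modulesAfter = ['./src/index.js', './src/print.js', lodash]; // print.js added

// The vendor module's natural id shifts from 1 to 2, changing the vendors bundle.
console.log(naturalIds(modulesBefore)[lodash], naturalIds(modulesAfter)[lodash]); // 1 2
// Its name-based id is unaffected by the new module.
console.log(deterministicIds(modulesBefore)[lodash] === deterministicIds(modulesAfter)[lodash]); // true
```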

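Concatenate Module (Scope Hoisting)

optimization.concatenateModules lets webpack look for modules that can be safely concatenated and merge them into a single module.

webpack used to wrap every module in its own wrapper function, which slows execution. Like Rollup, concatenateModules merges modules into a single closure wherever that is safe.

For module concatenation:

It only works for ES modules; CommonJS modules cannot be concatenated.
It is also disabled for a module referenced by multiple chunks, to avoid bundling that module more than once.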

creeperyang commented 1 year ago

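Some details

1. `output.library` and the difference between `commonjs | commonjs2` and related types

A question: what is the difference between commonjs2 and commonjs?

In one sentence:

commonjs follows the original CommonJS spec, which only defines exports, so the output is typically exports['name'] = xxx;
commonjs2 is the module system in wide use today, and the output is module.exports = xxx (cjs2 supports both default and named exports).

More detail, based on webpack:

Take this webpack config, which generates UMD code: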

module.exports = {
  //...
  output: {
    library: {
      name: 'MyLibrary',
      type: 'umd',
    },
  },
};

Build output:

(function webpackUniversalModuleDefinition(root, factory) {
  if (typeof exports === 'object' && typeof module === 'object')
    module.exports = factory();
  else if (typeof define === 'function' && define.amd) define([], factory);
  else if (typeof exports === 'object') exports['MyLibrary'] = factory();
  else root['MyLibrary'] = factory();
})(global, function () {
  return _entry_return_;
});
