Explore creating a 'reverse engineered' records.json / stats.json file from a webpack build

0xdevalias commented 4 months ago

This is an idea I've had in passing a few times, but keep forgetting to document it:

https://medium.com/@songawee/long-term-caching-using-webpack-records-9ed9737d96f2
- there are many factors that go into getting consistent filenames. Using Webpack records helps generate longer lasting filenames (cacheable for a longer period of time) by reusing metadata, including module/chunk information, between successive builds. This means that as each build runs, modules won’t be re-ordered and moved to another chunk as often which leads to less cache busting.
- The first step is achieved by a Webpack configuration setting: recordsPath: path.resolve(__dirname, './records.json'). This configuration setting instructs Webpack to write out a file containing build metadata to a specified location after a build is completed.
- It keeps track of a variety of metadata including module and chunk ids which are useful to ensure modules do not move between chunks on successive builds when the content has not changed.
- With the configuration in place, we can now enjoy consistent file hashes across builds!
- In the following example, we are adding a dependency (superagent) to the vendor-two chunk.
  
  We can see that all of the chunks change. This is due to the module ids changing. This is not ideal as it forces users to re-download content that has not changed.
  
  The following example adds the same dependency, but uses Webpack records to keep module ids consistent across the builds. We can see that only the vendor-two chunk and the runtime changes. The runtime is expected to change because it has a map of all the chunk ids. Changing only these two files is ideal.
https://webpack.js.org/configuration/other-options/#recordspath
- recordsPath: Use this option to generate a JSON file containing webpack "records" – pieces of data used to store module identifiers across multiple builds. You can use this file to track how modules change between builds.
https://github.com/search?q=path%3A%22webpack.records.json%22&type=code
- https://github.com/GooTechnologies/goojs/blob/master/webpack.records.json

I'm not 100% sure if this would be useful, or partially useful, but I think I am thinking of it tangentially in relation to things like:

0xdevalias commented 4 months ago

Even more tangentially related to this, I've pondered how much we could 're-construct' the files necessary to use tools like bundle analyzer, without having access to the original source (or if there would even be any benefit to trying to do so):

https://github.com/webpack-contrib/webpack-bundle-analyzer
- Webpack plugin and CLI utility that represents bundle content as convenient interactive zoomable treemap
- https://github.com/webpack-contrib/webpack-bundle-analyzer#usage-as-a-cli-utility
- You can analyze an existing bundle if you have a webpack stats JSON file.
  
  You can generate it using BundleAnalyzerPlugin with generateStatsFile option set to true or with this simple command: webpack --profile --json > stats.json
https://webpack.js.org/api/stats/
- Stats Data: When compiling source code with webpack, users can generate a JSON file containing statistics about modules. These statistics can be used to analyze an application's dependency graph as well as to optimize compilation speed.
https://nextjs.org/docs/pages/building-your-application/optimizing/bundle-analyzer
- https://www.npmjs.com/package/@next/bundle-analyzer

My gut feel is that we probably can figure out most of what we need for it; we probably just can't give accurate sizes for the original pre-minified code, etc; and the module names/etc might not be mappable to their originals unless we have module identification type features (see https://github.com/pionxzh/wakaru/issues/41)
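
As a very rough sketch of what that reconstruction might look like: assuming we have already unpacked each chunk into its modules, we could emit a minimal stats-like object. The input shape and the buildStats helper are hypothetical. The output field names (assets, chunks, modules, assetsByChunkName) follow the webpack stats docs, but whether an analyzer accepts a file this sparse is untested here.

```javascript
// Hypothetical input: [{ chunkId, name, file, modules: [{ id, source }] }]
// where `source` is the minified module code extracted from the bundle.
function buildStats(chunks) {
  const stats = { assets: [], chunks: [], modules: [], assetsByChunkName: {} };

  for (const chunk of chunks) {
    let chunkSize = 0;

    for (const mod of chunk.modules) {
      // Only the minified size is knowable; the original size is lost.
      const size = Buffer.byteLength(mod.source, "utf8");
      chunkSize += size;
      stats.modules.push({
        id: mod.id,
        // Placeholder name: real paths are not recoverable without
        // some form of module identification.
        name: `./unknown/module-${mod.id}.js`,
        size,
        chunks: [chunk.chunkId],
      });
    }

    // Sizes here are the sum of module sizes, which ignores the
    // wrapper/runtime code in the real file (an approximation).
    stats.chunks.push({
      id: chunk.chunkId,
      names: [chunk.name],
      files: [chunk.file],
      size: chunkSize,
    });
    stats.assets.push({ name: chunk.file, size: chunkSize, chunks: [chunk.chunkId] });
    stats.assetsByChunkName[chunk.name] = chunk.file;
  }

  return stats;
}
```

The result could then be written out with JSON.stringify(buildStats(chunks), null, 2) and pointed at an analyzer to see how far it gets.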

pionxzh commented 4 months ago

You want a re-constructed stats.json or records.json which can be put back into an analyzer plugin, right? This can be useful to understand the shape and code size distribution in chunks.

I just did some research on it. I feel it's possible to generate stats.json, but it requires a deep understanding of the bundling details of webpack. And the module graph would be a must for us to do this.

This is a sample that I found on Google: https://gist.github.com/TheLarkInn/577d6a8896b4553d4b2865fe1c8db7fa

0xdevalias commented 4 months ago

You want a re-constructed stats.json or records.json which can be put back into an analyzer plugin, right?

@pionxzh nods yeah, that was what I was originally thinking about; and then I was thinking that there might also be some crossover between the parts used for this and figuring out how to identify module changes/etc.
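
To make the 'identify module changes' idea a bit more concrete, here is a minimal sketch that diffs two records objects. It assumes the records keep a modules.byIdentifier map of module identifier to id (as in webpack 4 style records files); the diffModuleIds helper is hypothetical, and other webpack versions may lay the file out differently.

```javascript
// Compare module ids between two parsed records.json objects.
// Assumption: each has the shape { modules: { byIdentifier: { [identifier]: id } } }.
function diffModuleIds(oldRecords, newRecords) {
  const before = (oldRecords.modules && oldRecords.modules.byIdentifier) || {};
  const after = (newRecords.modules && newRecords.modules.byIdentifier) || {};
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const result = { added: [], removed: [], changed: [] };

  for (const [identifier, id] of Object.entries(after)) {
    if (!has(before, identifier)) {
      result.added.push(identifier);
    } else if (before[identifier] !== id) {
      // Same module, different id: this is what causes cache busting.
      result.changed.push({ identifier, from: before[identifier], to: id });
    }
  }

  for (const identifier of Object.keys(before)) {
    if (!has(after, identifier)) result.removed.push(identifier);
  }

  return result;
}
```

With two builds' records loaded via JSON.parse(fs.readFileSync(...)), a non-empty changed list would point at modules whose ids moved between builds.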

Here's another search that should pull up a bunch more samples:

https://github.com/search?q=path%3A%22stats.json%22+assetsByChunkName&type=code

0xdevalias / chatgpt-source-watch

Explore creating a 'reverse engineered' records.json / stats.json file from a webpack build #9