FredKSchott / snowpack

ESM-powered frontend build tool. Instant, lightweight, unbundled development. ✌️
https://www.snowpack.dev
MIT License
19.48k stars, 921 forks

Advanced build caching #1052

Closed FredKSchott closed 3 years ago

FredKSchott commented 3 years ago

Background

While Snowpack was originally released to explore the power of ESM-based build tooling, we quickly realized that one of the biggest benefits of this new paradigm was cache efficiency. If you can cache each file build individually, you have some guarantees that you can cache it indefinitely. If a file never changes and your Snowpack config never changes, you are technically guaranteed to be able to cache a file's build result forever.

Our dev server was able to take advantage of this right away, but with a fallback in case we were wrong: serve the cached value, THEN build the file behind the scenes to confirm that no config changed that would affect the final build. If the two don't match, then clear the entire cache and reload the page. All of this happens fast enough that the user generally won't even notice.
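
A minimal sketch of that serve-then-verify flow, assuming an async `build` function and an `onStale` callback that reloads the page. `DevBuildCache`, `build`, and `onStale` are made-up names for illustration, not Snowpack's internals:

```typescript
// Hypothetical sketch: every name here is illustrative, not Snowpack's real API.
type BuildFn = (filePath: string) => Promise<string>;

class DevBuildCache {
  private cache = new Map<string, string>();
  private build: BuildFn;
  private onStale: () => void;

  constructor(build: BuildFn, onStale: () => void) {
    this.build = build;
    this.onStale = onStale;
  }

  // Serve the cached build if there is one, then re-check it in the background.
  async serve(filePath: string): Promise<string> {
    const cached = this.cache.get(filePath);
    if (cached === undefined) {
      const fresh = await this.build(filePath);
      this.cache.set(filePath, fresh);
      return fresh;
    }
    // Fire and forget: a failed rebuild is treated like a mismatch.
    this.revalidate(filePath, cached).catch(() => this.invalidate());
    return cached;
  }

  // Rebuild the file and compare against what was just served.
  private async revalidate(filePath: string, cached: string): Promise<void> {
    const fresh = await this.build(filePath);
    if (fresh !== cached) this.invalidate();
  }

  // A mismatch means some input changed unnoticed: drop everything and reload.
  private invalidate(): void {
    this.cache.clear();
    this.onStale();
  }
}
```

The caller always gets a response without waiting on the rebuild; the cost of being wrong is one stale response followed by a full cache clear.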

Feature Overview

If we can create a system that lets us actually guarantee that a file's cached build is still accurate, then we can dramatically speed up our builds by only re-building the files that have changed since last time.
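
As a rough illustration of "only re-build the files that have changed", here is a sketch that keys each cached build on a hash of the file's source plus a config fingerprint. It assumes a synchronous `buildFile` callback and an already-computed `configFingerprint` string; both names, and `incrementalBuild` itself, are hypothetical:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration only.
interface CachedBuild {
  key: string;
  output: string;
}

// A cached build is valid only for this exact source and this exact config.
function cacheKey(source: string, configFingerprint: string): string {
  return createHash("sha256")
    .update(configFingerprint)
    .update("\0")
    .update(source)
    .digest("hex");
}

// Rebuilds only files whose key changed; returns the paths that were rebuilt.
function incrementalBuild(
  sources: Map<string, string>,
  configFingerprint: string,
  cache: Map<string, CachedBuild>,
  buildFile: (filePath: string, source: string) => string,
): string[] {
  const rebuilt: string[] = [];
  sources.forEach((source, filePath) => {
    const key = cacheKey(source, configFingerprint);
    const existing = cache.get(filePath);
    if (existing !== undefined && existing.key === key) return; // still fresh
    cache.set(filePath, { key, output: buildFile(filePath, source) });
    rebuilt.push(filePath);
  });
  return rebuilt;
}
```

Any change to the config fingerprint changes every key, so a config change falls back to a full rebuild.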

The same goes for installs: if we can cache the exact install targets that were used to create each web_modules installation, then we can cache and reuse installations across builds, when dependency install targets don't change.
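
One way to picture the install side: fingerprint the set of install targets together with their resolved versions, and reuse the previous `web_modules` output whenever the fingerprint matches. This is a sketch under assumptions; `installFingerprint` and the `lockedVersions` map (package name to version) are hypothetical, and how versions get resolved is left out:

```typescript
import { createHash } from "node:crypto";

// "react-dom/server" -> "react-dom", "@scope/pkg/sub" -> "@scope/pkg"
function packageNameOf(specifier: string): string {
  const parts = specifier.split("/");
  return parts.slice(0, specifier.startsWith("@") ? 2 : 1).join("/");
}

// Order and duplicates in the scanned imports must not affect the result,
// so targets are de-duplicated and sorted before hashing.
function installFingerprint(
  installTargets: string[],
  lockedVersions: Record<string, string>,
): string {
  const uniqueSorted = Array.from(new Set(installTargets)).sort();
  const lines = uniqueSorted.map(
    (target) => `${target}@${lockedVersions[packageNameOf(target)] ?? "unknown"}`,
  );
  return createHash("sha256").update(lines.join("\n")).digest("hex");
}
```

Adding a new import specifier or bumping a dependency version changes the fingerprint; reordering imports in source files does not.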

If both of these are implemented, warm builds for sites of all sizes (builds that would normally take seconds or even full minutes) could drop down to single-digit seconds.

Feature Request

Previously: Open Question: how to detect config changes?

---

Only two things cause a file's cache to become stale: a change to the source file or a change to any one build plugin's configuration.

- **Change to source file:** We already include this in our cache, so this can be considered done.
- **Change to dependencies:** We already trigger a cache clean if we detect that your dependencies have changed.
- ⚠️ **Change to any plugin's configuration:** This one is tricky, and where most of the work for this will go. How can we detect changes to config like `babel.config.json` or `postcss.config.js`? A few ideas:
  - Automatically watch changes to any file with `.config.` in the name (or, any file in a `config` folder, configurable).
  - Plugins tell Snowpack which config files to listen to. For example, `@snowpack/plugin-babel` tells Snowpack about `babel.config.js/json`.
  - Plugins are responsible for loading config, and then returning a hash of the config used to build the file? Babel provides a way to load the exact config before transpiling a file, but I'm unsure if postcss provides a way to do this (it may just do the file reading itself). Either way, this is more work for plugin authors and wouldn't support CLI-based build scripts.
  - Some other idea?
  - Some combination of the above?

If we can guarantee that Snowpack will detect all relevant config changes, then I believe we can reuse built files much better than we currently are.
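
The first idea (watch anything that looks like a config file) could be sketched as a simple path check. The function name `looksLikeConfigFile` and the `configDirs` option are hypothetical; nothing like this is confirmed to exist in Snowpack:

```typescript
// Hypothetical heuristic: a file counts as config if its name contains
// ".config." or if it lives inside one of the configured config folders.
function looksLikeConfigFile(
  filePath: string,
  configDirs: string[] = ["config"],
): boolean {
  // Accept both POSIX and Windows separators.
  const segments = filePath.split(/[\\/]+/).filter((s) => s.length > 0);
  const fileName = segments.slice(-1).join("");
  const dirs = segments.slice(0, -1);
  return (
    fileName.indexOf(".config.") !== -1 ||
    dirs.some((dir) => configDirs.indexOf(dir) !== -1)
  );
}

// Example: pick the files to watch out of a project file listing.
const watched = [
  "babel.config.json",
  "postcss.config.js",
  "config/env.js",
  "src/index.js",
].filter((file) => looksLikeConfigFile(file));
```

With the default `configDirs`, `watched` ends up holding the first three paths; `src/index.js` is treated as an ordinary source file.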
joshwilsonvu commented 3 years ago

I've only made a small contribution to Snowpack, but I'll share some thoughts anyway!

Babel discusses the caching issue here, and points out that .js config files are difficult to cache. I would recommend not caching any sort of .js config file, because a change in a dependency could change behavior without any modification to the config file itself. This problem could manifest with .json files as well through fields like "extends". As an extreme example, Babel might even be configured to use a plugin written within the source tree, which could change behavior dramatically without touching babel.config.json, and Babel itself would have no idea something has changed. Point is, "guarantee[ing] that Snowpack will detect all relevant config changes" is very hard.

However, I would expect that watching any files that look like a config file, and any files that plugins tell Snowpack about (options 1 & 2), should cover 90% of cases. So we need a way to make the common case work, and an escape hatch for the other 10%, i.e. reverting to the present behavior. This could piggyback on the build --clean option, or adding a --reload option, or a new option entirely. In addition to that, we would need to let users know that this optimization might cause some obscure issues and to see if this new option clears it up.

FredKSchott commented 3 years ago

All contributions are appreciated, regardless of size! :D

Great points! I bet we could get over 90% confidence here by also always clearing the cache when dependencies change (we already have a check for this in Snowpack) + giving you a way to mark other files to be watched. For example, if you load some config file outside of your working directory, you could add config that tells Snowpack to watch that file as well.

This also may point to the fact that this may never be default-on. If it's opt-in (ex: --cache or --incremental) then we can make sure that you understand what the requirements are.

alubbe commented 3 years ago

On top of just rebuilding after any changes to the dependencies, can't we fix the config fingerprinting/invalidating issue by not using fingerprints of the file(s), but instead building up a complete configuration object in memory from all files/settings/relevant environment variables/etc. and then fingerprint that?
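
A sketch of what fingerprinting the in-memory configuration could look like, assuming the fully resolved config is available as a plain object. `fingerprintConfig` and `stableStringify` are illustrative names; the serializer sorts object keys so that key order doesn't change the hash:

```typescript
import { createHash } from "node:crypto";

// Deterministic serialization: object keys are sorted, arrays keep their order.
// Limitations of this sketch: functions are reduced to their name (so a changed
// function body goes unnoticed), and circular references are not handled.
function stableStringify(value: unknown): string {
  if (typeof value === "function") {
    return JSON.stringify(`[function ${value.name}]`);
  }
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value) ?? "null"; // undefined and symbols become "null"
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`)
    .join(",");
  return `{${body}}`;
}

// Hash the resolved config plus the environment variables that affect the build.
function fingerprintConfig(
  config: unknown,
  env: Record<string, string | undefined> = {},
  relevantEnvKeys: string[] = [],
): string {
  const envSubset: Record<string, string | undefined> = {};
  relevantEnvKeys.forEach((key) => {
    envSubset[key] = env[key];
  });
  return createHash("sha256")
    .update(stableStringify({ config, env: envSubset }))
    .digest("hex");
}
```

The function-name fallback shows where this approach stops working: a plugin implementation can change without the serialized config changing at all.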

joshwilsonvu commented 3 years ago

Even if we do build up the complete configuration objects in memory, it's no silver bullet. There may still be pathological cases that prevent it from working 100% of the time (ex. Babel can't detect changes in plugin implementations). I've got nothing against fingerprinting the complete loaded config, but the extra dev time and run time it takes to do that may or may not be worth dropping a 5% failure rate to 2%. It's impossible to know what those rates actually are without profiling all real-world usage.

We could leave that decision to the plugins: return an array of extra paths for Snowpack to watch, or return a string hash of the configuration object. Snowpack itself could support both without too much complexity.
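
To make the "support both" idea concrete, here is a sketch of a plugin hook that can return extra paths, a config hash, or both. The `cacheInputs` hook, the `HypotheticalPlugin` interface, and the injected `readFile` function are all invented for this example; Snowpack's plugin API is not known to have such a hook:

```typescript
import { createHash } from "node:crypto";

// Hypothetical plugin contract, invented for illustration.
interface CacheInputs {
  watchPaths?: string[]; // extra files whose contents affect the build
  configHash?: string; // hash the plugin computed from its loaded config
}

interface HypotheticalPlugin {
  name: string;
  cacheInputs?: () => CacheInputs;
}

// Combines whatever each plugin reports into one fingerprint.
// `readFile` is injected so the sketch stays independent of the file system.
function pluginsFingerprint(
  plugins: HypotheticalPlugin[],
  readFile: (filePath: string) => string,
): string {
  const hash = createHash("sha256");
  plugins.forEach((plugin) => {
    const inputs = plugin.cacheInputs ? plugin.cacheInputs() : {};
    hash.update(`plugin:${plugin.name}\n`);
    (inputs.watchPaths ?? [])
      .slice()
      .sort()
      .forEach((filePath) => {
        hash.update(`file:${filePath}\n`).update(readFile(filePath)).update("\n");
      });
    if (inputs.configHash !== undefined) {
      hash.update(`hash:${inputs.configHash}\n`);
    }
  });
  return hash.digest("hex");
}
```

A plugin that implements neither option contributes only its name, which matches today's behavior of not tracking its config at all.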

The opt-in idea would ensure that Snowpack continues to Just Work, and hopefully most users will take advantage of it. You could always make it default-on later.

FredKSchott commented 3 years ago

Yea, I even looked at @babel/core and it doesn't give you a clear API to find out which config files you loaded from. You're right that we could load the options object ourselves, and do some serialization of the object, but it's slightly less explicit when that would or wouldn't work.

I think the design that's appearing from this conversation is something like:

This would be explicit, reasonably easy to debug (with good --verbose logs), and manually adding new config files to watch would be a straightforward manual intervention for custom/weird setups.

FredKSchott commented 3 years ago

Also, I'm starting to like the idea of going full Elon Musk and calling this --hyperspeed mode, or something equally silly 😄

alubbe commented 3 years ago

I like --hyperspeed - it also implies that the users have to pay attention or things might go wrong ;)

One more thing about our codebase: we have 1000s of jsx files, but only 100s of dependencies. That makes it pretty slow to load in build mode, even over HTTP/2. Would it be possible to bundle together all of our jsx files into one (without any optimizations whatsoever), but keep the dependencies separate? That way the browser only has to assemble 100-200 files instead of 1000-1500 per page/route, and with this new --hyperspeed mode, it would also build extremely fast on most changes (those that don't modify/add/remove dependencies).

FredKSchott commented 3 years ago

100s of dependencies??? 🙀 Hope that this is a typo!

When you say "load in build mode", do you mean that the initial build itself takes a long time, or that after the initial build when you go to load the site in the browser that takes a long time? Is this a problem in the final build, or during development? And lastly, are you seeing this slow load time when you load files from your own machine or are you loading these files over the network?

If you're using your own server, ETag support would allow for better caching in the browser, even during development.
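
For reference, a minimal sketch of ETag handling on a custom server, written as a pure function so it doesn't depend on any particular framework. `respondWithETag` and `CachedResponse` are made-up names, and the content-hash tag is just one possible choice:

```typescript
import { createHash } from "node:crypto";

interface CachedResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string;
}

// Content-derived tag; ETag values are quoted strings.
function etagFor(body: string): string {
  const digest = createHash("sha256").update(body).digest("hex").slice(0, 32);
  return `"${digest}"`;
}

// Answers 304 with an empty body when the browser already holds this version.
function respondWithETag(body: string, ifNoneMatch?: string): CachedResponse {
  const etag = etagFor(body);
  // If-None-Match may list several tags; weak tags (W/"...") compare by value.
  const candidates = (ifNoneMatch ?? "")
    .split(",")
    .map((tag) => tag.trim().replace(/^W\//, ""));
  const matched = candidates.some((tag) => tag === "*" || tag === etag);
  return matched
    ? { status: 304, headers: { ETag: etag }, body: "" }
    : { status: 200, headers: { ETag: etag }, body };
}
```

In a Node `http` request handler, the second argument would come from the request's `if-none-match` header, so unchanged files cost a header round trip rather than a full download.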

FredKSchott commented 3 years ago

Alright, updated the main issue with an updated plan. Would love any help on any of these broken-down steps!

alubbe commented 3 years ago

hey ya - just wanted to check in on this issue. is there some way to help out getting started with this, or is the current focus on new snowpack features elsewhere?

FredKSchott commented 3 years ago

Hey! I'd still love to see this, but it's advanced enough and nice-to-have enough that we need to focus elsewhere for now. I'll reopen this once v3 is out the door and we can revisit. Would still love help in other issues though, if you're interested in contributing!