broccolijs / broccoli

Browser compilation library – an asset pipeline for applications that run in the browser
https://broccoli.build
MIT License
3.33k stars 217 forks source link

Memoization in Broccoli #396

Closed thoov closed 5 years ago

thoov commented 5 years ago

Background

Currently when a file change is detected broccoli will simply rebuild all of its nodes. Calling build can be expensive and results in nodes have to implement their own caching strategies (in best cases) or not caching (in worst cases). This PR "short circuits" the build chaining by enabling memoization (opt in via BROCCOLI_ENABLED_MEMOIZE=true) of nodes. This enables large amounts of the tree to no longer be rebuilt if broccoli detects that a nodes inputs have not changed (thus reducing rebuild times).

How does this work

  1. NodeWrappers (both source and transform) have a property called _revision that increments whenever their inputNodes have "changed"
  2. This property is changed one of 2 ways. For source nodes the watcher_adapter calls revise on them whenever sane detects a change. For transform nodes, revise is automatically called whenever its build method is called.
  3. During the build chain, we do not call the build method on a node if its inputNodes properties are the same as the last build attempt (a transform-node can "force" being built by passing volatile=true as an option).

Related PRs

thoov commented 5 years ago

@oligriffiths / @stefanpenner If you have some time could you both review this PR. I would like to discuss it at our Wednesday sync. The current status is still very much "work in progress". I just want to get your guy's input into the direction it is going and to talk about some of the "design" decisions I am making.

oligriffiths commented 5 years ago

following our chat https://gist.github.com/oligriffiths/ebff4a6947259e69aa970a6ac010acb2

thoov commented 5 years ago

@oligriffiths I sync'd with @stefanpenner after our talk and he brought up 2 good points that I don't believe we gave enough weight too:

  1. Doing the FSTree diffing after every build will be very expensive. Thus it wont be a great "step 1" approach as it will be a negative step backwards from a perf point of view.
  2. Even with the eventual this.output facade there will still need to be a way for plugins to signal that things are "revised". Example would be if the plugin does anything outside of the facade (like create a random file somewhere outside its outputPath) then it would still have to call this.revised(); based on its own logic.

I propose the following phases:

Phase 1:

@oligriffiths has expressed a vision towards a shouldBuild method that wrappers would call into which return if a build should happen or not. This method would be able to access a this.inputs facade that would be able to call this.inputs.hasChanges(). Example:

Base class:

class BroccoliPlugin {
  shouldBuild() {
    return this.inputs.hasChanges();
  }
}

Custom class:

class MyPlugin extends BroccoliPlugin {
  shouldBuild() {
    if (Math.random() > .5) {
      return false;
    }

    return true;
  }
}

Phase 2:

oligriffiths commented 5 years ago

@thoov Yeah that makes sense.

I think we could combine the two approaches and have the concept "revised" and output path be linked with a directory object. So rather than doing the file system diffing, expose revised() on the Directory object that wraps the outputPath and will eventually be the fs facade, meaning you won't need to manually call revised() on it.

So

build() {
  // do my plugin build
  this.output.revised();
}

or something.

If we can decide on what the interface of the Directory facade should look like that solves the issue now, if we think that having an object to mediate file system stuff AND perform revision tracking, then we could just implement the revision tracking aspect right now and leave the file system stuff for later.

Another thought: if a plugin opts into sideEffectFree, could we register a watcher for the output path and call revised() automatically? Or would that add too much overhead and slow things down? I know we moved to the tmp dir to get rid of inotify events to speed things up.

oligriffiths commented 5 years ago

Had some spare cycles so experimented with the above https://github.com/broccolijs/broccoli/pull/397

stefanpenner commented 5 years ago

Just had a chat with @oligriffiths, the summary of our chat was:

Given the above mentioned changes, we should 🚢. If you have concerns, or if we have missed something important, let's chat. But otherwise, we feel really good about this.

stefanpenner commented 5 years ago

:shipit:

oligriffiths commented 5 years ago

👍

lifeart commented 5 years ago

image image

But, results not appear

oligriffiths commented 5 years ago

@lifeart the print out shows the slowest nodes so skipped nodes will almost never show as they have zero time.

Which file did you make a change in to cause a rebuild and what else is in that tree?

thoov commented 5 years ago

@lifeart / @oligriffiths Can we move this conversation to a new issue? Could we also clearly state what the problem is and how to reproduce in that issue?

lifeart commented 5 years ago

@thoov sure, I will try to create reproduction repo with latest ember and pined broccoli master. @oligriffiths only component template.

lifeart commented 5 years ago

@thoov @oligriffiths https://github.com/broccolijs/broccoli/issues/404