MIT License

broccoli-persistent-filter

Helper base class for Broccoli plugins that map input files into output files one-to-one, with the addition of a persistent cache for fast restarts.

API

class Filter {
  /**
   * Abstract base-class for filtering purposes.
   *
   * Enforces that it is invoked on an instance of a class which prototypically
   * inherits from Filter, and which is not itself Filter.
   */
  constructor(inputNode: BroccoliNode, options: FilterOptions): Filter;

  /**
   * Method `processString`: must be implemented on subclasses of
   * Filter.
   *
   * The resolved return value can either be an object or a string.
   *
   * An object can be used to cache additional meta-data that is not part of the
   * final output. When an object is returned, the `.output` property of that
   * object is used as the resulting file contents.
   *
   * When a string is returned it is used as the file contents.
   */
  processString(contents: string, relativePath: string): string | object;

  /**
   * Method `getDestFilePath`: determine whether the source file should
   * be processed, and optionally rename the output file when processing occurs.
   *
   * Return `null` to pass the file through without processing. Return
   * `relativePath` to process the file with `processString`. Return a
   * different path to process the file with `processString` and rename it.
   *
   * By default, if the options passed into the `Filter` constructor contain a
   * property `extensions`, and `targetExtension` is supplied, the first matching
   * extension in the list is replaced with the `targetExtension` option's value.
   */
  getDestFilePath(relativePath: string): string | null;

  /**
   * Method `postProcess`: may be implemented on subclasses of
   * Filter.
   *
   * This method can be used in subclasses to do processing on the results of
   * each file's `processString` method.
   *
   * A common scenario for this is linting plugins, where on initial build users
   * expect to get console warnings for lint errors, but we do not want to re-lint
   * each file on every boot (since most of them will be able to be served from the
   * cache).
   *
   * The `.output` property of the return value is used as the emitted file contents.
   */
  postProcess(results: object, relativePath: string): object;

}
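
As a rough illustration of how `processString` and `postProcess` fit together in the linting scenario described above, the sketch below defines both methods on a plain object so that it runs on its own. In a real plugin they would be methods of a Filter subclass. The `lintMethods` name, the `warnings` property, and the `debugger` check are hypothetical choices made for this example, not part of the library.

```javascript
// Hypothetical lint-style methods. In a real plugin these would be defined
// on a class that extends Filter.
const lintMethods = {
  // Returns an object: `.output` becomes the file contents, and the extra
  // `warnings` property is metadata that travels with the result.
  processString(contents, relativePath) {
    const warnings = [];
    contents.split('\n').forEach((line, index) => {
      if (/\bdebugger\b/.test(line)) {
        warnings.push(`${relativePath}:${index + 1} unexpected debugger statement`);
      }
    });
    return { output: contents, warnings };
  },

  // Receives the result object for each file and reports the stored warnings
  // without linting the contents again.
  postProcess(results, relativePath) {
    results.warnings.forEach(warning => console.warn(warning));
    return results;
  }
};
```

The intent is that the expensive work (here, the scan for `debugger`) happens only in `processString`, while `postProcess` just replays what was recorded.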

Options

All options except name and annotation can also be set on the prototype instead of being passed into the constructor.

Example Usage

const Filter = require('broccoli-persistent-filter');

class Awk extends Filter {
  constructor(inputNode, search, replace, options = {}) {
    super(inputNode, {
      annotation: options.annotation
    });
    this.search = search;
    this.replace = replace;
    this.extensions = ['txt'];
    this.targetExtension = 'txt';
  }
  processString(content, relativePath) {
    return content.replace(this.search, this.replace);
  }
}

In Brocfile.js, use your new Awk plugin like so:

var node = new Awk('docs', 'ES6', 'ECMAScript 2015');

module.exports = node;

Persistent Cache

Adding the persist flag allows a subclass to persist state across restarts. This exists to mitigate the upfront cost of some more expensive transforms on warm boot. It does not aim to improve incremental build performance; if it does, that indicates something is wrong with the filter or input filter in question.

By default, if the CI=true environment variable is set, persistent caches are disabled. To force persistent caches on CI, set the FORCE_PERSISTENCE_IN_CI=true environment variable.
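
The rule above can be restated as a small decision function. This is only a paraphrase of the prose, assuming both variables are compared against the literal string `true`; `isPersistenceEnabled` is a hypothetical helper and not the library's actual implementation.

```javascript
// Hypothetical helper mirroring the rule described above.
// `persistOption` is the value of the plugin's `persist` option.
function isPersistenceEnabled(persistOption, env = process.env) {
  if (!persistOption) {
    return false; // the plugin did not opt in to persistence
  }
  const onCI = env.CI === 'true';
  const forced = env.FORCE_PERSISTENCE_IN_CI === 'true';
  // On CI the cache is off unless it is explicitly forced on.
  return !onCI || forced;
}
```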

How does it work?

It does so by establishing a two-layer file cache. The first layer is the entire bucket. The second, cacheKeyProcessString, is a per-file cache key.

Together, these two layers should provide the right balance of speed and sensibility.

The bucket-level cacheKey must be stable but must also never become stale. If the key is not stable, state between restarts will be lost and performance will suffer. On the flip side, if the cacheKey becomes stale, changes may not be correctly reflected.

It is configured by subclassing and refining the cacheKey method. A good key here is likely the name of the plugin, its version, and the actual versions of its dependencies.

const Filter = require('broccoli-persistent-filter');

class Subclass extends Filter {
  cacheKey() {
    // md5 and the two checksum values are placeholders the plugin author must supply.
    return md5(super.cacheKey() + inputOptionsChecksum + dependencyVersionChecksum);
  }
}
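
One possible way to build such a key from the plugin name, its version, and its dependency versions is sketched below. `buildCacheKey` and its parameters are hypothetical and chosen for illustration. Only Node's built-in `crypto` module is assumed.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hashes a base key together with the plugin's name,
// version, and the versions of its dependencies.
function buildCacheKey(baseKey, pkg, dependencyVersions) {
  const parts = [
    baseKey,
    pkg.name,
    pkg.version,
    // Sort the names so that object key order does not change the hash.
    ...Object.keys(dependencyVersions)
      .sort()
      .map(name => `${name}@${dependencyVersions[name]}`)
  ];
  return crypto.createHash('md5').update(parts.join('\0')).digest('hex');
}
```

Inside a subclass, `cacheKey()` could then return something like `buildCacheKey(super.cacheKey(), pkg, versions)`, where `pkg` and `versions` come from wherever the plugin reads its own metadata. The key stays stable across restarts as long as none of the inputs change, and it changes as soon as a version is bumped.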

The second key represents the contents of the file. Typically the base class's functionality is sufficient, as it merely generates a checksum of the file contents. If for some reason this is not sufficient (e.g. if the file name should be considered), it can be reconfigured via subclassing.

Note that this method is not useful for general-purpose cache invalidation, since it is only used to restore the cache across processes and does not apply to rebuilds. To invalidate files whose dependencies affect the output, see the dependencyInvalidation option, described under Dependency Invalidation.

const Filter = require('broccoli-persistent-filter');

class Subclass extends Filter {
  cacheKeyProcessString(string, relativePath) {
    return superAwesomeDigest(string);
  }
}
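
For the case mentioned above where the file name should be considered, a per-file key might mix the relative path into the digest. The function below is a standalone sketch of that idea using Node's built-in `crypto` module. The choice of md5 and of a NUL separator are assumptions of this example.

```javascript
const crypto = require('crypto');

// Hypothetical per-file key that depends on both the path and the contents.
function cacheKeyProcessString(string, relativePath) {
  return crypto
    .createHash('md5')
    .update(relativePath)
    .update('\0') // separator, so ('ab', 'c') and ('a', 'bc') hash differently
    .update(string)
    .digest('hex');
}
```

With this key, two files with identical contents but different paths get separate cache entries.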

It is recommended that persistence be opt-in by the consuming plugin author: if no reasonable cache key can be created, the persistent cache should not be used.

var myTree = new SomePlugin('lib', { persist: true });

Warning

When the persistent cache is used, many small files are created on disk and never deleted. This can exhaust the inodes of your disk. Make sure to clean up the old files regularly, or configure your system to do so.

On OSX, files that aren't accessed in three days are deleted from /tmp. On systems using systemd, systemd-tmpfiles should already be present and regularly clean up the /tmp directory. On Debian-like systems, you can use tmpreaper. On RedHat-like systems, you can use tmpwatch.

By default, the files are stored in the operating system's default directory for temporary files, but you can change this location by setting the BROCCOLI_PERSISTENT_FILTER_CACHE_ROOT environment variable to the path of another folder.

To clear the persistent cache on any particular build, set the CLEAR_BROCCOLI_PERSISTENT_FILTER_CACHE environment variable to true like so:

CLEAR_BROCCOLI_PERSISTENT_FILTER_CACHE=true ember serve

Dependency Invalidation

When the output of processString() can depend on files other than the primary input file, the broccoli plugin should use the dependencyInvalidation option and these related APIs to cause the output cache to become automatically invalidated should those other input files change.

Plugins that enable the dependencyInvalidation option will have an instance property dependencies that can be used to register dependencies for a file.

During either processString or postProcess, the plugin should call this.dependencies.setDependencies(relativeFile, arrayOfDeps) to establish which files this file depends on.

Dependency invalidation works during rebuilds as well as when restoring results from the persistent cache.

When tracking dependencies, setDependencies() should always be called when processing a file that could have dependencies. If a file has no dependencies, pass an empty array. Failure to do this can result in stale dependency information about the file.

The dependencies passed to setDependencies() can be absolute paths or relative. If relative, the path will be assumed relative to the file being processed. The dependencies can be within the broccoli tree or outside it (note: adding dependencies outside the tree does not cause those files to be watched). Files inside the broccoli tree are tracked for changes using a checksum because files in broccoli trees do not have stable timestamps. Files outside the tree are tracked using modification time.
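
The sketch below shows the calling pattern described above: collect the dependencies while processing a file, then always call `setDependencies`, passing an empty array when there are none. To keep it runnable on its own, `dependencies` is replaced by a stand-in that only records calls, and the `@import "..."` syntax is a hypothetical input format. In a real plugin `processString` would be a method on a Filter subclass with the dependencyInvalidation option enabled.

```javascript
// Stand-in for the `dependencies` instance property; it only records calls.
const recorded = new Map();

const plugin = {
  dependencies: {
    setDependencies(relativeFile, deps) {
      recorded.set(relativeFile, deps);
    }
  },

  processString(contents, relativePath) {
    const deps = [];
    // Hypothetical import syntax: one `@import "path";` statement per line.
    const importPattern = /^@import "([^"]+)";$/gm;
    let match;
    while ((match = importPattern.exec(contents)) !== null) {
      deps.push(match[1]); // relative paths are relative to the current file
    }
    // Called for every file, with an empty array when nothing is imported,
    // so that stale dependency information is never left behind.
    this.dependencies.setDependencies(relativePath, deps);
    return contents;
  }
};
```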

FAQ

Upgrading from 0.1.x to 1.x

You must now call the base class constructor. For example:

// broccoli-filter 0.1.x:
function MyPlugin(inputTree) {
  this.inputTree = inputTree;
}

// broccoli-filter 1.x:
function MyPlugin(inputNode) {
  Filter.call(this, inputNode);
}

Note that "node" is simply new terminology for "tree".

Source Maps

Can this help with compilers that are almost 1:1, like a minifier that takes a .js and .js.map file and outputs a .js and .js.map file?

Not at the moment. I don't know yet how to implement this and still have the API look beautiful. We also have to make sure that caching works correctly, as we have to invalidate if either the .js or the .js.map file changes. My plan is to write a source-map-aware uglifier plugin to understand this use case better, and then extract common code back into this Filter base class.