add code-coverage to tests

What is the Problem Being Solved?

It'd be nice to know whether our code is being exercised by our unit tests. Code coverage can't prove that the tests exercise everything, but it will show which code they don't exercise at all.

Description of the Design

I'll investigate using nyc under AVA. My main questions are:

how are the results reported
how will nyc interact with SES

It looks like nyc works by using istanbul to patch require to send the code through a babel-based transformation function. This transform works a bit like our metering transform: it injects new code (in this case, at every function entry, statement, and branch alternative). A function in kernel.js that starts like this:

function abbreviateReviver(_, arg) {
  if (typeof arg === 'string' && arg.length >= 40) {
    // truncate long strings
    return `${arg.slice(0, 15)}...${arg.slice(arg.length - 15)}`;
  }
  return arg;
}

is turned into:

function abbreviateReviver(_, arg) {
  cov_1oweoga0a3().f[0]++;
  cov_1oweoga0a3().s[0]++;
  if (
    (cov_1oweoga0a3().b[1][0]++, typeof arg === 'string') &&
    (cov_1oweoga0a3().b[1][1]++, arg.length >= 40)
  ) {
    cov_1oweoga0a3().b[0][0]++;
    cov_1oweoga0a3().s[1]++; // truncate long strings
    return `${arg.slice(0, 15)}...${arg.slice(arg.length - 15)}`;
  } else {
    cov_1oweoga0a3().b[0][1]++;
  }
  cov_1oweoga0a3().s[2]++;
  return arg;
}

where cov_1oweoga0a3() is defined at the top of the file, and contains this:

function cov_1oweoga0a3() {
  var path =
    '/home/warner/stuff/agoric/agoric-sdk/packages/SwingSet/src/kernel/kernel.js';
  var hash = 'f9da108f62513efbdb58c1204522141a5b771a76';
  var global = new Function('return this')();
  var gcv = '__coverage__';
  var coverageData = {
    path:
    ... lots of stuff ...
    },
    _coverageSchema: '1a1c01bbd47fc00a2c39e90264f33305004495a9',
    hash: 'f9da108f62513efbdb58c1204522141a5b771a76',
  };
  var coverage = global[gcv] || (global[gcv] = {});
  if (!coverage[path] || coverage[path].hash !== hash) {
    coverage[path] = coverageData;
  }
  var actualCoverage = coverage[path];
  {
    // @ts-ignore
    cov_1oweoga0a3 = function() {
      return actualCoverage;
    };
  }
  return actualCoverage;
}

Part of that function creates a gigantic array of zeros, whose elements can be incremented at each function/statement/branch. The bit at the end is the interesting part: it relies upon a mutable object named coverage to which it can attach all the data for this one file. It finds that object by looking it up on the global. It finds the global by creating a new Function and making it return its this value.
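The lookup-and-register pattern described above can be reproduced in a few lines outside SES. This is a simplified sketch, not Istanbul's actual output: covDemo and demo.js are made-up names, and the counter tables are far smaller than the real ones.

```javascript
// Simplified sketch of the injected lookup; not Istanbul's real output.
function covDemo() {
  var path = 'demo.js'; // hypothetical filename
  var gcv = '__coverage__';
  // Function bodies built by `new Function` are non-strict, so outside SES
  // `this` here is the global object.
  var global = new Function('return this')();
  var coverage = global[gcv] || (global[gcv] = {});
  if (!coverage[path]) {
    coverage[path] = { path: path, s: { 0: 0 }, f: { 0: 0 }, b: {} };
  }
  var actualCoverage = coverage[path];
  // Replace ourselves so later calls skip the lookup.
  covDemo = function () {
    return actualCoverage;
  };
  return actualCoverage;
}

// Hand-instrumented function: one function counter, one statement counter.
function instrumented(x) {
  covDemo().f[0]++;
  covDemo().s[0]++;
  return x * 2;
}

instrumented(1);
instrumented(2);
console.log(globalThis.__coverage__['demo.js'].f[0]); // 2
```

Everything hinges on two things: being able to reach the global object, and being able to add a property to it.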

Under SES, this runs into all sorts of problems:

this inside a bare Function is null, to prohibit code in one Compartment from using a function object from some other Compartment to breach confinement. The attack would climb the function object's prototype/.constructor chain to access the Function constructor and then use it as an indirect eval from a different Compartment (with different globals).
To prevent ambient communication channels, each time we create a new Compartment, we do harden(c.globalThis) before loading any code into it. So even if the injected coverage code could reach global[anything], it would be frozen, and it wouldn't be able to modify the statement counters.
This transformation is applied at require time, which (since we're running under -r esm) means it happens at import time. Most of the code we want to examine is bundled using bundleSource, and then evaluated using importBundle (which means c.evaluate() in a new Compartment). This bypasses the transformation.

So we need some new techniques. And I really don't want to try to rewrite istanbul to pull it off. Here's what we need:

1: a way to pass a mutable coverage object to the injected code which survives what we choose to do with SES
2: a way to apply the coverage transform to code loaded into child Compartments

We could try to maintain the ocap safety rules (no ambient communication channels), and call the result "ocap-safe coverage instrumentation". Or we could not bother, because these are just unit tests.

If we abandon safety, we could do one or more of:

add a regular property to endowments, but not harden c.globalThis so it stays mutable
add a Map to the endowments, since harden only freezes the API surface and not the (non-property-based) mutable data of a Map
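The difference between those two options can be seen without SES. In this sketch Object.freeze stands in for harden (an assumption for illustration: harden also freezes everything reachable through properties), and fakeGlobal and coverageMap are hypothetical names.

```javascript
// Object.freeze stands in for SES's harden() here; fakeGlobal plays the
// role of a Compartment's frozen globalThis with a Map endowment.
const fakeGlobal = Object.freeze({ coverageMap: new Map() });

// Adding a per-file property to the frozen global is rejected.
// Reflect.set reports failure as `false` instead of throwing.
const wroteProperty = Reflect.set(fakeGlobal, '__coverage__', {});

// The Map's entries are internal state, not properties, so they stay mutable.
fakeGlobal.coverageMap.set('kernel.js', { s: { 0: 0 } });
fakeGlobal.coverageMap.get('kernel.js').s[0]++;

console.log(wroteProperty); // false
console.log(fakeGlobal.coverageMap.get('kernel.js').s[0]); // 1
```

The Map variant would still need the injected code to call coverageMap.get(path) rather than index into a plain object, so it does not remove the need to change what Istanbul generates.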

I see some options in istanbul which influence the generation of the code that acquires the global object, as well as the global[gcv] retrieval code. I don't yet know how to set these options, or whether they can be set from the nyc command line.

To maintain safety, we'd need to ensure that the injected code can see the coverage counters, but the pre-modified code cannot. This happens to be the same requirement we have for the metering transform, and we achieve it by making the meter objects visible as globalLexicals, rather than endowments (which are added to the global), and parsing/scanning the pre-modified code to prohibit any mention of the getGlobalMeter name by which the meter object is retrieved by the injected code.

The metering feature has a similar requirement that all child Compartments receive the metering transform. We achieve that by creating an "inescapable transform" utility, which modifies Compartment to impose a set of transforms and globalLexicals on any child Compartments it makes (adding them to the "germ line" of the Compartment).

If we could expose the istanbul coverage-injection process as a single string-to-string transformation function, we might be able to use the same technique. This will depend upon how istanbul is written: can we extract this function in a sensible way?

One blocker that I don't have an answer for yet is how the injected code should be told what filename to use (the path variable in the injected code). We can get some relative pathnames from the source bundle (we use this for sourceMap and debugging information), but I don't know that we can (or should be able to) get the full pathname.

Likewise, there are properties named _coverageSchema and hash which I don't know how to generate.

So the vague plan so far is:

convert istanbul's transform into a simple string-to-string function, suitable for Compartment's transforms option
wrap this in code which scans the original source to reject any appearance of getCoverage
find a way to tell nyc to tell istanbul to set the options to change two aspects of the injected code:
- instead of const global = new Function('return this')(); and const coverage = global[gcv], use const coverage = getCoverage()
add a swingset runtimeOption to accept a list of transforms and global lexicals
use those transforms/lexicals as inescapableTransforms/inescapableGlobalLexicals on the kernel's Compartment
- so they'll be transmitted to all vat Compartments too
figure out the filename/hash/_coverageSchema question
make sure whatever we do to the transformation also works for the non-bundled outer-level sources
figure out how nyc retrieves the coverage[path] data afterwards, and make sure our thing is similarly accessible
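The first two steps of this plan might look roughly like the following. It is only a sketch: fakeInstrument is a stand-in for the real Istanbul transform, the substring check is a deliberately conservative replacement for a proper parse, and a function parameter plays the role that a global lexical would play in a Compartment.

```javascript
// Name by which injected code would fetch the counters (from the plan above).
const RESERVED_NAME = 'getCoverage';

// Wrap a string-to-string transform so that source mentioning the reserved
// name is rejected before instrumentation runs. A substring check also
// rejects mentions in comments and strings, which is safe but strict.
function makeGuardedTransform(instrument) {
  return function guardedTransform(source) {
    if (source.includes(RESERVED_NAME)) {
      throw new Error(`source must not mention ${RESERVED_NAME}`);
    }
    return instrument(source);
  };
}

// Stand-in for the real transform: prepends a single counter bump.
const fakeInstrument = (source) => `${RESERVED_NAME}().s[0]++;\n${source}`;
const transform = makeGuardedTransform(fakeInstrument);

// Only the injected code can see getCoverage; here it is passed as a
// function parameter instead of a Compartment global lexical.
const counters = { s: { 0: 0 } };
const instrumentedSource = transform('return 1 + 2;');
const result = new Function(RESERVED_NAME, instrumentedSource)(() => counters);

// Pre-instrumentation source that tries to touch the counters is refused.
let rejected = false;
try {
  transform('getCoverage().s[0] = 999;');
} catch (err) {
  rejected = true;
}

console.log(result, counters.s[0], rejected); // 3 1 true
```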

A bit more investigating:

istanbul-lib-instrument uses a template to define the coverage function text which gets injected:

https://github.com/istanbuljs/istanbuljs/blob/1b52fe750d1f800c34dbff168614c0c73bd76026/packages/istanbul-lib-instrument/src/visitor.js#L537-L560

There are a handful of options to influence how the template gets rendered. If opts.coverageGlobalScopeFunc = false, then GLOBAL_COVERAGE_TEMPLATE will be var global = GLOBAL_COVERAGE_SCOPE;, which gets its value from opts.coverageGlobalScope, which defaults to producing var global = this;. I think we could change this to some other name, and put that name into the global lexicals, pointing at an object with a __coverage__ property that points to a mutable object to contain the data.

But it sounds like this option isn't exposed in nyc yet, partially because it was added to support browser-side use cases, like multiple iframes, and it didn't seem necessary for Node.js development. Some discussion is here: https://github.com/istanbuljs/istanbuljs/issues/199
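To make the option handling described above concrete, here is a paraphrase of it together with a check that the resulting line works when the named scope is supplied from outside. This is not Istanbul's code: renderGlobalLine and coverageScope are hypothetical names, the default values are my assumption from the description above, and a function parameter stands in for a global lexical.

```javascript
// Paraphrase of how the two options pick the "find the global" line.
// Defaults are assumed from the behavior described in this thread.
function renderGlobalLine({
  coverageGlobalScope = 'this',
  coverageGlobalScopeFunc = true,
} = {}) {
  return coverageGlobalScopeFunc
    ? `var global = new Function(${JSON.stringify(`return ${coverageGlobalScope}`)})();`
    : `var global = ${coverageGlobalScope};`;
}

const defaultLine = renderGlobalLine();
const line = renderGlobalLine({
  coverageGlobalScope: 'coverageScope', // hypothetical global-lexical name
  coverageGlobalScopeFunc: false,
});

// The rest of the injected prologue, reduced to the registration step.
const body = `
  ${line}
  var coverage = global['__coverage__'] || (global['__coverage__'] = {});
  coverage['demo.js'] = { s: { 0: 1 } };
`;

// Supply the scope object as a parameter named coverageScope.
const scope = {};
new Function('coverageScope', body)(scope);

console.log(defaultLine); // var global = new Function("return this")();
console.log(line); // var global = coverageScope;
console.log(scope.__coverage__['demo.js'].s[0]); // 1
```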

My next task is to figure out how hard it would be to control this from an nyc invocation, and to track down how nyc reads back out the coverage data at the end (if we move the generator to put it somewhere other than globalThis.__coverage__, we need the reader/reporter to look in the right place).

Next ideas:

@erights says there's no way to provide an object (on the global) which claims to be frozen but still allows writes to arbitrary properties, as Istanbul's instrumentation expects, even with a Proxy. We could attach an object with setter accessors that can be written to, but only if we know the property names ahead of time. Istanbul uses a separate property for each filename, which doesn't sound easy to enumerate in advance (but maybe if we get desperate we could find a way).
@erights also says import-bundle doesn't necessarily have to harden(c.globalThis) on the new Compartment's global. Instead it could merely do Object.freeze(c.globalThis), and whoever calls importBundle(bundle, { endowments }) is responsible for making sure the endowments they provide are themselves hardened. Then we thread a flag through the nested Compartment process (controller creates kernel compartment, kernel creates vat compartments, zcf dynamic vat creates contract compartment) to deliver a __coverage__ global that is not frozen. This is a deliberate ambient communication channel, so it must of course only be used during tests, and never in production. Finding a way to cleanly add this while making it easy to prove (to ourselves/auditors) that it cannot be active in production may be a challenge.
We could have importBundle not even freeze the global, under control of an option/switch that uses a unique object exported by the module next to the importBundle function itself. This would limit the ability to disable global-freezing to code that imported the same module instance, and would perhaps let us share the importBundle function with code which shouldn't be able to disable freezing.
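Two of these ideas can be tried out in plain JavaScript. The sketch below first shows setter accessors on a frozen object (writable only for names known in advance), then a unique-object switch that gates freezing. All names here (knownFiles, UNFROZEN_GLOBAL, importBundleSketch, globalFreezing) are hypothetical, Object.freeze stands in for whatever SES would do, and a plain object stands in for a Compartment's global.

```javascript
// Part 1: setter accessors on a frozen object.
const store = new Map();
const knownFiles = ['kernel.js', 'vat.js']; // must be enumerated up front
const frozenCoverage = {};
for (const file of knownFiles) {
  Object.defineProperty(frozenCoverage, file, {
    get: () => store.get(file),
    set: (value) => {
      store.set(file, value);
    },
    enumerable: true,
  });
}
Object.freeze(frozenCoverage);

// Writes to known names go through the setter; unknown names are refused.
const knownOk = Reflect.set(frozenCoverage, 'kernel.js', { s: { 0: 0 } });
frozenCoverage['kernel.js'].s[0]++;
const unknownOk = Reflect.set(frozenCoverage, 'other.js', {});
console.log(Object.isFrozen(frozenCoverage), knownOk, unknownOk); // true true false

// Part 2: a unique object as the only way to switch freezing off.
const UNFROZEN_GLOBAL = Object.freeze({});

// Stand-in for importBundle: builds a plain-object "global" from endowments.
function importBundleSketch(endowments, options = {}) {
  const globalObject = { ...endowments };
  // Only the exact token object disables freezing; look-alikes do not.
  if (options.globalFreezing !== UNFROZEN_GLOBAL) {
    Object.freeze(globalObject);
  }
  return globalObject;
}

const testGlobal = importBundleSketch({}, { globalFreezing: UNFROZEN_GLOBAL });
const forgedGlobal = importBundleSketch({}, { globalFreezing: {} });
const defaultGlobal = importBundleSketch({});
console.log(Object.isFrozen(testGlobal)); // false
console.log(Object.isFrozen(forgedGlobal), Object.isFrozen(defaultGlobal)); // true true
```

With the second approach, auditing for production safety reduces to checking which modules can reach the token object.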

This ticket description is stale: we've dropped nyc and now use c8 (https://github.com/bcoe/c8#readme).

The current state of code coverage is at https://agoric-sdk-coverage.netlify.app, which isn't fully working.

For prioritization: user testing has turned up bugs that a unit test would have caught sooner, and code coverage tooling would have revealed that the test was missing.

@michaelfig adds,

I worked on this early on when we were adopting c8. It seemed that the only way forward was to have the bundling process take options to spit out a bunch of files (individual archive components and source maps for each of those components) in a well-known directory on the local computer. That option would attach both the sourceURL and sourceMappingURL to the code stored in the bundle.

Downsides:

c8 only works on Node, so code running in XS is not profiled

the bundle becomes nondeterministic (probably needs to have absolute paths hardcoded in it)

trouble making sure this is all propagated through eval (i.e. compartment.evaluate(code)) without c8 thinking that the evaled code was different from the code used via a Node import
