Open warner opened 4 years ago
It looks like nyc
works by using istanbul
to patch require
to send the code through a babel-based transformation function. This transform works a bit like our metering transform: it injects new code (in this case, at every function entry, statement, and branch alternative). A function in kernel.js
that starts like this:
function abbreviateReviver(_, arg) {
if (typeof arg === 'string' && arg.length >= 40) {
// truncate long strings
return `${arg.slice(0, 15)}...${arg.slice(arg.length - 15)}`;
}
return arg;
}
is turned into:
function abbreviateReviver(_, arg) {
cov_1oweoga0a3().f[0]++;
cov_1oweoga0a3().s[0]++;
if (
(cov_1oweoga0a3().b[1][0]++, typeof arg === 'string') &&
(cov_1oweoga0a3().b[1][1]++, arg.length >= 40)
) {
cov_1oweoga0a3().b[0][0]++;
cov_1oweoga0a3().s[1]++; // truncate long strings
return `${arg.slice(0, 15)}...${arg.slice(arg.length - 15)}`;
} else {
cov_1oweoga0a3().b[0][1]++;
}
cov_1oweoga0a3().s[2]++;
return arg;
}
where cov_1oweoga0a3()
is defined at the top of the file, and contains this:
function cov_1oweoga0a3() {
var path =
'/home/warner/stuff/agoric/agoric-sdk/packages/SwingSet/src/kernel/kernel.js';
var hash = 'f9da108f62513efbdb58c1204522141a5b771a76';
var global = new Function('return this')();
var gcv = '__coverage__';
var coverageData = {
path:
... lots of stuff ...
},
_coverageSchema: '1a1c01bbd47fc00a2c39e90264f33305004495a9',
hash: 'f9da108f62513efbdb58c1204522141a5b771a76',
};
var coverage = global[gcv] || (global[gcv] = {});
if (!coverage[path] || coverage[path].hash !== hash) {
coverage[path] = coverageData;
}
var actualCoverage = coverage[path];
{
// @ts-ignore
cov_1oweoga0a3 = function() {
return actualCoverage;
};
}
return actualCoverage;
}
Part of that function creates a gigantic array of zeros, whose elements can be incremented at each function/statement/branch. The bit at the end is the interesting part: it relies upon a mutable object named coverage
to which it can attach all the data for this one file. It finds that object by looking it up on the global. It finds the global by creating a new Function
and making it return its this
value.
Under SES, this runs into all sorts of problems:
this
inside a bare Function
is null
, to prohibit code in one Compartment from using a function object from some other Compartment to breach confinement. The attack would climb the function object's prototype/.constructor
chain to access the Function
constructor and then use it as an indirect eval
from a different Compartment (with different globals).harden(c.globalThis)
before loading any code into it. So even if the injected coverage code could reach global[anything]
, it would be frozen, and it wouldn't be able to modify the statement counters.require
time, which (since we're running under -r esm
) means it happens at import
time. Most of the code we want to examine is bundled using bundleSource
, and then evaluated using importBundle
(which means c.evaluate()
in a new Compartment). This bypasses the transformation.So we need some new techniques. And I really don't want to try to rewrite istanbul
to pull it off. Here's what we need:
We could try to maintain the ocap safety rules (no ambient communication channels), and call the result "ocap-safe coverage instrumentation". Or we could not bother, because these are just unit tests.
If we abandon safety, we could do one or more of:
c.globalThis
so it stays mutableMap
to the endowments, since harden
only freezes the API surface and not the (non-property-based) mutable data of a Map
I see some options in istanbul
which influence the generation of the code that acquires the global object, as well as the coverage[gcv]
retrieval code. I don't yet know how to set these options, or whether they can be set from the nyc
command line.
To maintain safety, we'd need to ensure that the injected code can see the coverage counters, but the pre-modified code cannot. This happens to be the same requirement we have for the metering transform, and we achieve it by making the meter objects visible as globalLexicals
, rather than endowments (which are added to the global), and parsing/scanning the pre-modified code to prohibit any mention of the getGlobalMeter
name by which the meter object is retrieved by the injected code.
The metering feature has a similar requirement that all child Compartments receive the metering transform. We achieve that by creating an "inescapable transform" utility, which modifies Compartment
to impose a set of transforms and globalLexicals
on any child Compartments it makes (adding them to the "germ line" of the Compartment).
If we could expose the istanbul
coverage-injection process as a single string-to-string transformation function, we might be able to use the same technique. This will depend upon how istanbul
is written: can we extract this function in a sensible way?
One blocker that I don't have an answer for yet is how the injected code should be told what filename to use (the path
variable in the injected code). We can get some relative pathnames from the source bundle (we use this for sourceMap
and debugging information), but I don't know that we can (or should be able to) get the full pathname.
Likewise, there are properties named _coverageSchema
and hash
which I don't know how to generate.
So the vague plan so far is:
istanbul
's transform into a simple string-to-string function, suitable for Compartment
's transforms
optiongetCoverage
nyc
to tell istanbul
to set the options to change two aspects of the injected code:
const global = new Function('return this')();
and const coverage = global[gcv]
, use const coverage = getCoverage()
runtimeOption
to accept a list of transforms and global lexicalsinescapableTransforms
/inescapableGlobalLexicals
on the kernel's Compartment
_coverageSchema
questioncoverage[path]
data afterwards, and make sure our thing is similarly accessibleA bit more investigating:
istanbul-lib-instrument
uses a template to define the coverage function text which gets injected:
There are a handful of options to influence how the template gets rendered. If opts.coverageGlobalScopeFunc = false
, then GLOBAL_COVERAGE_TEMPLATE
will be var global = GLOBAL_COVERAGE_SCOPE;
, which gets its value from opts.coverageGlobalScope
, which defaults to producing var global = this;
. I think we could change this to some other name, and put that name into the global lexicals, pointing at an object with a __coverage__
property that points to a mutable object to contain the data.
But it sounds like this option isn't exposed in nyc
yet, partially because it was added to support browser-side use cases, like multiple iframes, and it didn't seem necessary for Node.js development. Some discussion is here: https://github.com/istanbuljs/istanbuljs/issues/199
My next task is to figure out how hard it would be to control this from an nyc
invocation, and to track down how nyc reads back out the coverage data at the end (if we move the generator to put it somewhere other than globalThis.__coverage__
, we need the reader/reporter to look in the right place).
Next ideas:
set()
accessor that can be written to, but only if we know the property names ahead of time, and Istanbul will be using a separate property for each filename, which doesn't sound easy to enumerate properly first (but maybe if we get desperate we could find a way).harden(c.globalThis)
on the new Compartment's global. Instead it could merely do Object.freeze(c.globalThis)
, and whoever calls importBundle(bundle, { endowments })
is responsible for making sure the endowments they provide are themselves hardened. Then we thread a flag through the nested Compartment process (controller creates kernel compartment, kernel creates vat compartments, zcf dynamic vat creates contract compartment) to deliver a __coverage__
global that is not frozen. This is a deliberate ambient communication channel, so it must of course only be used during tests, and never in production. Finding a way to cleanly add this while making it easy to prove (to ourselves/auditors) that it cannot be active in production may be a challenge.importBundle
not even freeze the global, under control of an option/switch that uses a unique object exported by the module next to the importBundle
function itself. This would limit the ability to disable global-freezing to someone who imported the same module instance, and would let us perhaps share the importBundle
function to code which shouldn't be able to disable freezing.Tools to look at when we circle back to this:
This ticket description is stale because we've dropped nyc
and use https://github.com/bcoe/c8#readme
Current state of code coverage is https://agoric-sdk-coverage.netlify.app which isn't working fully.
For prioritization, we have had bugs in user testing that would have been caught sooner by a unit test and code coverage tooling that revealed the lack thereof.
@michaelfig adds,
I worked on this early on when we were adopting c8. It seemed that the only way forward was to have the bundling process take options to spit out a bunch of files (individual archive components and source maps for each of those components) in a well-known directory on the local computer. That option would attach both the sourceURL and sourceMappingURL to the code stored in the bundle.
Downsides:
- c8 only works on Node, so code running in XS is not profiled
- the bundle becomes nondeterministic (probably needs to have absolute paths hardcoded in it)
- trouble making sure this is all propagated through eval (i.e. compartment.evaluate(code)) without c8 thinking that the evaled code was different from the code used via a Node import
What is the Problem Being Solved?
It'd be nice to know whether our code is being exercised by our unit tests. Code coverage doesn't tell you that you're exercising everything, but it'll tell you that you're exercising nothing.
Description of the Design
I'll investigate using
nyc
under AVA. My main questions are:nyc
interact with SES