retroactive debugging

(design notes on the debugging tool I want to build, tentatively named ralog for "retro-active log", which is the most horrible-yet-relevant name I could come up with on the spot)

Any long-running swingset should record enough information to allow an external process to regenerate a single vat's state in a different runtime environment, such as one with a debugger. The chain swingset will run under a specialized deterministic JS engine (XS) within a consensus platform (cosmos-sdk), but developers should be able to reload a vat into a local Node.js environment, with their favorite debugger pointed at it.

The swingset config object should include a few new options:

ralog-file: a file to which debug events should be written. Maybe sqlite.
ralog-port: controls an HTTP server from which clients can request copies of the ralog data, either as one-off GETs or as a websocket stream of updates. For access control, we either use something like the swissnums in Foolscap's logport (see "Remote Access" here), or say "meh, it's a public chain, the logs are public too" and remind validators not to expose this port to the world, to avoid a DoS vector. For the production chain, I'd expect some block-explorer-like service to run a follower node and record log data from that, making it available through some sort of cache rather than exposing their follower node directly.
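
As a rough sketch of how these two options might be read from a config object: the option names come from the notes above, while the `normalizeRalogConfig` helper, the "absent means disabled" defaults, and the example path and port are assumptions made for illustration, not existing swingset behavior.

```javascript
// Hypothetical helper: validates the two proposed options and fills in defaults.
function normalizeRalogConfig(config) {
  const file = config['ralog-file'] ?? null; // null: do not record debug events
  const port = config['ralog-port'] ?? null; // null: do not start the HTTP server
  if (file !== null && typeof file !== 'string') {
    throw new TypeError('ralog-file must be a path string');
  }
  if (port !== null && !(Number.isInteger(port) && port > 0 && port < 65536)) {
    throw new RangeError('ralog-port must be an integer in 1..65535');
  }
  return { file, port };
}

// Example values are placeholders.
const ralog = normalizeRalogConfig({
  'ralog-file': './ralog.sqlite',
  'ralog-port': 8123,
});
```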

We want to log an event each time a delivery is made to a vat, whether it was a send (a message send, triggering dispatch.deliver) or a notify (a promise resolution, triggering dispatch.notifyFulfillToData or one of its brethren).

During the execution of that crank, we also want to log any syscalls the vat makes. For retroactive debugging, we need the return value of the device-read syscalls. The rest of the syscall data is informative but not needed to regenerate a vat within the debugger.
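
One way to capture syscalls during a crank is to wrap the syscall object handed to the vat. The sketch below is illustrative only: `makeRecordingSyscall`, the shape of `entry`, and the stand-in syscall names are assumptions, and the set of calls treated as device reads is passed in rather than hard-coded.

```javascript
// Wraps a syscall object so every call is appended to the current delivery's
// log entry. `replayNeeded` names the calls whose return values must be kept
// for replay (the device-read calls); other calls are logged without results.
function makeRecordingSyscall(syscall, entry, replayNeeded) {
  const wrapped = {};
  for (const name of Object.keys(syscall)) {
    wrapped[name] = (...args) => {
      const result = syscall[name](...args);
      const record = { name, args };
      if (replayNeeded.has(name)) {
        record.result = result; // required to regenerate the vat later
      }
      entry.syscalls.push(record);
      return result;
    };
  }
  return wrapped;
}

// Example with a stand-in syscall object (names are illustrative only).
const entry = { delivery: ['send', 'o+1', 'foo', [], 'p-5'], syscalls: [] };
const fakeSyscall = {
  send: (target, method) => undefined,
  callNow: (device, method) => 42, // stands in for a device read
};
const recording = makeRecordingSyscall(fakeSyscall, entry, new Set(['callNow']));
recording.send('o-2', 'bar');
const value = recording.callNow('d-1', 'read');
```

Here only the `callNow` record carries a `result` field; the `send` record is kept purely for display.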

We also want to log any user-invoked debug logs (console.log, console.debug, etc). This is the primary way through which user code will reveal its internal workings. The console object provided to each vat is write-only: the implementation seals the argument data and conveys it into the debug log without revealing it to any other vats (or nested Compartments, maybe).
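
A minimal sketch of such a write-only console, assuming a hypothetical append-only `sink` callback on the kernel side. "Sealing" is approximated here by stringifying the arguments at call time, which is a simplification rather than the intended mechanism.

```javascript
// Builds a console whose methods copy their arguments into the debug log and
// return nothing, so callers cannot read anything back through it.
function makeWriteOnlyConsole(sink) {
  const levels = ['log', 'debug', 'info', 'warn', 'error'];
  const con = {};
  for (const level of levels) {
    con[level] = (...args) => {
      // Stringify immediately so later mutation of the arguments cannot change
      // what was logged. Values JSON cannot represent fall back to String().
      const sealed = args.map(arg => {
        try {
          const json = JSON.stringify(arg);
          return json === undefined ? String(arg) : json;
        } catch (err) {
          return String(arg);
        }
      });
      sink({ level, args: sealed });
      return undefined;
    };
  }
  return Object.freeze(con);
}

const lines = [];
const vatConsole = makeWriteOnlyConsole(record => lines.push(record));
const state = { count: 1 };
const returned = vatConsole.log('state is', state);
state.count = 2; // does not affect the already-logged copy
```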

The log events should include:

incarnation ID: each time the node restarts, it gets a new incarnation ID (with one sequential part, and one random part). Log events are supposed to be the same between different incarnations, but since log events are not part of the consensus state, nondeterministic logs won't halt the chain. Recording all log events, even the ones which are supposed to be an exact copy of some previous incarnation, will give us a better chance of tracking down where the vat diverged from history and thus help us figure out the nondeterminism at fault.
crank number, a big integer that spans the entire swingset
vat id to which the delivery was being made
delivery details: ['send', target, method, args, result] or ['notify', promiseid, resolution]
a per-vat sequential delivery counter
the results of the delivery: success, vat fault (bad c-list access), metering termination
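
To make the fields above concrete, here is one possible record shape, together with a comparison that looks for the crank at which two incarnations stop agreeing. All field names, the ID format, and both helper functions are assumptions. The crank number is stored as a decimal string because JSON cannot represent a BigInt directly.

```javascript
// Builds one log event from the fields listed above.
function makeEvent(incarnationID, crankNum, vatID, deliveryNum, delivery, result) {
  return {
    incarnationID,
    crankNum: crankNum.toString(), // BigInt -> string for JSON storage
    vatID,
    deliveryNum,
    delivery,
    result,
  };
}

// Returns the crank number of the first event in eventsB that disagrees with
// the event eventsA logged for the same crank, or null if none disagree.
// Assumes eventsB is sorted by crank number.
function firstDivergence(eventsA, eventsB) {
  const key = e => JSON.stringify([e.vatID, e.deliveryNum, e.delivery, e.result]);
  const byCrank = new Map(eventsA.map(e => [e.crankNum, key(e)]));
  for (const e of eventsB) {
    if (byCrank.has(e.crankNum) && byCrank.get(e.crankNum) !== key(e)) {
      return e.crankNum;
    }
  }
  return null;
}

const send = ['send', 'o+1', 'deposit', ['args'], 'p-7'];
const notify = ['notify', 'p-7', 'ok'];
const first = [
  makeEvent('1-a3f9', 10n, 'v4', 0, send, 'success'),
  makeEvent('1-a3f9', 11n, 'v4', 1, notify, 'success'),
];
const second = [
  makeEvent('2-77c0', 10n, 'v4', 0, send, 'success'),
  makeEvent('2-77c0', 11n, 'v4', 1, notify, 'vat fault'),
];
const divergedAt = firstDivergence(first, second);
```

The incarnation ID is deliberately left out of the comparison key, since it always differs between restarts.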

The log needs to include the complete source bundle for each vat (both static and dynamic).

Log Display

The first console should show a big list of delivery events, with columns such as "crank number", "vat id", "delivery type", and "delivery args".

One sidebar should show a list of vat-ids in the trace, and allow the user to assign petnames to them. Static vats should be pre-named, but dynamic vats may be less obvious (especially when multiple vats are made from the same bundle, e.g. ZCF). The createDynamicVat call should take a description which should be used to pre-populate the debug log's name, but the user at the log-display app should be able to override that.

This list of vat-ids should have checkboxes to limit the event display to ones being delivered to the selected vats, with both toggle-one and deselect-all-others affordances.
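
The two checkbox affordances could be backed by something as small as the following sketch. `makeVatFilter` and its method names are hypothetical, and starting with every vat selected is an assumption.

```javascript
// Tracks which vats are selected for display.
function makeVatFilter(allVatIDs) {
  const selected = new Set(allVatIDs);
  return {
    // toggle-one: flip a single vat's checkbox
    toggle(vatID) {
      if (selected.has(vatID)) {
        selected.delete(vatID);
      } else {
        selected.add(vatID);
      }
    },
    // deselect-all-others: show only this vat
    only(vatID) {
      selected.clear();
      selected.add(vatID);
    },
    apply(events) {
      return events.filter(e => selected.has(e.vatID));
    },
  };
}

const events = [
  { crankNum: '1', vatID: 'v1' },
  { crankNum: '2', vatID: 'v2' },
  { crankNum: '3', vatID: 'v3' },
];
const filter = makeVatFilter(['v1', 'v2', 'v3']);
filter.toggle('v2');
const withoutV2 = filter.apply(events).map(e => e.crankNum);
filter.only('v3');
const onlyV3 = filter.apply(events).map(e => e.crankNum);
```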

Another sidebar/popup should let the user assign petnames to koNN and kpNN object/promise IDs, again possibly with in-vat support for adding descriptions to specific Presences/Promises. E.g. the mint that creates a Purse might tell liveslots (perhaps through Remotable()) that this is a Purse, and point to the Issuer and its name. The debug log could record this and retrieve it later, even though userspace code could not. On the receiving side, a ZCF wrapper might tell console or liveslots that it thinks of the received object as a "stablecoin escrow purse", adding to the information contained in the debug log. Then the debugging user's petname table would say "ko34 is described as X by vat4 (vat4 description) and as Y by vat7 (vat7 description)".
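
A sketch of the petname table described above, in which per-vat descriptions accumulate and a user-assigned petname takes precedence. The `makePetnameTable` API and the exact label wording are assumptions.

```javascript
// Collects per-vat descriptions of kernel objects/promises (koNN / kpNN) and
// lets the debugging user override them with a petname.
function makePetnameTable() {
  const descriptions = new Map(); // kref -> Map(vatID -> description)
  const petnames = new Map(); // kref -> user-chosen petname
  return {
    describe(kref, vatID, description) {
      if (!descriptions.has(kref)) {
        descriptions.set(kref, new Map());
      }
      descriptions.get(kref).set(vatID, description);
    },
    setPetname(kref, name) {
      petnames.set(kref, name);
    },
    label(kref) {
      if (petnames.has(kref)) {
        return petnames.get(kref);
      }
      const perVat = descriptions.get(kref);
      if (!perVat) {
        return kref; // nothing known: fall back to the raw ID
      }
      const parts = [...perVat].map(([vatID, d]) => `"${d}" by ${vatID}`);
      return `${kref} is described as ${parts.join(' and as ')}`;
    },
  };
}

const table = makePetnameTable();
table.describe('ko34', 'vat4', 'Purse');
table.describe('ko34', 'vat7', 'stablecoin escrow purse');
const before = table.label('ko34');
table.setPetname('ko34', 'my escrow purse');
const after = table.label('ko34');
```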

The event list can be displayed in pages, or populated by fetching only a minimal set of data for each row from the source. The details can be fetched lazily as the user drills down.

A popup on each row should offer a way to spin up a local copy of the vat at that state. Invoking it should retrieve the necessary source code, checkpoint, and/or transcripts from the event repository. Then it should launch a Node.js program that knows how to accept the same, and replay the vat forwards to the state just before the selected message is delivered. It should then execute a debugger statement in liveslots, at the point just before the vat's object method is invoked. If the crank completes and the user hasn't terminated the process yet, the program should be fed the next historical delivery and stop again just before it is executed. It might be useful to fork the Node.js process before the message is delivered (and hold the forked process as a backup copy of the live vat state, using more fork()s to replicate it), to offer a way to efficiently re-run the message multiple times without replaying the entire transcript each time.
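
The replay-then-pause loop can be sketched as follows. `dispatch` stands in for the vat's delivery entry point and `onPause` for the spot where a real tool would execute a `debugger` statement; both are hypothetical, as is the transcript shape. Feeding recorded device-read results back into the vat and forking the process are left out of this sketch.

```javascript
// Replays recorded deliveries into a vat. Deliveries before the selected one
// run without stopping; from the selected one onward, onPause is called just
// before each delivery is executed.
function replayUntil(dispatch, transcript, targetDeliveryNum, onPause) {
  for (const { deliveryNum, delivery } of transcript) {
    if (deliveryNum >= targetDeliveryNum) {
      onPause(deliveryNum, delivery);
    }
    dispatch(delivery);
  }
}

// Toy vat: a running total driven by 'add' messages.
let total = 0;
const dispatch = ([type, target, method, args]) => {
  if (type === 'send' && method === 'add') {
    total += args[0];
  }
};
const transcript = [
  { deliveryNum: 0, delivery: ['send', 'o+0', 'add', [1], 'p-1'] },
  { deliveryNum: 1, delivery: ['send', 'o+0', 'add', [2], 'p-2'] },
  { deliveryNum: 2, delivery: ['send', 'o+0', 'add', [4], 'p-3'] },
];
const pauses = [];
replayUntil(dispatch, transcript, 1, num => pauses.push([num, total]));
```

Each entry in `pauses` records the delivery number and the vat state as it stood just before that delivery ran, which is what the user would see in the debugger.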

Agoric / agoric-sdk

retroactive debugging #1359
