Open warner opened 4 years ago
@warner and I made a little progress in this direction yesterday: we took slogfile data from a busy block with about 7 messages from the load generator that result in dozens of deliveries to several vats and made a diagram of it:
https://github.com/Agoric/agoric-sdk/issues/3459#issuecomment-892286574
(design notes on the debugging tool I want to build, tentatively named `ralog` for "retro-active log", which is the most horrible-yet-relevant name I could come up with on the spot)

Any long-running swingset should record enough information to allow an external process to regenerate a single vat's state in a different runtime environment, such as one with a debugger. The chain swingset will run under a specialized deterministic JS engine (XS) within a consensus platform (cosmos-sdk), but developers should be able to reload a vat into a local Node.js environment, with their favorite debugger pointed at it.
The swingset config object should include a few new options:
- `ralog-file`: a file to which debug events should be written. Maybe sqlite.
- `ralog-port`: controls an HTTP server from which clients can request copies of the ralog data, either as one-off GETs, or as a websocket stream of updates. For access control, we either use something like the swissnums in Foolscap's `logport` (see "Remote Access" here), or say "meh, it's a public chain, the logs are public too", and remind validators to not expose this port to the world to avoid a DoS vector. For the production chain, I'd expect some block-explorer-like service to run a follower node, and record log data from that, making it available through some sort of cache, rather than exposing their follower node directly.

We want to log an event each time a delivery is made to a vat, whether it was a `send` (a message send, triggering `dispatch.deliver`) or a `notify` (a promise resolution, triggering `dispatch.notifyFulfillToData` or one of its brethren).

During the execution of that crank, we also want to log any syscalls the vat makes. For retroactive debugging, we need the return value of the device-read syscalls. The rest of the syscall data is informative but not needed to regenerate a vat within the debugger.
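To make the per-crank recording concrete, here is a minimal sketch of a recorder that wraps the syscall function handed to a vat and stores each call together with its return value. Every name in it (`makeCrankRecorder`, `recordCrank`, the shape of the entry, and the placeholder syscall names and IDs) is an assumption for illustration, not an existing swingset API.

```javascript
// Hypothetical recorder; none of these names come from swingset itself.
function makeCrankRecorder() {
  const events = [];
  let crankNum = 0;
  return {
    events,
    // Record one delivery to a vat. `run` receives the logging syscall
    // function and is expected to perform the delivery with it.
    recordCrank(vatID, delivery, realSyscall, run) {
      const entry = { crankNum, vatID, delivery, syscalls: [] };
      crankNum += 1;
      const loggedSyscall = (...args) => {
        const result = realSyscall(...args);
        // Every return value is stored here for simplicity; only the
        // device-read results are strictly needed to replay the vat.
        entry.syscalls.push({ args, result });
        return result;
      };
      run(loggedSyscall);
      events.push(entry);
      return entry;
    },
  };
}

// Usage with a stand-in syscall handler (names and IDs are placeholders).
const recorder = makeCrankRecorder();
const fakeSyscall = (type) => (type === 'deviceRead' ? 'device-data' : undefined);
recorder.recordCrank('v1', ['send', 'target', 'foo', [], 'result'], fakeSyscall, (syscall) => {
  syscall('send', 'other', 'bar', []);
  syscall('deviceRead', 'device', 'read', []);
});
```

Storing all return values keeps the sketch short; a real recorder could drop the ones that replay does not need.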
We also want to log any user-invoked debug logs (`console.log`, `console.debug`, etc). This is the primary way through which user code will reveal its internal workings. The `console` object provided to each vat is write-only: the implementation seals the argument data and conveys it into the debug log without revealing it to any other vats (or nested Compartments, maybe).

The log events should include:
- `['send', target, method, args, result]` or `['notify', promiseid, resolution]`
The log needs to include the complete source bundle for each vat (both static and dynamic).
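As a rough illustration of the two delivery shapes, the sketch below builds them and writes one event per line as JSON. The tuple layouts come from the notes above; the helper names, the `crankNum`/`vatID` wrapper fields, and the line-per-event JSON encoding are assumptions (the notes only say "maybe sqlite").

```javascript
// Delivery tuples as described in the notes; helper names are hypothetical.
function makeSendEvent(target, method, args, result) {
  return ['send', target, method, args, result];
}

function makeNotifyEvent(promiseID, resolution) {
  return ['notify', promiseID, resolution];
}

// One JSON document per line, so a log file can be appended to and streamed.
function toLogLine(crankNum, vatID, event) {
  return JSON.stringify({ crankNum, vatID, event });
}

// Parse a line back and reject delivery types this sketch does not know.
function parseLogLine(line) {
  const { crankNum, vatID, event } = JSON.parse(line);
  const [type] = event;
  if (type !== 'send' && type !== 'notify') {
    throw new Error(`unknown delivery type: ${type}`);
  }
  return { crankNum, vatID, event };
}

// Usage: round-trip one event of each kind (IDs are placeholders).
const sendLine = toLogLine(12, 'v4', makeSendEvent('ko34', 'deposit', [5], 'kp40'));
const notifyLine = toLogLine(13, 'v7', makeNotifyEvent('kp40', 'ok'));
const parsedSend = parseLogLine(sendLine);
const parsedNotify = parseLogLine(notifyLine);
```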
Log Display
The first console should show a big list of delivery events. The columns will be like "crank number", "vat id", "delivery type", "delivery args".
One sidebar should show a list of vat-ids in the trace, and allow the user to assign petnames to them. Static vats should be pre-named, but dynamic vats may be less obvious (especially when multiple vats are made from the same bundle, e.g. ZCF). The `createDynamicVat` call should take a `description` which should be used to pre-populate the debug log's name, but the user at the log-display app should be able to override that.

This list of vat-ids should have checkboxes to limit the event display to ones being delivered to the selected vats, with both toggle-one and deselect-all-others affordances.
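The sidebar behavior described above could be modeled roughly as follows: a set of checked vat-ids with toggle-one and deselect-all-others operations, plus a name lookup in which a user petname overrides the recorded description. All names here (`makeVatSidebar`, `toggle`, `only`, and so on) are hypothetical, and the events are assumed to carry a `vatID` field.

```javascript
// Hypothetical model of the vat sidebar; not an existing API.
function makeVatSidebar(descriptions) {
  // descriptions: { vatID: description recorded at vat creation }
  const selected = new Set(Object.keys(descriptions)); // all vats start checked
  const petnames = new Map();
  return {
    setPetname(vatID, name) {
      petnames.set(vatID, name);
    },
    // A user-assigned petname wins over the pre-populated description.
    displayName(vatID) {
      return petnames.get(vatID) || descriptions[vatID] || vatID;
    },
    // Toggle a single checkbox.
    toggle(vatID) {
      if (selected.has(vatID)) {
        selected.delete(vatID);
      } else {
        selected.add(vatID);
      }
    },
    // Deselect all other vats, leaving only this one checked.
    only(vatID) {
      selected.clear();
      selected.add(vatID);
    },
    // Keep only events delivered to checked vats.
    visible(events) {
      return events.filter((e) => selected.has(e.vatID));
    },
  };
}

// Usage with placeholder vat-ids and descriptions.
const sidebar = makeVatSidebar({ v1: 'bootstrap', v4: 'zcf', v7: 'zcf' });
const sampleEvents = [{ vatID: 'v1' }, { vatID: 'v4' }, { vatID: 'v7' }, { vatID: 'v4' }];
sidebar.setPetname('v7', 'escrow contract');
sidebar.toggle('v1'); // uncheck v1
const afterToggle = sidebar.visible(sampleEvents).length;
sidebar.only('v4'); // show v4 deliveries only
const afterOnly = sidebar.visible(sampleEvents).length;
```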
Another sidebar/popup should let the user assign petnames to `koNN` and `kpNN` object/promise IDs, again possibly with in-vat support for adding descriptions to specific Presences/Promises. E.g. the mint that creates a Purse might tell liveslots (perhaps through `Remoteable()`) that this is a Purse, and point to the Issuer and its name. The debug log could record this and retrieve it later, even though userspace code could not. On the receiving side, a ZCF wrapper might tell `console` or liveslots that it thinks about the received object as "stablecoin escrow purse", adding to the information contained in the debug log. Then the debugging user's petname table would say "ko34 is described as X by vat4 (vat4 description) and as Y by vat7 (vat7 description)".

This list can be displayed in pages, or by fetching a minimal set of data from the source. The details can be fetched lazily as the user drills down.
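The petname-table entry quoted above ("ko34 is described as X by vat4 ... and as Y by vat7 ...") can be sketched as a per-object map of vat-supplied descriptions. The function and parameter names are assumptions for illustration; how descriptions would actually reach the log (liveslots, `console`, or something else) is left open, as in the notes.

```javascript
// Hypothetical store of per-vat descriptions for kernel object/promise IDs.
function makeObjectDescriptions() {
  const byObject = new Map(); // kernel ID -> Map(vatID -> description)
  return {
    describe(kernelID, vatID, description) {
      if (!byObject.has(kernelID)) {
        byObject.set(kernelID, new Map());
      }
      byObject.get(kernelID).set(vatID, description);
    },
    // Render the sentence shown in the debugging user's petname table.
    summarize(kernelID, vatNames = {}) {
      const perVat = byObject.get(kernelID);
      if (!perVat) {
        return `${kernelID} has no recorded descriptions`;
      }
      const parts = [...perVat].map(([vatID, description]) => {
        const name = vatNames[vatID] ? ` (${vatNames[vatID]})` : '';
        return `as "${description}" by ${vatID}${name}`;
      });
      return `${kernelID} is described ${parts.join(' and ')}`;
    },
  };
}

// Usage with placeholder IDs and descriptions.
const table = makeObjectDescriptions();
table.describe('ko34', 'vat4', 'Purse');
table.describe('ko34', 'vat7', 'stablecoin escrow purse');
const summary = table.summarize('ko34', { vat4: 'mint', vat7: 'zcf' });
```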
A popup on each row should offer a way to spin up a local copy of the vat at that state. Invoking it should retrieve the necessary source code, checkpoint, and/or transcripts from the event repository. Then it should launch a Node.js program that knows how to accept the same, and replay the vat forwards to the state just before the selected message is delivered. It should then execute a `debugger` statement in liveslots, at the point just before the vat's object method is invoked. If the crank completes and the user hasn't terminated the process yet, the program should be fed the next historical delivery and stop again just before it is executed. It might be useful to fork the Node.js process before the message is delivered (and hold the forked process as a backup copy of the live vat state, using more `fork()`s to replicate it), to offer a way to efficiently re-run the message multiple times without replaying the entire transcript each time.
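The replay step could look roughly like the sketch below: deliveries are re-run in order, syscalls are answered from the recorded results instead of a live kernel, and a hook fires just before the selected delivery. The transcript shape, `replayUntil`, `makeDispatch`, and `onBreak` are all hypothetical, and checkpoints and process forking are not modeled.

```javascript
// transcript: array of { delivery, syscalls: [{ args, result }] } (assumed shape)
function replayUntil(transcript, targetIndex, makeDispatch, onBreak) {
  let current; // transcript entry being replayed
  let cursor = 0; // index of the next recorded syscall in `current`
  // Answer each syscall from the recording rather than a live kernel.
  const playbackSyscall = (...args) => {
    const recorded = current.syscalls[cursor];
    cursor += 1;
    if (!recorded || JSON.stringify(recorded.args) !== JSON.stringify(args)) {
      throw new Error(`replay diverged at syscall ${cursor - 1}`);
    }
    return recorded.result;
  };
  const dispatch = makeDispatch(playbackSyscall);
  for (let i = 0; i <= targetIndex; i += 1) {
    current = transcript[i];
    cursor = 0;
    if (i === targetIndex) {
      // A real tool might execute a `debugger` statement at this point.
      onBreak(current.delivery);
    }
    dispatch(current.delivery);
  }
}

// Usage: a toy "vat" that counts deliveries and performs one read per delivery.
let deliveryCount = 0;
const readResults = [];
const makeToyDispatch = (syscall) => (delivery) => {
  deliveryCount += 1;
  readResults.push(syscall('read', delivery[1]));
};
const toyTranscript = [0, 1, 2].map((n) => ({
  delivery: ['send', `m${n}`],
  syscalls: [{ args: ['read', `m${n}`], result: n * 10 }],
}));
let countAtBreak;
replayUntil(toyTranscript, 2, makeToyDispatch, () => {
  countAtBreak = deliveryCount;
});
```

Comparing the replayed syscall arguments against the recording is a design choice of this sketch: it surfaces divergence between the original run and the local replay early, instead of letting the vat continue in a state that no longer matches history.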