Agoric / agoric-sdk

monorepo for the Agoric JavaScript smart contract platform
Apache License 2.0

retroactive debugging #1359

Status: Open. warner opened this issue 4 years ago.

warner commented 4 years ago

(design notes on the debugging tool I want to build, tentatively named ralog for "retro-active log", which is the most horrible-yet-relevant name I could come up with on the spot)

Any long-running swingset should record enough information to allow an external process to regenerate a single vat's state in a different runtime environment, such as one with a debugger. The chain swingset will run under a specialized deterministic JS engine (XS) within a consensus platform (cosmos-sdk), but developers should be able to reload a vat into a local Node.js environment, with their favorite debugger pointed at it.

The swingset config object should include a few new options:

We want to log an event each time a delivery is made to a vat, whether it was a send (a message send, triggering dispatch.deliver) or a notify (a promise resolution, triggering dispatch.notifyFulfillToData or one of its brethren).
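A minimal TypeScript sketch of how the two kinds of delivery event might be modeled as a discriminated union. The field names (`crankNum`, `vatID`, `target`, `promiseID`, and so on) and the `describeDelivery` helper are hypothetical, not the kernel's actual log format.

```typescript
// Hypothetical shapes for delivery log events; not the real SwingSet log format.
type SendEvent = {
  type: "deliver";       // a message send, triggering dispatch.deliver
  crankNum: number;
  vatID: string;         // e.g. "v4"
  target: string;        // kernel object ID, e.g. "ko34"
  method: string;
  args: unknown[];
  result: string | null; // kernel promise ID for the result, if any
};

type NotifyEvent = {
  type: "notify";        // a promise resolution, triggering one of the dispatch.notify* methods
  crankNum: number;
  vatID: string;
  promiseID: string;     // e.g. "kp12"
  rejected: boolean;
  data: unknown;
};

type DeliveryEvent = SendEvent | NotifyEvent;

// One-line summary of an event; the switch is exhaustive over the union.
function describeDelivery(ev: DeliveryEvent): string {
  switch (ev.type) {
    case "deliver":
      return `crank ${ev.crankNum} ${ev.vatID} send ${ev.target}.${ev.method}(${ev.args.length} args)`;
    case "notify":
      return `crank ${ev.crankNum} ${ev.vatID} notify ${ev.promiseID} ${ev.rejected ? "rejected" : "fulfilled"}`;
  }
}
```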

During the execution of that crank, we also want to log any syscalls the vat makes. For retroactive debugging, we need the return values of the device-read syscalls, because those results feed back into the vat's execution. The rest of the syscall data is informative but not needed to regenerate a vat within the debugger.
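One way to capture those return values, sketched in TypeScript: wrap the vat's syscall object so every call is logged, and during replay substitute an object that hands back the recorded device-read results in order. The two-method `Syscall` interface and the `deviceRead` name are simplifications assumed for this example; the real syscall object has more methods and different names.

```typescript
// Hypothetical, simplified syscall interface; the real one has more methods.
interface Syscall {
  send(target: string, method: string, args: unknown[]): void;
  deviceRead(device: string, method: string, args: unknown[]): unknown;
}

type SyscallRecord = { name: keyof Syscall; args: unknown[]; result?: unknown };

// Wrap a live syscall object so every call is logged. Only deviceRead's
// return value is needed for replay; the rest is kept as informative data.
function recordingSyscall(inner: Syscall, log: SyscallRecord[]): Syscall {
  return {
    send(target, method, args) {
      log.push({ name: "send", args: [target, method, args] });
      inner.send(target, method, args);
    },
    deviceRead(device, method, args) {
      const result = inner.deviceRead(device, method, args);
      log.push({ name: "deviceRead", args: [device, method, args], result });
      return result;
    },
  };
}

// During replay no real devices exist: sends are dropped (the kernel already
// processed them) and deviceRead returns the recorded results in order.
function replayingSyscall(log: SyscallRecord[]): Syscall {
  const reads = log.filter((r) => r.name === "deviceRead");
  let next = 0;
  return {
    send() {},
    deviceRead() {
      if (next >= reads.length) throw new Error("replay diverged: no recorded device read left");
      return reads[next++].result;
    },
  };
}
```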

We also want to log any user-invoked debug logs (console.log, console.debug, etc.). This is the primary way through which user code will reveal its internal workings. The console object provided to each vat is write-only: the implementation seals the argument data and conveys it into the debug log without revealing it to any other vats (or, perhaps, to nested Compartments).
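A sketch of a write-only console, assuming a hypothetical `LogSink` callback owned by the kernel. Here "sealing" is approximated by serializing the arguments to text at call time and freezing the record, so later mutation by the vat cannot alter what was logged; the issue does not specify the real mechanism.

```typescript
type ConsoleRecord = Readonly<{ vatID: string; level: string; text: string }>;
type LogSink = (rec: ConsoleRecord) => void; // hypothetical kernel-side sink

const LEVELS = ["log", "debug", "info", "warn", "error"] as const;
type Level = (typeof LEVELS)[number];
type VatConsole = Readonly<Record<Level, (...args: unknown[]) => void>>;

// Snapshot an argument now, so later mutation by the vat cannot change the log.
function snapshot(arg: unknown): string {
  if (typeof arg === "string") return arg;
  try {
    return JSON.stringify(arg) ?? String(arg); // undefined or functions fall back to String()
  } catch {
    return String(arg); // e.g. cyclic objects or BigInts
  }
}

// Build a write-only console for one vat. Every method returns undefined and
// the sink is held in a closure, so the vat has no way to read the log back.
function makeVatConsole(vatID: string, sink: LogSink): VatConsole {
  const methods = {} as Record<Level, (...args: unknown[]) => void>;
  for (const level of LEVELS) {
    methods[level] = (...args: unknown[]) => {
      sink(Object.freeze({ vatID, level, text: args.map(snapshot).join(" ") }));
    };
  }
  return Object.freeze(methods);
}
```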

The log events should include:

The log needs to include the complete source bundle for each vat (both static and dynamic).

Log Display

The primary console should show a long list of delivery events, with columns such as "crank number", "vat id", "delivery type", and "delivery args".
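As a rough illustration of that table, the following renders the four columns as plain text. The `LoggedDelivery` shape is a hypothetical minimum, and a real display app would presumably use a GUI or web table rather than padded strings.

```typescript
// Minimal, hypothetical event shape for this example.
type LoggedDelivery = { crankNum: number; vatID: string; type: "deliver" | "notify"; args: unknown[] };

const COLUMNS = ["crank number", "vat id", "delivery type", "delivery args"] as const;

function toCells(ev: LoggedDelivery): string[] {
  return [String(ev.crankNum), ev.vatID, ev.type, JSON.stringify(ev.args)];
}

// Render as plain text, padding each column to its widest cell.
function renderTable(events: LoggedDelivery[]): string {
  const rows: string[][] = [[...COLUMNS], ...events.map(toCells)];
  const widths = COLUMNS.map((_, i) => Math.max(...rows.map((r) => r[i].length)));
  return rows
    .map((r) => r.map((cell, i) => cell.padEnd(widths[i])).join(" | ").trimEnd())
    .join("\n");
}
```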

One sidebar should show a list of the vat-ids in the trace and allow the user to assign petnames to them. Static vats should be pre-named, but dynamic vats may be less obvious (especially when multiple vats are made from the same bundle, e.g. ZCF). The createDynamicVat call should take a description, which would pre-populate the vat's name in the debug log, but the user of the log-display app should be able to override it.
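A sketch of the vat petname table, assuming `createDynamicVat` is given the optional description proposed above. User overrides take precedence over pre-populated names, and a vat with neither falls back to its raw ID. The `VatNames` class is hypothetical.

```typescript
// Hypothetical petname table for the vat IDs that appear in a trace.
class VatNames {
  private readonly defaults = new Map<string, string>();  // from config or createDynamicVat description
  private readonly overrides = new Map<string, string>(); // set by the debugging user

  noteStatic(vatID: string, name: string): void {
    this.defaults.set(vatID, name);
  }

  noteDynamic(vatID: string, description: string | undefined): void {
    // Several dynamic vats may share a bundle (e.g. ZCF), so without a
    // description fall back to the ID and let the user disambiguate.
    this.defaults.set(vatID, description ?? vatID);
  }

  rename(vatID: string, petname: string): void {
    this.overrides.set(vatID, petname);
  }

  nameOf(vatID: string): string {
    return this.overrides.get(vatID) ?? this.defaults.get(vatID) ?? vatID;
  }
}
```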

This list of vat-ids should have checkboxes to limit the event display to ones being delivered to the selected vats, with both toggle-one and deselect-all-others affordances.
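The checkbox behavior can be modeled as a small selection set; `toggle` and `only` correspond to the toggle-one and deselect-all-others affordances. This `VatFilter` class is a hypothetical sketch, not part of any existing tool.

```typescript
// Hypothetical selection model behind the vat checkboxes.
class VatFilter {
  private selected: Set<string>;

  constructor(allVats: Iterable<string>) {
    this.selected = new Set(allVats); // every vat is shown initially
  }

  // Toggle-one: flip a single checkbox.
  toggle(vatID: string): void {
    if (this.selected.has(vatID)) this.selected.delete(vatID);
    else this.selected.add(vatID);
  }

  // Deselect-all-others: leave only this vat checked.
  only(vatID: string): void {
    this.selected = new Set([vatID]);
  }

  isSelected(vatID: string): boolean {
    return this.selected.has(vatID);
  }

  // Limit the event list to deliveries made to the selected vats.
  apply<E extends { vatID: string }>(events: readonly E[]): E[] {
    return events.filter((ev) => this.selected.has(ev.vatID));
  }
}
```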

Another sidebar/popup should let the user assign petnames to koNN and kpNN object/promise IDs, again possibly with in-vat support for adding descriptions to specific Presences/Promises. For example, the mint that creates a Purse might tell liveslots (perhaps through Remotable()) that this object is a Purse, and point to the Issuer and its name. The debug log could record this and retrieve it later, even though userspace code could not. On the receiving side, a ZCF wrapper might tell console or liveslots that it regards the received object as a "stablecoin escrow purse", adding to the information in the debug log. The debugging user's petname table would then say "ko34 is described as X by vat4 (vat4 description) and as Y by vat7 (vat7 description)".
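A sketch of how the debug log might collect per-vat descriptions of a kref and produce a summary in the style quoted above. The `note` method stands in for whatever hook liveslots or the vat console would actually expose, which the issue leaves open; `KrefDescriptions` is hypothetical.

```typescript
// Hypothetical store of descriptions that vats attach to kernel object/promise IDs.
type Description = { vatID: string; text: string };

class KrefDescriptions {
  private readonly byKref = new Map<string, Description[]>();

  // Record what one vat says about a kref, in arrival order.
  note(kref: string, vatID: string, text: string): void {
    const list = this.byKref.get(kref) ?? [];
    list.push({ vatID, text });
    this.byKref.set(kref, list);
  }

  // Summarize all descriptions, labeling each vat with its petname.
  summarize(kref: string, vatName: (vatID: string) => string): string {
    const list = this.byKref.get(kref);
    if (!list || list.length === 0) return `${kref} has no descriptions`;
    const parts = list.map((d) => `as ${d.text} by ${d.vatID} (${vatName(d.vatID)})`);
    return `${kref} is described ${parts.join(" and ")}`;
  }
}
```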

This list can be displayed in pages, or populated by fetching only a minimal set of data from the source, with the details fetched lazily as the user drills down.
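A sketch of paged display with lazy drill-down, assuming a hypothetical `EventSource` interface over the event repository. It is synchronous to keep the example small; a real repository would more likely be queried asynchronously.

```typescript
// Hypothetical event repository interface; a real one would likely be async.
interface EventSource<S, D> {
  count(): number;
  summaries(offset: number, limit: number): S[]; // minimal per-row data
  detail(index: number): D;                       // full data for one row
}

class LazyEventList<S, D> {
  private readonly source: EventSource<S, D>;
  private readonly pageSize: number;
  private readonly details = new Map<number, D>();

  constructor(source: EventSource<S, D>, pageSize: number) {
    this.source = source;
    this.pageSize = pageSize;
  }

  pageCount(): number {
    return Math.ceil(this.source.count() / this.pageSize);
  }

  page(n: number): S[] {
    return this.source.summaries(n * this.pageSize, this.pageSize);
  }

  // Fetch full details only when the user drills into a row, then cache them.
  drillDown(index: number): D {
    if (!this.details.has(index)) this.details.set(index, this.source.detail(index));
    return this.details.get(index) as D;
  }
}
```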

A popup on each row should offer a way to spin up a local copy of the vat at that state. Invoking it should retrieve the necessary source code, checkpoint, and/or transcripts from the event repository, then launch a Node.js program that can accept them and replay the vat forward to the state just before the selected message is delivered. It should then execute a debugger statement in liveslots, just before the vat's object method is invoked. If the crank completes and the user hasn't terminated the process yet, the program should be fed the next historical delivery and stop again just before it is executed. It might also be useful to fork the Node.js process before the message is delivered (holding the forked process as a backup copy of the live vat state, and using more fork()s to replicate it), so the message can be re-run multiple times without replaying the entire transcript each time.
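A simplified sketch of the replay loop: deliveries before the selected one are replayed to rebuild state, with device reads answered from the transcript, and a hook runs before the selected delivery and each later one. The default hook executes a `debugger` statement, which pauses when a debugger is attached (for example via `node --inspect-brk`) and is a no-op otherwise. The `TranscriptEntry` and `Dispatch` types are assumptions, and unlike the design above this pauses before dispatch rather than inside liveslots.

```typescript
// Hypothetical transcript entry: one delivery plus the device-read results
// the vat observed while handling it.
type TranscriptEntry = { delivery: { method: string; args: unknown[] }; deviceReads: unknown[] };

// Hypothetical vat entry point: handle a delivery, reading devices via `read`.
type Dispatch = (delivery: TranscriptEntry["delivery"], read: () => unknown) => void;

// Replay deliveries [0, target) to rebuild state, then call the hook before
// `target` and before every later delivery, one delivery at a time.
function replayUntil(
  dispatch: Dispatch,
  transcript: readonly TranscriptEntry[],
  target: number,
  beforeDelivery: (index: number) => void = () => { debugger; },
): void {
  for (let i = 0; i < transcript.length; i++) {
    const entry = transcript[i];
    let nextRead = 0;
    // Answer device reads from the recorded results, detecting divergence.
    const read = () => {
      if (nextRead >= entry.deviceReads.length) throw new Error(`replay diverged at delivery ${i}`);
      return entry.deviceReads[nextRead++];
    };
    if (i >= target) beforeDelivery(i); // pause point for the debugging user
    dispatch(entry.delivery, read);
  }
}
```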

dckc commented 3 years ago

@warner and I made a little progress in this direction yesterday: we took slogfile data from a busy block, in which about 7 messages from the load generator resulted in dozens of deliveries to several vats, and made a diagram of it:

https://github.com/Agoric/agoric-sdk/issues/3459#issuecomment-892286574