Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
326 stars 206 forks source link

transactional turns #48

Closed warner closed 3 years ago

warner commented 5 years ago

In the present code, if a Vat does something against the rules (e.g. makes a syscall to an imported object reference that doesn't exist in the c-list), the kernel will throw an exception, which the Vat might choose to catch. If the exception makes it back out from dispatch.deliver(), the kernel will log it, and continue onwards.

Instead, illegal Vat operations should probably kill the Vat. It should not be possible to trigger this from the high-level "vat code" that liveSlots provides (certainly ocap operations should not be able to kill the vat), but might be caused by bugs in the liveSlots layer itself, or in non-ocap vats in which the user-provided code is allowed direct access to syscalls.

The kernel should set a per-vat flag when it detects an error, and when it gets control back again (after dispatch.deliver() completes), it should shut down the vat and delete all references to it. For efficiency, it should probably ignore any syscalls that happen after the flag is set but before deliver() completes.

Moreover, all the syscalls that happened before the error should be nullified. I'm not yet sure how to do this easily, but the general idea is to queue all the syscalls made while dispatch.deliver() executes, get control back from the vat (so we know no new syscalls will be made this turn), then examine the queue for validity. If everything in the queue can be executed without a vat-killing error, we execute them, and allow their state changes to be committed.

This will be easier when syscalls no longer return values. The work in Agoric/SwingSet#88 will achieve this: syscall.send() will accept a result parameter instead of returning one, and syscall.createPromise() will be removed entirely.

It will probably also depend upon the swingset side of https://github.com/Agoric/cosmic-swingset/issues/65 . We may provide the kernel with a state-management object that has database-style transactions built-in. We'll do startTransaction() just before executing the syscalls, and commitTransaction() just after, unless an exception occurs in which case we'll do abortTransaction() instead. This will help because a common pattern will be two syscall.send()s in a row, where the second one targets the result of the first. The first one modifies the PromiseTable (to add the result Promise), the second needs to observe that change.

Of course there's a higher-level question of what exact state does change when a turn is aborted. We've got deterministic execution, so we don't want to immediately re-deliver the same message, because it will just fail again in the same way (unless it was an out-of-gas failure, in which case the Keeper should have the option of redelivery). Killing the Vat is a state change, so the boundaries of the transaction should be drawn around the one vat's syscalls, and not around the kernel decision making that surrounds it.

Non-vat-killing exceptions might happen too. We may want to provide a programming model like Ethereum, in which exceptions that propagate out of a contract cause all state to be reverted (except for the one where all gas is forfeited). In particular all outbound messages may be squelched. We need to decide what will be most useful for vat-code authors.

erights commented 5 years ago

Good questions. One immediate answer:

all state to be reverted [...]. In particular all outbound messages may be squelched

Any time state is reverted, all outbound message that were emitted from the aborted computation must be squelched. Otherwise we have hangover inconsistency.

(Btw, "squelch" is a cool term to use for this or something like it.)

warner commented 3 years ago

@FUDCo and I mostly implemented this last year. The kernel.js code around here:

https://github.com/Agoric/agoric-sdk/blob/78e5390495efcdba1ea7a91a0fc29ed4f65903dc/packages/SwingSet/src/kernel/kernel.js#L562-L582

will notice if a delivery failed, and unwind any state changes it made (by abandoning the "crank buffer") instead of committing them.

I don't believe it ignores syscalls that happen after the decision to terminate the vat has been made: they happen anyway, but their side-effects are abandoned along with those of earlier syscalls. I don't think there's a significant performance improvement to be had by avoiding that.

So I think we can close this now.