make sure buildRootObject() can't retain agency

warner commented 3 years ago

What is the Problem Being Solved?

While writing up documentation for the new "run policy" object (#3460), it occurred to me to consider whether our vat creation code is doing the same kind of setImmediate that liveslots does for each crank. If not, code inside buildRootObject() (or at the top level of the source bundle) could use Promise.resolve().then(regainControl). This would let it retain agency even though the kernel thinks the loadBundle was complete.

This probably couldn't work on an xs-worker vat, because the kernel-to-worker message cycle isn't complete until the worker's promise queue is empty. But it might interfere with what the supervisor thinks is going on.

On a local (Node.js) vat, this might let the vat perform syscalls when it's supposed to be idle. In the worse case, these syscalls could interleave in unpredictable ways with syscalls of the active vat, causing consensus failures and general chaos.

Description of the Design

The fix will be for liveslots to add a waitUntilQuiescent() after the call to buildRootObject(). In addition, the worker supervisor (which is different for each kind of worker) should do a waitUntilQuiescent() after finishing the importBundle() that evaluates the top-level forms.

Security Considerations

This could allow sneaky vats to break consensus.

We do not currently meter either the top-level forms, nor buildRootObject(). Vat code which performs an infinite promise-queue loop will starve the kernel until we implement such metering.

Test Plan

A unit test should create a vat with a finite promise-queue loop (perhaps 100 cycles) in three places:

the top-level form
buildRootObject()
some method (which we'll invoke on the vat shortly after creation)

Each loop should increment a counter and check to see if the counter was incremented by someone else, or append a distinctive string to an array and check for interleaving.

The top-level form should observe all of its cycles complete without interference, followed by buildRootObject()'s loop, followed by the method invocation. If there is any overlap, this indicates that the vat did not properly lose agency, and is interfering with itself, which means it could also interfere with the kernel (if it executed syscalls during its loops).

katelynsills commented 3 years ago

Just wanted to do a check of the threat model. Currently, no user-provided code has access to buildRootObject, right? So this particular concern would only arise when a static vat that Agoric wrote misbehaves, or if ZCF is so flawed that a user's contract code can somehow escape. Is that right?

warner commented 3 years ago

That's correct. It's part of the swingset threat model, but not the agoric chain threat model (even after we move to permissionless contract installs).

mhofman commented 2 years ago

The reasons have shifted since the issue was filed, but the goal remains: buildRootObject() should complete in a single crank, or be considered a failure.

To support upgrades (see https://github.com/Agoric/agoric-sdk/issues/4382), buildRootObject() or a derivative will like be called as part of vat creation, and be in turn responsible to also call into the contract's initialization. While these operations will be async to not force undue constraint on user code, the whole initialization process should complete in a single crank (aka not be dependent on external results).

The idea is to have a test in liveSlots that the promise returned by buildRootObject() has settled by the end of the crank. Infinite async recursion or other lopp should be handled by a simple meter limit.

mhofman commented 1 year ago

@warner is this still an issue?

Edit: Oh right I think the issue is with upgrades and ZCF contracts that may do work that doesn't fit in a single crank.

Agoric / agoric-sdk