Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
326 stars 206 forks source link

can/should zoe expose vat-termination on the instance admin facet? #9716

Open warner opened 2 months ago

warner commented 2 months ago

What is the Problem Being Solved?

To perform #9483 , we need a want to terminate about ten soon-to-be-unused vats (eg v29-price_feed-ATOM and v46-scaledPriceAuthority-ATOM). We can think of three approaches:

This ticket is about the third option. (The first is awkward because of how cosmic-swingset works, the second is awkward because the "upgrade" code must also re-defined all the durable Kinds or else the upgrade won't work long enough to reach the self-termination).

Description of the Design

TBD, but currently, when someone asks Zoe to instantiate a previously-installed contract, Zoe creates a new contract vat for the instance, starts ZCF, tells ZCF to start the contract, and then Zoe returns some sort of "zoe instance admin facet" to the caller. This facet currently has an upgrade() method. The plan would be to add a terminate() method as well.

Internally, Zoe holds on to the vat-admin service's adminNode (one of the values returned by E(vatAdminSvc).createVat()), so Zoe has the ability to call E(adminNode).terminate(), but Zoe does not currently have any pathway to do that. So we'd add that pathway, exposed as a terminate() on the instance admin facet.

The core-evals which create the price feeds are storing these instance admin facets in the vat-bootstrap promise space. So once the admin facets have a .terminate() method, a later core-eval could pull the facet from promise space and then invoke terminate() on them.

Security Considerations

Nominally, this is not an increase in the authority of the instance-admin-facet holders: they already have the ability to upgrade the vat, which includes (as an annoying-to-use subset) the ability to upgrade to a version which just self-terminates. So from that argument, creating a fast-path-to-termination isn't anything new.

But, making termination a lot easier is a security concern in its own right. We could imagine audit procedures or linting rules that scans proposed contract upgrades for calls to vatPowers.exitVat(), or for leaks of the vatPowers object, and this external-to-the-vat termination pathway would completely bypass those scans.

Scaling Considerations

We should not exercise this new power until the #8928 slow-vat deletion code (and all necessary cosmic-swingset integration code) is in place, otherwise the sudden all-at-once deletion of a large price-feed vat could crash the chain.

Once that is in place, all vat terminations become slow vat deletions. The slow-deletion scheduler will limit the work done during any single block, so even if multiple vats are terminated at the same time, we'll still delete just one at a time. Nevertheless, we should start small, by deleting the smallest stale vat (probably v112) and watching it carefully, so that we'll have time to adjust rates or make changes to the process before we initiate termination of the next one. Each core-eval that triggers this will require a stake-holder vote, but the core-eval could delete one or all of them, so the community needs to keep the scaling concerns in mind when evaluating those proposals.

Test Plan

This should be testable by packages/zoe/ unit tests.

In a3p, once slow-deletion is in place, and once we've upgraded/replaced some price-feed vats, we should have an a3p core-eval step to delete an old vat. That would serve as a better end-to-end integration test, before we exercise this on mainnet. Note that slow-deletion means the a3p chain will keep doing deletion work even after the core-eval completes, which might challenge our assumptions about when an a3p step should be considered "done" (and when we make a snapshot to prepare for the next step). We might need to adjust the way a3p works to wait longer, and avoid leaving lingering cleanup work that would confuse later a3p steps.

Upgrade Considerations

Deploying this will require an upgrade to vat-zoe, as well as a new method on an existing Zoe Kind/Exo.

cc @Chris-Hibbert @erights @dckc

Chris-Hibbert commented 2 months ago

The adminNode is stored in InstanceStorageManager, and is accessible inside ZCF to make it possible to terminate vats from the inside. That OCap could be passed out to external view in a refactoring--the hard question is how to pass it out to some place where it can continue to be tightly controlled. (There is no creatorFacet for Zoe.)

I think the best answer is to add another method to the configFacet. This means that anything that has the ability to updateZcfBundleId() can also do this. I think that's acceptable, as both should only be called from bootstrap.