Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
322 stars 201 forks source link

add code to upgrade all upgradable vats during agoric-upgrade-12 #8219

Open warner opened 1 year ago

warner commented 1 year ago

What is the Problem Being Solved?

Ideally, at any given time, there is a branch in agoric-sdk which is as close as possible to the code running on the chain. We can't keep this up forever (eventually we'll be deploying code to the chain that didn't come from agoric-sdk), and we can't do it perfectly (not all vats are upgradable, so changes we make to e.g. the core-bootstrap mechanism in packages/vats/ cannot be deployed to the chain), and we can't represent non-vat-deployment state changes in our source tree. But the closer we can make those two environments, the more relevant unit tests and CI results against that branch will be.

Since master is where all our main work gets landed, we'd like to minimize the difference between the services available on mainnet, and the code present on master.

To that end, when we perform the agoric-upgrade-12 event, it would be great to upgrade all the vats that we possibly can, to whatever code is currently present on master.

8215 is about writing a test of this event.

Description of the Design

I'm not sure, but I know it will require the following pieces in the upgrade handler:

To avoid interleaving of vat upgrade with work from the previous block, or work from the following block, we should follow @mhofman 's recommendations:

We should probably upgrade each vat in a separate controller.run(), so all other (subscribing) vats have a chance to react to the cancelled Promises of the vat being upgraded, without interleaving with their own upgrade.

Security Considerations

Only the usual security concerns about upgrading a huge amount of code at the same time.

Each upgraded vat will cancel (reject) all its outstanding promises. We think we've designed for this, but we haven't used it "in anger" before, so we should be concerned about whether subscribing vats will e.g. resubscribe properly.

Scaling Considerations

The upgrades may take a considerable amount of time. We should find a way to exercise this and provide validator guidance, however a chain-software upgrade is the ideal time for long execution like this: all validators are restarting at their own pace anyways.

Test Plan

8215 is one part: using our docker-based deployment test to exercise the agoric-upgrade-12 handler, and then interacting with the upgraded system to see if it still works.

Upgrading lots of vats at the same time represents considerable risk: each vat's upgrade is an opportunity to discover something we missed, especially in their collective interaction. I really hope we can find a way to test this kind of upgrade against a clone of mainnet.

We should look carefully at the clone's slogfiles: look for vat termination or unexpected error messages happening during the upgrade process. Any such symptoms could mean serious breakage.

Upgrade Considerations

All of them.

### Tasks
- [ ] https://github.com/Agoric/agoric-sdk/issues/8245
dckc commented 11 months ago

We should probably upgrade each vat in a separate controller.run() ...

Does the coreProposals mechanism do that? @michaelfig ?

mhofman commented 11 months ago

https://github.com/Agoric/agoric-sdk/pull/8287 arranges for multiple actions that are run to completion within a block.

If the upgrade can be arranged for multiple core proposals, each generating a different action (todo), then it would naturally fall off of this.

michaelfig commented 11 months ago

The coreProposals mechanism doesn't impose any serial execution. The reason is that I assumed there might be interdependencies between the different proposals, in the same way there is between different behaviours.

I suspect it's most flexible (and less intrusive) to structure this using a coordinating promise space behavior to ensure sequential upgrades.

@warner, what was your reason for needing separate controller.run()s?

mhofman commented 11 months ago

@michaelfig how hard would be it to support both? A single core proposal that arranges some sequential work using internal promises, and multiple core proposals inside an upgrade plan?

mhofman commented 11 months ago

One motivation I can think of right now is to update the bridge vat first so that cosmic-swingset can get the results for subsequent core eval bridge messages.

michaelfig commented 11 months ago

A single core proposal that arranges some sequential work using internal promises, and multiple core proposals inside an upgrade plan?

Am I missing something? I thought that's exactly what we have today.

mhofman commented 11 months ago

Am I missing something? I thought that's exactly what we have today.

I thought an upgrade plan results in a single core eval. I would like a way to express an upgrade plan that results in multiple core eval. My requirement is being able to upgrade vat-bridge in a first core eval, then handle the subsequent core eval using the new vat bridge logic.

michaelfig commented 11 months ago

Ah, I was confused because you had first said "multiple core proposals inside an upgrade plan", which we do have. "multiple core evals" is a different thing, and currently we squash the core proposals into a single core eval so that they run in parallel.

What I was trying to say is that if you want to schedule a number of core proposals to run sequentially, you can do that today without any changes to the underlying mechanism. Just create a core proposal that accepts options describing other proposals to schedule in a custom way.

I would rather use the existing machinery, which is flexible enough already, instead of grafting scheduling policy into a lower level than necessary.

mhofman commented 11 months ago

The use case I have is unfortunately not compatible with parallel execution. Let's discuss.

michaelfig commented 11 months ago

The use case I have...

upgrade each vat in a separate controller.run(), so all other (subscribing) vats have a chance to react to the cancelled Promises of the vat being upgraded, without interleaving with their own upgrade

Oh, now I see. Even if you linearise the individual upgrades, you can't be sure that their side-effects have run to completion unless you put them in a separate controller.run(). Makes sense.