Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
327 stars 206 forks source link

ZCF does not enforce start completion in first incarnation #10174

Open mhofman opened 3 weeks ago

mhofman commented 3 weeks ago

Describe the bug

When implementing vat upgrades through baggage and durable kinds (#4325 and #3062), liveslots required that if buildRootObject() returned a promise, that it settled immediately (before end of crank). This was done to ensure all behavior was defined, or redefined in an upgrade without waiting on external call results. Aka a "one shot" upgrade so that pending messages to the vat didn't need to be embargoed.

When ZCF was first designed, this mechanism didn't exist, and concerns about cost of creating vats lead towards a Zygote design (#2268). Bundle caps also didn't exist or were not allowed in vatParameters (#4381)

This resulted in a design where the ZCF bundle is used to create an "empty shell" of a contract vat, and in a second delivery, the bundle of the contract is sent to that shell and the contract is started. However in an upgrade, the contract start is called as part of ZCF's buildRootObject(), which awaits the contract start.

That means a contract start is not required to complete within its crank on the first incarnation, but is required to in future incarnation, resulting in potential hazards that only reveals itself on upgrade.

To Reproduce

Deploy and instantiate a Zoe contract that await an external call in its start function, then attempt to upgrade that contract.

Expected behavior

The contract instantiation should fail in the first place. None of its effects (message sends) should be performed (crank rollback).

Description of the Design

The cleanest and preferred solution for now is to abandon the ability to make zcf zygotes, and for Zoe startInstance to pass the needed contract bundle and its creation params to vatParameters so that zcf can start the contract within its own buildRootObject even on first start. The necessary work for this was already done for contract upgrades (#4381)

Alternatives that were considered:

This might be considered within the context of improved upgrade protocol for vats in general (#4382)

Security Considerations

If we were adding a new liveslots power, question of limiting who has access to it.

Scaling Considerations

This would effectively remove the temporal point where we could create a zcf zygote, but that has never been implemented and the perf benefits were never measured.

Test Plan

An upgrade test based on the reproduction steps above.

Upgrade Considerations

This would require a new Zoe and ZCF bundle, and thus either a core eval or a chain software upgrade.

While this removes a footgun for contracts, any contract that currently (ab)used this possibility would fail to start after these changes are deployed. It would however not affect upgrades of these contracts as single crank start was already enforced in those cases.

Chris-Hibbert commented 3 weeks ago

A contract that fell prey to this footgun would not be able to do a null-upgrade, but full upgrades would be possible with a re-write that removed the problematic code. For the near term, we expect to enforce by process that all upgrades are tested in a3p or a testNet before getting pushed to mainNet.

The footgun is serious, but I don't think the fix is needed urgently.

mhofman commented 3 weeks ago

FWIW, this was motivated by https://github.com/Agoric/agoric-sdk/pull/10140#discussion_r1779349098, which may require a completely different design. While it's always possible to transition in upgrades, it's now always clean given you have to maintain the kinds that were previously defined, and potentially the implementations of these kinds if the objects have been shared.

mhofman commented 3 weeks ago

It looks like we have an agreement on dropping potential support as currently enabled. I updated the issue description to reflect that preferred approach.