Open mhofman opened 2 months ago
A contract that fell prey to this footgun would not be able to do a null-upgrade, but full upgrades would be possible with a re-write that removed the problematic code. For the near term, we expect to enforce by process that all upgrades are tested in a3p or a testNet before getting pushed to mainNet.
The footgun is serious, but I don't think the fix is needed urgently.
FWIW, this was motivated by https://github.com/Agoric/agoric-sdk/pull/10140#discussion_r1779349098, which may require a completely different design. While it's always possible to transition in upgrades, it's now always clean given you have to maintain the kinds that were previously defined, and potentially the implementations of these kinds if the objects have been shared.
It looks like we have an agreement on dropping potential support as currently enabled. I updated the issue description to reflect that preferred approach.
Describe the bug
When implementing vat upgrades through baggage and durable kinds (#4325 and #3062), liveslots required that if
buildRootObject()
returned a promise, that it settled immediately (before end of crank). This was done to ensure all behavior was defined, or redefined in an upgrade without waiting on external call results. Aka a "one shot" upgrade so that pending messages to the vat didn't need to be embargoed.When ZCF was first designed, this mechanism didn't exist, and concerns about cost of creating vats lead towards a Zygote design (#2268). Bundle caps also didn't exist or were not allowed in vatParameters (#4381)
This resulted in a design where the ZCF bundle is used to create an "empty shell" of a contract vat, and in a second delivery, the bundle of the contract is sent to that shell and the contract is started. However in an upgrade, the contract start is called as part of ZCF's
buildRootObject()
, which awaits the contract start.That means a contract start is not required to complete within its crank on the first incarnation, but is required to in future incarnation, resulting in potential hazards that only reveals itself on upgrade.
To Reproduce
Deploy and instantiate a Zoe contract that
await
an external call in itsstart
function, then attempt to upgrade that contract.Expected behavior
The contract instantiation should fail in the first place. None of its effects (message sends) should be performed (crank rollback).
Description of the Design
The cleanest and preferred solution for now is to abandon the ability to make zcf zygotes, and for Zoe
startInstance
to pass the needed contract bundle and its creation params tovatParameters
so that zcf can start the contract within its ownbuildRootObject
even on first start. The necessary work for this was already done for contract upgrades (#4381)Alternatives that were considered:
This might be considered within the context of improved upgrade protocol for vats in general (#4382)
Security Considerations
If we were adding a new liveslots power, question of limiting who has access to it.
Scaling Considerations
This would effectively remove the temporal point where we could create a zcf zygote, but that has never been implemented and the perf benefits were never measured.
Test Plan
An upgrade test based on the reproduction steps above.
Upgrade Considerations
This would require a new Zoe and ZCF bundle, and thus either a core eval or a chain software upgrade.
While this removes a footgun for contracts, any contract that currently (ab)used this possibility would fail to start after these changes are deployed. It would however not affect upgrades of these contracts as single crank start was already enforced in those cases.