Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

Chain recovery startup allows economic recovery first #4317

Open rowgraus opened 2 years ago

rowgraus commented 2 years ago

What this means to me is:

The fix we talked about was to do a "soft restart", in which the chain is told that it is restarting, and for the first minute or so, it does not accept any messages other than economy-critical price-oracle signals.

We'd implement this with the #5334 backpressure mechanism, which controls ingress at the mempool/txn level, to exclude non-oracle-signed transactions from blocks during the restart window, plus some code in the new version that knows when this window starts and ends. If the chain halted just after block 100, so that the next block executed will be 101, then our replacement/upgraded software should have something in cosmic-swingset that does:

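if (blockHeight === 101) {
  // first block after the halt: admit only economy-critical (oracle) txs
  disableNonEconomicTxs();
} else if (blockHeight === 111) {
  // 10 blocks later: resume normal admission
  enableNonEconomicTxs();
}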

to give the economic engine roughly 60 seconds (10 blocks at about 6 seconds each) to prepare for user requests. We'd also need to ensure that oracle price signals and other economy-critical messages can be delivered during that window, even if user requests are flooding the RPC servers.
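A minimal sketch of the fixed window described above, with the hard-coded heights replaced by parameters. Every name here (`makeSoftRestartGate`, `isOracleTx`, the `tx.kind` field) is hypothetical and not an existing cosmic-swingset API, and the 10-block default assumes blocks of roughly 6 seconds.

```javascript
// Hypothetical sketch: none of these names exist in cosmic-swingset.
// Assumes ~6-second blocks, so 10 blocks is roughly 60 seconds.
const makeSoftRestartGate = ({ restartHeight, windowBlocks = 10, isOracleTx }) => {
  // First height at which normal admission resumes.
  const windowEnd = restartHeight + windowBlocks;
  const inWindow = blockHeight =>
    blockHeight >= restartHeight && blockHeight < windowEnd;
  return {
    inWindow,
    // During the window only oracle txs are admitted; otherwise everything is.
    admits: (tx, blockHeight) => (inWindow(blockHeight) ? isOracleTx(tx) : true),
  };
};

// Chain halted after block 100, so the window covers blocks 101 through 110.
const gate = makeSoftRestartGate({
  restartHeight: 101,
  isOracleTx: tx => tx.kind === 'oracle-price', // assumed tx shape
});
```

Using a range check rather than two equality checks means the admission decision can be recomputed from the block height alone, which may help if a node restarts again in the middle of the window.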

We might consider making this more explicit: let the vats that manage vaults give a signal when they believe they're up to date, and disable non-economic messages until that point. That might mean control over non-economic admissibility would have to be made available to userspace, which would be... exciting. We'd also want a way for the cosmic-swingset layer to signal to those economy vats that we've entered soft-start mode, and that the vats are responsible for exiting it when they're ready.

if (blockHeight === 101) {
  disableNonEconomicTxs();
  controller.queueToVat(economy, 'economyPaused');
  // economy vats will re-enable the non-economic txs after getting a price update
}
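To make the hand-off concrete, here is a small self-contained sketch of the vat-controlled variant. The lever, the economy stub, and `startSoftRestart` are all hypothetical stand-ins and not real cosmic-swingset or vat APIs; the stub simply re-enables non-economic txs on the first price update it sees after being told the economy is paused.

```javascript
// Hypothetical sketch: the lever and the economy stub are illustrative only.
const makeAdmissionLever = () => {
  let nonEconomicEnabled = true;
  return {
    disable: () => { nonEconomicEnabled = false; },
    enable: () => { nonEconomicEnabled = true; },
    isEnabled: () => nonEconomicEnabled,
  };
};

// Stand-in for the economy vats. It receives only the "enable" capability,
// so it can end the soft-start window but cannot start one.
const makeEconomyStub = enableNonEconomicTxs => {
  let paused = false;
  return {
    economyPaused: () => { paused = true; },
    priceUpdate: _quote => {
      if (paused) {
        paused = false;
        enableNonEconomicTxs();
      }
    },
  };
};

// Host side: run once at the first block after the halt.
const startSoftRestart = (lever, economy) => {
  lever.disable();
  economy.economyPaused();
};

const lever = makeAdmissionLever();
const economy = makeEconomyStub(lever.enable);
startSoftRestart(lever, economy);
```

Passing only `lever.enable` to the economy side is one possible way to keep the userspace-visible surface small; whether that is enough is exactly the kind of question the design discussion would need to settle.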

@rowgraus points out that this feature could easily consume more effort than it warrants, and/or could expose more of an attack surface than it addresses, and I agree. I think we'll need to invoke our economist friends for advice too.

Tartuffo commented 2 years ago

@dtribble Is this needed for Mainnet-1?

warner commented 2 years ago

@dtribble is this a MN-1 thing?

Tartuffo commented 2 years ago

@warner For proper project planning and tracking, this needs an area label covered by one of our weekly planning meetings. Please pick the appropriate one from: agd, agoric-cli, agoric-cosmos, amm, core economy, cosmic-swingset, endo, ertp, getrun, governance, installation-bundling, metering, oracle, pegasus, run-protocol, ses, staking, swingset, swingset-runner, tc39, token economy, tooling, ui, wallet, xsnap, zoe, zoe contract

warner commented 2 years ago

next step: have a meeting to figure out a design

mhofman commented 2 years ago

@rowgraus can we make this a regular issue? I believe I don't have the rights to do that.

Tartuffo commented 2 years ago

@mhofman I was able to convert it to a regular issue.

mhofman commented 2 years ago

We discussed this in the kernel meeting today. The summary of the discussion:

In a start-after-halt scenario, we'll need the swingset low-priority handling to start disabled.

We'll also likely want any low-priority messages to be rejected at the cosmos layer until the kernel is ready to process them again, which argues for an explicit lever controlling that mechanism.

An economic contract that relies on price oracles could decide it is ready once it has received a second price update: the first update after the restart may be stale, so the contract acknowledges it and then waits for the oracle to send an up-to-date quote.
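The points above could be sketched roughly as follows. This is only an illustration under stated assumptions: `makeRestartReadiness`, `acceptsMessage`, and the `msg.priority` field are hypothetical names and do not correspond to kernel, cosmos-layer, or contract APIs.

```javascript
// Hypothetical sketch: names are illustrative, not kernel or contract APIs.
const makeRestartReadiness = ({ updatesNeeded = 2, onReady }) => {
  let updatesSeen = 0;
  let ready = false;
  return {
    // Call once per price update received after the restart.
    notePriceUpdate: () => {
      if (ready) return;
      updatesSeen += 1;
      // The first update may be stale; the second is treated as fresh.
      if (updatesSeen >= updatesNeeded) {
        ready = true;
        onReady();
      }
    },
    isReady: () => ready,
  };
};

// Low-priority handling starts disabled after a halt.
let lowPriorityEnabled = false;

// Stand-in for the cosmos-layer check: reject low-priority messages
// until the lever has been flipped.
const acceptsMessage = msg => msg.priority === 'high' || lowPriorityEnabled;

const readiness = makeRestartReadiness({
  onReady: () => { lowPriorityEnabled = true; },
});
```

With this shape, calling `readiness.notePriceUpdate()` once leaves low-priority messages rejected, and the second call flips the lever.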

Tartuffo commented 2 years ago

@mhofman Can we remove the in-design label from this one?