cosmos / interchain-security

Interchain Security is an open sourced IBC application which allows cosmos blockchains to lease their proof-of-stake security to one another.
https://cosmos.github.io/interchain-security/

Implement sovereign -> consumer chain changeover #288

Closed. jtremback closed this issue 1 year ago.

jtremback commented 2 years ago

We know this should be theoretically possible, but, to my knowledge, we have never actually tested the transition of a sovereign chain to a consumer chain.

Conditions of satisfaction:

There are a few different ways this could be done; we should use this thread to plan them out.

jtremback commented 2 years ago

AFAIK, for IBC clients to keep their connections, the important thing is that the next_validators_hash in the last header from the sovereign validator set is the hash of the new consumer chain's validator set, i.e., the provider validator set. There was talk about no more than a 1/3 change being allowed, but this is probably not an issue if relayers are configured correctly (I need someone to weigh in here; I've gotten lots of conflicting info).

Here is @zmanian's idea:

  1. We need a governance proposal type that basically overwrites the entire staking module with validators from the Cosmos Hub and then freezes the valset.
  2. When this happens, the chain signs a next valset that is the hub valset.
  3. The chain then halts until the hub valset starts up nodes on the chain.
  4. Then you upgrade the state machine to support ICS after the hub valset has started their nodes
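
Step 1 has no counterpart in ICS today. As a rough sketch of what "overwriting" the valset means at the ABCI level, the Go below computes the validator updates that swap one set for another in a single block. It rests on two Tendermint behaviours: an update with power 0 removes a validator, and updates returned at height H take effect at height H+2, so the header of height H+1 already carries the new set's hash as its NextValidatorsHash (point 2). The ValidatorUpdate type and replaceValsetUpdates function are hypothetical simplifications, not the Cosmos SDK or Tendermint API, and freezing the valset afterwards is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// ValidatorUpdate is a simplified, hypothetical stand-in for an ABCI
// validator update: a consensus key (here just a string) and a voting power.
type ValidatorUpdate struct {
	PubKey string
	Power  int64
}

// replaceValsetUpdates returns the updates that swap oldSet for newSet in a
// single block: old validators absent from newSet get power 0 (which removes
// them), and every member of newSet gets its new power. The result is sorted
// by key so that every node would produce the same list.
func replaceValsetUpdates(oldSet, newSet map[string]int64) []ValidatorUpdate {
	var updates []ValidatorUpdate
	for pk := range oldSet {
		if _, keep := newSet[pk]; !keep {
			updates = append(updates, ValidatorUpdate{PubKey: pk, Power: 0})
		}
	}
	for pk, power := range newSet {
		updates = append(updates, ValidatorUpdate{PubKey: pk, Power: power})
	}
	sort.Slice(updates, func(i, j int) bool { return updates[i].PubKey < updates[j].PubKey })
	return updates
}

func main() {
	sovereign := map[string]int64{"sovA": 10, "sovB": 20}
	hub := map[string]int64{"hubX": 100, "hubY": 50}
	for _, u := range replaceValsetUpdates(sovereign, hub) {
		fmt.Printf("%s -> %d\n", u.PubKey, u.Power)
	}
}
```

Running it prints the two hub validators with their new powers, followed by the two sovereign validators with power 0. Sorting matters because Go map iteration order is random, and every node must return identical updates.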

As stated in point 2, replacing the validator set results in a light client header in which the old validator set signs a next_validators_hash equal to the hash of the new validator set. This provides continuity and should allow the IBC connection to be maintained. I'm not sure whether the upgrade to ICS needs to be a separate step, or whether the hub valset can just start up the chain with the ICS logic already in it. Based on my limited understanding of IBC, this sounds like it should work. It will require a new governance proposal type that can replace the entire valset, but this does not need to be part of ICS itself.

Here's an alternate technique which should work with no code outside of CCV:

  1. Export the genesis of the sovereign chain
  2. Paste the ccv consumer section into the genesis file, with the InitialValSet section set to the validator set of the sovereign chain, not the provider chain.
  3. Normally, this will fail in the consumer's GenesisState.Validate() function, since it checks that the NextValidatorsHash is equal to the InitialValSet. This check will need to be disabled, either conditionally for this special sort of startup case, or in general (are we sure it's really necessary?).
  4. The sovereign validator set starts the chain back up, and lets it go through the CCV handshake. When the first VSC packet comes in, the standard CCV logic changes the validator set over to the provider validator set. This accomplishes the same thing that happens in Zaki's step 2 above.

The benefit of this technique is that it requires very little extra code to be written. The only thing that needs to happen is for us to disable the check here. @mpoke, is this safe to do?

mpoke commented 2 years ago

There was talk about no more than a 1/3 change being allowed, but this is probably not an issue if relayers are configured correctly (need someone to weigh in here, have gotten lots of conflicting info).

That is correct. The "no more than a 1/3 change" is a requirement for the bisection protocol of light clients.

Here is what @josef-widder said about this:

you can have arbitrary changes of validator sets in sequential verification. The verification logic is defined in [LCV-FUNC-VALID.2]. In case of sequential verification (search “immediate successor”), one checks trusted.Header.NextValidators against untrusted.Header.Validators. However, the change in validators would be visible between trusted.Header.NextValidators and trusted.Header.Validators, which is not constrained at all. This is because sequential verification is treated as a special case here.

The overlap is only for skipping verification and it is defined in “Returns SUCCESS” below.
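
The distinction Josef draws can be sketched as a toy model in Go. It is not the Tendermint light client: hashValset is a simplified stand-in for the real validator-set Merkle hash, the Trusted/Untrusted types are hypothetical, and trusting-period and timestamp checks are omitted. In the sequential (adjacent-height) case only the hash comparison against the trusted NextValidators matters, so a complete replacement of the validator set passes; the more-than-1/3 overlap is required only when skipping heights.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// Validator is a simplified validator: an address and a voting power.
type Validator struct {
	Address string
	Power   int64
}

// Trusted is the light client's trusted state: a height and the validator
// set that the trusted header committed to as "next".
type Trusted struct {
	Height         int64
	NextValidators []Validator
}

// Untrusted is a new header: its height, its validator set, and the
// addresses that signed its commit.
type Untrusted struct {
	Height     int64
	Validators []Validator
	Signers    map[string]bool
}

// hashValset is a simplified stand-in for Tendermint's validator-set hash:
// SHA-256 over the sorted "address:power" entries.
func hashValset(vals []Validator) [32]byte {
	entries := make([]string, len(vals))
	for i, v := range vals {
		entries[i] = fmt.Sprintf("%s:%d", v.Address, v.Power)
	}
	sort.Strings(entries)
	return sha256.Sum256([]byte(strings.Join(entries, ",")))
}

// signedPower returns the power in vals that signed, and the total power.
func signedPower(vals []Validator, signers map[string]bool) (signed, total int64) {
	for _, v := range vals {
		total += v.Power
		if signers[v.Address] {
			signed += v.Power
		}
	}
	return signed, total
}

// verify checks untrusted against trusted (trusting period omitted).
func verify(t Trusted, u Untrusted) error {
	if u.Height <= t.Height {
		return errors.New("header is not newer than the trusted state")
	}
	if signed, total := signedPower(u.Validators, u.Signers); 3*signed <= 2*total {
		return errors.New("header not signed by +2/3 of its own validator set")
	}
	if u.Height == t.Height+1 {
		// Sequential: the new set only has to hash to what was committed as
		// "next"; it may share no members with the previous set.
		if hashValset(u.Validators) != hashValset(t.NextValidators) {
			return errors.New("validators do not match trusted NextValidatorsHash")
		}
		return nil
	}
	// Skipping: more than 1/3 of the trusted set's power must have signed.
	if signed, total := signedPower(t.NextValidators, u.Signers); 3*signed <= total {
		return errors.New("less than 1/3 of trusted power signed; cannot skip")
	}
	return nil
}

func main() {
	sovereign := []Validator{{"sovA", 10}, {"sovB", 20}}
	hub := []Validator{{"hubX", 100}, {"hubY", 50}}
	hubSigned := map[string]bool{"hubX": true, "hubY": true}
	// The last sovereign header (height 10) committed to the hub set as "next".
	fmt.Println(verify(Trusted{Height: 10, NextValidators: hub},
		Untrusted{Height: 11, Validators: hub, Signers: hubSigned}))
	// Skipping from a state whose next set is still the sovereign set fails.
	fmt.Println(verify(Trusted{Height: 5, NextValidators: sovereign},
		Untrusted{Height: 20, Validators: hub, Signers: hubSigned}))
}
```

The first call succeeds because the hub set was committed as "next"; the second fails because none of the trusted (sovereign) validators signed the later header, so in this model a client cannot jump over the changeover from a state that predates it.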

mpoke commented 2 years ago

The sovereign validator set starts the chain back up, and lets it go through the CCV handshake. When the first VSC packet comes in, the standard CCV logic changes the validator set over to the provider validator set.

Some concerns / comments I have re. this design:

mpoke commented 2 years ago

Normally, this will fail in the consumer's GenesisState.Validate() function since it checks that the NextValidatorsHash is equal to the InitialValSet.

The only thing that needs to happen is for us to disable the check here @mpoke is this safe to do?

This code doesn't check that the NextValidatorsHash of the previous block equals the InitialValSet, but rather that the NextValidatorsHash of the provider consensus state (passed via the genesis file) equals the InitialValSet. The ProviderConsensusState is used to create a client to the provider (see here).

This check is actually not needed. There is no requirement for the local validator set to match the NextValidatorsHash in the consensus state of a remote chain.

mpoke commented 2 years ago

Note that UpdateClient doesn't work across different revision numbers (see https://github.com/cosmos/ibc-go/blob/a83bcd5af71f3121e97141f797e4970419925992/modules/light-clients/07-tendermint/update.go#L61). I'd assume that the sovereign chain will increment its revision number once it becomes a consumer chain.
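
In IBC, the revision number is conventionally the numeric suffix of the chain ID (e.g., cosmoshub-4), and ibc-go has its own parser for it (ParseChainID in the 02-client types). The Go below is a simplified, self-contained sketch of that convention plus the restriction described above; revisionNumber and checkUpdate are hypothetical helpers, not ibc-go functions, and the chain IDs are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// revisionNumber follows the IBC chain-ID convention "{name}-{revision}",
// e.g. "cosmoshub-4" has revision 4. IDs without a valid numeric suffix are
// treated as revision 0. Simplified sketch, not ibc-go's parser.
func revisionNumber(chainID string) uint64 {
	i := strings.LastIndex(chainID, "-")
	if i <= 0 || i == len(chainID)-1 {
		return 0
	}
	suffix := chainID[i+1:]
	if suffix[0] == '0' {
		return 0 // the convention does not allow leading zeros
	}
	n, err := strconv.ParseUint(suffix, 10, 64)
	if err != nil {
		return 0
	}
	return n
}

// checkUpdate models the restriction: a header can only be verified against
// a trusted consensus state with the same revision number.
func checkUpdate(trustedChainID, headerChainID string) error {
	tr, hr := revisionNumber(trustedChainID), revisionNumber(headerChainID)
	if tr != hr {
		return fmt.Errorf("header revision %d does not match trusted revision %d", hr, tr)
	}
	return nil
}

func main() {
	// Hypothetical chain IDs: the sovereign chain bumps its revision when it
	// becomes a consumer chain.
	fmt.Println(checkUpdate("sovereign-1", "sovereign-1"))
	fmt.Println(checkUpdate("sovereign-1", "sovereign-2"))
}
```

With these inputs the first check passes and the second returns an error, so if the revision number changes, counterparties would presumably need a client upgrade rather than a plain client update.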

jtremback commented 2 years ago

After talking with @mpoke today, I think that my suggested technique has too many problems. The handshake will not be able to work because the provider chain expects the consumer chain IBC client to have the provider chain validator set, not the sovereign validator set. We should probably go with Zaki's idea, which I will flesh out more here:

  1. A "validator switch" governance proposal is made on the sovereign chain. This contains:
     a. The block or time after which the switch will take place
     b. The IBC light client whose validator set will be switched to (this will be the provider chain)
  2. At or after the switch time, anyone can make a transaction supplying the validator set corresponding to the valset hash in the designated light client. This transaction won't have any effect unless the IBC light client of the provider has been updated recently.
  3. This validator set is verified by comparing it to the hash, and if valid, is sent to Tendermint, replacing the entire validator set.
  4. When this happens, an IBC light client update is produced, effectively signing over control of the chain to the new validator set. At this point the chain halts, since the existing validator set no longer has the ability to produce new blocks.
  5. The new validator set (in our case, the Cosmos Hub validator set) starts running the chain, producing new blocks.
  6. The new validator set does a chain upgrade adding and enabling the CCV consumer module (this might also be possible to do as part of step 5)

mpoke commented 2 years ago

I agree that this approach is the way to go, but there are some things that need to be sorted out.

Problem statement:

Note: We distinguish between a sovereign chain wanting to become a consumer chain and a chain that wants to start from the beginning as a consumer chain. Here we focus only on the former.

Requirements:

Approach:

Note: The first two steps can also happen after the gov proposal passes.

As a side effect, the entire Channel Initialization protocol is no longer needed. In other words, once chainA starts running as a consumer chain, it already has an established CCV channel.

jtremback commented 2 years ago

Implementation plan for Marius's proposal above:

danwt commented 2 years ago

Is there any worry about needing to trust what is being received over ibc from the sovereign chain? (Because it will have its own valset at that time)

mpoke commented 2 years ago

Is there any worry about needing to trust what is being received over ibc from the sovereign chain? (Because it will have its own valset at that time)

Nothing is received over IBC from the sovereign chain. The IBC channel is used to send the provider valset (aka the consumer initial valset) to the sovereign chain so that the control of the sovereign chain can be passed to this valset.

jackzampolin commented 2 years ago

How do the client updates work for other chains connected to the sovereign -> consumer chain?

mpoke commented 2 years ago

How do the client updates work for other chains connected to the sovereign -> consumer chain?

@jackzampolin This should work directly by relaying a ClientUpdate message, i.e., https://github.com/cosmos/ibc-go/blob/b601462fbce9dd34620cd9129ff9b9057ee5c186/modules/core/keeper/msg_server.go#L62. Since the last block of the sovereign has NextValidatorsHash pointing to the valset validating the first block of the consumer, this transition is the same as the valset on a chain changing completely between two subsequent blocks.

jtremback commented 2 years ago

@mpoke is this in the spec yet?

mpoke commented 1 year ago

@jtremback Yes. There is a PR open on the spec repo https://github.com/cosmos/ibc/pull/840

asalzmann commented 1 year ago

We've started to work on this cc @jstr1121 @jtremback

mpoke commented 1 year ago

We've started to work on this cc @jstr1121 @jtremback

@asalzmann please check the latest version of the spec, i.e., https://github.com/cosmos/ibc/pull/840

Especially the comment in the BeginBlockInit method, i.e.,

  // pre-CCV state is over; upgrade chain to consumer chain
  //  - set preCCV to false
  //  - the existing staking module no longer provides 
  //    validator updates to the underlying consensus engine
  //  - the CCV module starts providing validator updates 
  //    to the underlying consensus engine
  //  - for safety, the existing staking module must be kept 
  //    for at least the unbonding period
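
The steps in that spec comment can be sketched as control flow. The model below is hypothetical and heavily simplified, not the ICS consumer keeper: the unbonding period is expressed in blocks, pending updates are plain slices, and beginBlockInit is assumed to run at the first block after the changeover.

```go
package main

import "fmt"

// ValidatorUpdate is a simplified stand-in for an ABCI validator update.
type ValidatorUpdate struct {
	PubKey string
	Power  int64
}

// consumerApp is a hypothetical, heavily simplified model of the changeover
// control flow; it is not the ICS consumer keeper or its API.
type consumerApp struct {
	preCCV               bool
	height               int64
	unbondingBlocks      int64             // unbonding period, in blocks for simplicity
	stakingRemovalHeight int64             // earliest height the old staking module may be dropped
	stakingUpdates       []ValidatorUpdate // pending updates from the sovereign staking module
	ccvUpdates           []ValidatorUpdate // pending updates from VSC packets
}

// beginBlockInit ends the pre-CCV phase once, as in the spec comment.
func (a *consumerApp) beginBlockInit() {
	if !a.preCCV {
		return
	}
	a.preCCV = false
	// Keep the old staking module around for at least the unbonding period.
	a.stakingRemovalHeight = a.height + a.unbondingBlocks
}

// endBlockValidatorUpdates picks which module feeds the consensus engine.
func (a *consumerApp) endBlockValidatorUpdates() []ValidatorUpdate {
	if a.preCCV {
		return a.stakingUpdates
	}
	return a.ccvUpdates
}

// canRemoveStaking reports whether the unbonding period has elapsed.
func (a *consumerApp) canRemoveStaking() bool {
	return !a.preCCV && a.height >= a.stakingRemovalHeight
}

func main() {
	app := &consumerApp{
		preCCV:          true,
		height:          100,
		unbondingBlocks: 50,
		stakingUpdates:  []ValidatorUpdate{{"sovA", 10}},
		ccvUpdates:      []ValidatorUpdate{{"hubX", 100}},
	}
	fmt.Println(app.endBlockValidatorUpdates()) // staking module still in charge
	app.beginBlockInit()
	fmt.Println(app.endBlockValidatorUpdates()) // CCV module now in charge
	fmt.Println(app.canRemoveStaking())         // false until height 150
}
```

Both modules stay in the app and a single flag decides which one feeds the consensus engine, so the old staking module remains in place for the unbonding period, as the spec requires for safety.
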

shaspitz commented 1 year ago

@asalzmann @jstr1121 lmk if there's any way I can help guide any efforts! Happy to hop on a call.

mpoke commented 1 year ago

Implementation: https://github.com/Stride-Labs/interchain-security/pull/1

shaspitz commented 1 year ago

Superseded by https://github.com/cosmos/interchain-security/issues/756