cosmos / interchain-security

Interchain Security is an open-source IBC application that allows Cosmos blockchains to lease their proof-of-stake security to one another.
https://cosmos.github.io/interchain-security/

Channel initialization should have a timeout #278

Status: Closed (closed by mpoke 1 year ago)

mpoke commented 2 years ago

Once a governance proposal to add a new consumer chain passes on the provider, a new client to the consumer chain is created. From this point on, no unbonding operation (e.g., an undelegation) can complete until the CCV channel is established, which requires both the connection and the channel opening handshakes to complete. Since these handshakes have no timeouts, channel initialization could in theory take forever. In practice, this can happen if no relayer relays the connection and channel opening handshake messages (e.g., ChanOpenInit, ChanOpenTry).

To guarantee that all unbonding operations eventually complete, channel initialization should be time-bounded (via a timeout). For example, once the client is created on the provider, the CCV channel must be established within 1 week.

josef-widder commented 2 years ago

I guess some details still need to be clarified, e.g., does the consumer chain need to shut down if the channel is not open? Can the channel be open on the consumer chain but time out on the provider chain? All of this can be solved; we just need to think it through.

In general, I don't think there is a way around adding some synchrony assumptions as you proposed here.

mpoke commented 2 years ago

does the consumer chain need to shut down in case the channel is not open?

That's a good point. We can let it happen outside the protocol, i.e., the validators will eventually stop running the consumer. In any case, all user transactions on the consumer chain are disabled until the channel is opened, so there is little point in running it. However, I agree that we need a mechanism to notify the validators to stop running the consumer binary.

Can it be the case that the channel is open on the consumer chain, but times out on the provider chain?

No. The channel first becomes established (aka opened) on the provider, during ChanOpenConfirm, and only then on the consumer, when it receives the first VSC packet. If the timeout is reached on the provider, no VSC packet is ever sent.
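The ordering argument can be modeled as a tiny state machine. This is an illustrative sketch, not ICS code: the state names and methods are hypothetical, and it assumes that (1) the timeout fires only while the provider-side channel is still pending, (2) a ChanOpenConfirm arriving after the timeout is rejected, and (3) VSC packets are sent only over an established channel. Under those assumptions, a timeout on the provider rules out the consumer ever seeing the channel as open.

```go
package main

import (
	"errors"
	"fmt"
)

// providerChannelState is a hypothetical per-consumer state on the provider.
type providerChannelState int

const (
	pending     providerChannelState = iota // client created, handshake in progress
	established                             // ChanOpenConfirm processed on the provider
	stopped                                 // init timeout fired; consumer removed
)

type provider struct {
	state   providerChannelState
	sentVSC int // number of VSC packets sent to the consumer
}

// onChanOpenConfirm marks the channel established on the provider.
// Assumption: after a timeout, the handshake is no longer accepted.
func (p *provider) onChanOpenConfirm() error {
	if p.state != pending {
		return errors.New("handshake no longer accepted")
	}
	p.state = established
	return nil
}

// onInitTimeout stops the consumer only if the channel is still pending.
func (p *provider) onInitTimeout() {
	if p.state == pending {
		p.state = stopped
	}
}

// sendVSC sends a VSC packet only over an established channel.
func (p *provider) sendVSC() bool {
	if p.state != established {
		return false
	}
	p.sentVSC++
	return true
}

// consumerEstablished models the consumer side: it treats the channel as
// established only after receiving the first VSC packet.
func consumerEstablished(p *provider) bool { return p.sentVSC > 0 }

func main() {
	late := &provider{}
	late.onInitTimeout() // timeout fires before ChanOpenConfirm
	fmt.Println(late.onChanOpenConfirm() != nil, late.sendVSC(), consumerEstablished(late)) // true false false

	onTime := &provider{}
	_ = onTime.onChanOpenConfirm()
	onTime.onInitTimeout() // no effect: the channel is already established
	fmt.Println(onTime.sendVSC(), consumerEstablished(onTime)) // true true
}
```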

jtremback commented 2 years ago

The only way I can see this happening in practice is if a bug in the consumer chain's code prevents it from completing the handshake. I would put this in the same bucket of arbitrarily bad behavior as a consumer chain that simply never sends VSCMaturedPackets: it is something governance will need to catch when reviewing the consumer chain's code.

Should this be left open? I will move it out of v1 release in any case.

mpoke commented 2 years ago

Same argument as here: https://github.com/cosmos/interchain-security/issues/283#issuecomment-1230930067. If the consumer goes down during channel initialization, none of the unbonding operations started during this time will complete. The only way to deal with this is to pass a proposal to remove the non-responsive consumer chain. It seems much easier to handle this in the protocol.
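To make the argument concrete, here is a hypothetical model (not the ICS implementation; `unbondingOp`, `onMatured`, and `removeConsumer` are illustrative names) in which each unbonding operation waits on a set of consumer chains that must report maturity. A consumer stuck in channel initialization never reports, so the operations stay blocked until that consumer is removed, whether by a governance proposal or, as proposed in this issue, automatically once an init timeout expires.

```go
package main

import (
	"fmt"
	"sort"
)

// unbondingOp is a hypothetical provider-side record: it can complete only
// once every consumer chain in waitingOn has reported maturity.
type unbondingOp struct {
	id        uint64
	waitingOn map[string]bool
}

type provider struct {
	ops map[uint64]*unbondingOp
}

// onMatured records that a consumer reported maturity for an operation.
func (p *provider) onMatured(chainID string, opID uint64) {
	if op, ok := p.ops[opID]; ok {
		delete(op.waitingOn, chainID)
	}
}

// removeConsumer drops a consumer chain, so no operation waits on it anymore.
func (p *provider) removeConsumer(chainID string) {
	for _, op := range p.ops {
		delete(op.waitingOn, chainID)
	}
}

// completeReady finalizes every operation that no longer waits on any
// consumer and returns their IDs in ascending order.
func (p *provider) completeReady() []uint64 {
	var done []uint64
	for id, op := range p.ops {
		if len(op.waitingOn) == 0 {
			done = append(done, id)
			delete(p.ops, id) // deleting during range is safe in Go
		}
	}
	sort.Slice(done, func(i, j int) bool { return done[i] < done[j] })
	return done
}

func main() {
	p := &provider{ops: map[uint64]*unbondingOp{
		1: {id: 1, waitingOn: map[string]bool{"consumer-a": true, "consumer-b": true}},
		2: {id: 2, waitingOn: map[string]bool{"consumer-a": true, "consumer-b": true}},
	}}
	// consumer-a is healthy and reports maturity for both operations.
	p.onMatured("consumer-a", 1)
	p.onMatured("consumer-a", 2)
	// consumer-b never finished channel initialization, so it never reports.
	fmt.Println(p.completeReady()) // []
	p.removeConsumer("consumer-b")
	fmt.Println(p.completeReady()) // [1 2]
}
```

The design point is that removal is the only event that unblocks these operations; an in-protocol timeout simply guarantees that this event eventually happens without waiting on a governance vote.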