cosmos / ibc

Interchain Standards (ICS) for the Cosmos network & interchain ecosystem.

ICS2: Block Delay Decision #573

Open AdityaSripal opened 3 years ago

AdityaSripal commented 3 years ago

The block delay was introduced as an additional parameter to improve connection security: at least n blocks must pass after a consensus state is received before it can be used in packet processing. The time delay eases the race condition, in terms of time, between a misbehaviour submission against that consensus state and a packet-processing message. The block delay eases the same race condition in terms of block space (mempool ordering).
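As a rough illustration of the mechanism being discussed, here is a minimal Go sketch of a combined check, assuming the executing chain records the local block time and height at which each consensus state was stored. All names (`processedInfo`, `verifyDelayPassed`, and their fields) are hypothetical, not the actual ibc-go API:

```go
package ibc

import (
	"errors"
	"time"
)

// processedInfo records when a consensus state was stored on the
// executing chain. Hypothetical type for illustration.
type processedInfo struct {
	processedTime   time.Time // local block time when the consensus state was written
	processedHeight uint64    // local block height when the consensus state was written
}

// verifyDelayPassed checks that both the connection's time delay and the
// block delay have elapsed since the consensus state was stored, relative
// to the executing chain's current block time and height.
func verifyDelayPassed(
	now time.Time, currentHeight uint64,
	info processedInfo,
	timeDelay time.Duration, blockDelay uint64,
) error {
	if now.Before(info.processedTime.Add(timeDelay)) {
		return errors.New("time delay has not yet elapsed")
	}
	if currentHeight < info.processedHeight+blockDelay {
		return errors.New("block delay has not yet elapsed")
	}
	return nil
}
```

A packet-processing message that fails this check would be rejected, giving a misbehaviour submission for that consensus state both wall-clock time and block space to land first.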

The IBC implementation has decided not to go with a separate block delay parameter encoded in the connection, as the spec suggests, because changing the connection struct would break connection handshakes (a chain on the new version cannot connect to a chain on the old version).

It is critical for us to maintain the ability of a Stargate chain to connect to a chain on the latest IBC version for the foreseeable future.

Some non-breaking proposals were discussed:

  1. Simply enforce that a single block has passed.
    • If the executing chain has slowed down for a long time, the mempool may take multiple blocks to clear, and there is no guarantee that a misbehaviour submission will be processed in the first block. Thus we want the ability to specify a block delay higher than 1.
  2. Hardcode a block delay that is enforced for every connection.
    • This is not sensitive to varying time delay periods. A larger time delay should have a larger block delay: in the case where the executing chain has not produced blocks even over a long time delay, it may take many blocks to clear the mempool and ensure that a misbehaviour submission submitted within that time is included in a block before the packet-processing message. Thus, we want a block delay that is proportional to the time delay.

Solution: Include a chain-wide parameter called MaxTimePerBlock, representing the maximum time the chain expects it will take to produce a block.

Using this parameter, we can calculate the minimum number of blocks that we expect to pass within a given time delay like so:

blockDelay := roundup(timeDelay / MaxTimePerBlock)

We round up for safety and so that non-zero quotients less than 1 still wait at least one block. Rounding up also keeps the block delay at 0 for connections with a 0 time delay, since those connections prioritize latency over safety.

This allows the chain to specify its expectation of its own block production and create proportional block delays for all time-delay-enabled connections. Critically, since this parameter is internal to a chain's state machine, it does not break any of the external-facing interfaces IBC uses to talk to other chains.
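For concreteness, here is a minimal Go sketch of this calculation, assuming the time delay and MaxTimePerBlock are expressed in the same unit (e.g. nanoseconds, as with time.Duration). The function name and signature are illustrative, not the actual ibc-go API:

```go
package ibc

// blockDelay derives the minimum block delay from a connection's time
// delay, per blockDelay := roundup(timeDelay / MaxTimePerBlock).
// Both inputs must be in the same unit, e.g. nanoseconds.
func blockDelay(timeDelay, maxTimePerBlock uint64) uint64 {
	if maxTimePerBlock == 0 {
		return 0 // guard against division by zero if the parameter is unset
	}
	// Integer ceiling division: any non-zero remainder rounds up, so a
	// non-zero timeDelay waits at least one block, while a zero timeDelay
	// yields a zero block delay.
	return (timeDelay + maxTimePerBlock - 1) / maxTimePerBlock
}
```

For example, a 10-minute time delay with MaxTimePerBlock set to 30 seconds yields a block delay of roundup(600/30) = 20 blocks.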

AdityaSripal commented 3 years ago

cc: @cwgoes @colin-axner @brapse

cwgoes commented 3 years ago

> A larger time delay should have a larger block delay: in the case where the executing chain has not produced blocks even over a long time delay, it may take many blocks to clear the mempool and ensure that a misbehaviour submission submitted within that time is included in a block before the packet-processing message. Thus, we want a block delay that is proportional to the time delay.

Yes, I agree with this.

> Solution: Include a chain-wide parameter called MaxTimePerBlock, representing the maximum time the chain expects it will take to produce a block.
>
> Using this parameter, we can calculate the minimum number of blocks that we expect to pass within a given time delay like so:
>
> blockDelay := roundup(timeDelay / MaxTimePerBlock)

Why the maximum time per block, instead of just the expected average? Using the maximum time per block will require fewer blocks to have passed than we expect (on average) to be committed within the time period. Maybe this is what we want, but it's not clear to me from this reasoning alone why we want it.

Does that question make sense?
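To make the question concrete with hypothetical numbers: suppose a connection has a 10-minute (600s) time delay, the chain's average block time is 6s, and MaxTimePerBlock is set conservatively to 30s. Then:

expected blocks within the delay, on average: 600s / 6s = 100
enforced block delay: roundup(600s / 30s) = 20

So the chain enforces only 20 blocks, far fewer than the ~100 it expects to commit within the time period on average.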

Based on this

> A larger time delay should have a larger block delay: in the case where the executing chain has not produced blocks even over a long time delay, it may take many blocks to clear the mempool and ensure that a misbehaviour submission submitted within that time is included in a block before the packet-processing message.

I'm almost wondering whether the "more accurate" solution is to wait until a particular rate of block production has been reached relative to the packet submission time, so that we can reasonably expect the mempool to have been cleared. I think this is probably far too complex to actually implement, though. But it's awkward that we're importing this sense of how the mempool works without clearly specifying it.