filecoin-project / go-f3

Golang implementation of Fast Finality in Filecoin (F3)
Apache License 2.0

The F3 <-> EC fork issue #718

Open Stebalien opened 6 hours ago

Stebalien commented 6 hours ago

There's a theoretical issue where, if F3 takes a long time to finalize a tipset, it might cause long-range forks in EC. We're alleviating this by:

  1. Never trying to finalize the current head (#716).
  2. Preventing clients (e.g., lotus) from accepting finality certificates that would revert beyond EC finality (https://github.com/filecoin-project/go-f3/issues/717).
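
As a rough illustration of the second mitigation, a client-side guard could compare how far a certificate would revert the local chain against the EC finality window. This is only a sketch: `acceptCertificate` and its parameters are hypothetical names rather than go-f3 or lotus APIs, the 900-epoch window is an assumed value, and so is the choice to still allow a revert of exactly that depth.

```go
package main

import "fmt"

// ecFinalityEpochs is the EC finality window in epochs.
// Assumption: 900, the value commonly cited for Filecoin mainnet.
const ecFinalityEpochs = 900

// acceptCertificate is a hypothetical client-side guard (not a go-f3 or lotus API).
// headEpoch is the epoch of the local EC head. commonAncestorEpoch is the epoch of
// the latest tipset shared by the local chain and the chain the certificate finalizes.
func acceptCertificate(headEpoch, commonAncestorEpoch int64) bool {
	revertDepth := headEpoch - commonAncestorEpoch
	if revertDepth < 0 {
		// The certificate only extends the local chain, so nothing is reverted.
		revertDepth = 0
	}
	// Assumption: a revert of exactly ecFinalityEpochs is still allowed.
	return revertDepth <= ecFinalityEpochs
}

func main() {
	fmt.Println(acceptCertificate(1000, 995))  // reverts 5 epochs
	fmt.Println(acceptCertificate(2000, 1000)) // reverts 1000 epochs
}
```

A guard like this only protects the client. It does not stop F3 from deciding on such a chain in the first place, which is the part of the issue that remains open.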

However, the issue still persists. The core of the issue is that:

  1. F3 gets a chain from EC.
  2. F3 can spend an arbitrary amount of time trying to finalize it.
  3. In the meantime, EC may fork away from the head F3 ends up deciding on.

To fix this, we likely need some way for F3 to discard the current proposal (if too old) and get a new one from the client. However, this is tricky to implement in the current GPBFT protocol without breaking the liveness guarantees.
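
The "too old" test itself is the easy part and might look something like the sketch below. The `proposal` type, `isStale`, and the age threshold are all hypothetical, not part of go-f3. The sketch also deliberately ignores the hard part mentioned above: how participants would agree to drop a proposal without breaking GPBFT's liveness guarantees.

```go
package main

import "fmt"

// proposal is a hypothetical stand-in for the chain F3 received from EC.
type proposal struct {
	headEpoch int64 // epoch of the proposal's head tipset
}

// isStale reports whether the proposal's head is more than maxAgeEpochs
// behind currentEpoch. Both the function and the threshold are assumptions.
func isStale(p proposal, currentEpoch, maxAgeEpochs int64) bool {
	return currentEpoch-p.headEpoch > maxAgeEpochs
}

func main() {
	p := proposal{headEpoch: 100}
	fmt.Println(isStale(p, 105, 10)) // 5 epochs old
	fmt.Println(isStale(p, 120, 10)) // 20 epochs old
}
```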

There are really two parts to this issue:

  1. Reducing the likelihood of long-range (10+ epochs) forks (to avoid breaking client assumptions).
  2. Preventing forks beyond EC finality.

However, the catch is that nobody can emit two decide messages for the same instance without potentially breaking GPBFT. But there are also certain decisions that are simply unacceptable.

Stebalien commented 6 hours ago

The actual solution may have multiple phases:

  1. In phase one, we operate normally and try to finalize any valid chain.
  2. In phase two, we try to avoid phase 1 by somehow skewing towards the heavier chain?
  3. In phase three, we go back to trying to finalize any valid chain. We're accepting the fact that we're likely going to have a long-range fork.

But it looks like any solution will have to involve feedback between GPBFT and EC:

  1. GPBFT needs to know when it's taking too long. In that case, we want to decide on base ASAP so we can get a new proposal.
  2. EC should maybe consider switching chains based on GPBFT. E.g., if we see a quorum of quality messages for some prefix, we may want to eagerly switch to that chain because it'll likely be finalized.
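
To make the second point concrete, here is a minimal sketch of how a node might find the longest prefix backed by a quorum of quality messages. It assumes a vote for a chain also counts as a vote for each of its prefixes, and that a quorum means strictly more than 2/3 of total power. The `qualityVote` type and `longestQuorumPrefix` are hypothetical and are not taken from go-f3.

```go
package main

import "fmt"

// qualityVote is a hypothetical summary of one participant's quality message:
// the chain it proposes (tipset keys, base first) and the sender's power.
type qualityVote struct {
	chain []string
	power int64
}

// hasPrefix reports whether chain starts with prefix.
func hasPrefix(chain, prefix []string) bool {
	if len(chain) < len(prefix) {
		return false
	}
	for i := range prefix {
		if chain[i] != prefix[i] {
			return false
		}
	}
	return true
}

// longestQuorumPrefix returns the longest chain prefix supported by strictly
// more than 2/3 of totalPower, or nil if no prefix has that much support.
// Assumption: a vote for a chain counts towards every prefix of that chain.
func longestQuorumPrefix(votes []qualityVote, totalPower int64) []string {
	var best []string
	for _, v := range votes {
		// Only prefixes longer than the current best can improve the result.
		for n := len(v.chain); n > len(best); n-- {
			prefix := v.chain[:n]
			var support int64
			for _, w := range votes {
				if hasPrefix(w.chain, prefix) {
					support += w.power
				}
			}
			if 3*support > 2*totalPower {
				best = prefix
				break
			}
		}
	}
	return best
}

func main() {
	votes := []qualityVote{
		{chain: []string{"base", "t1", "t2a"}, power: 40},
		{chain: []string{"base", "t1", "t2b"}, power: 30},
		{chain: []string{"base", "t1"}, power: 30},
	}
	fmt.Println(longestQuorumPrefix(votes, 100))
}
```

In this example the participants disagree about the tipset after `t1`, so the longest prefix with quorum support ends at `t1`. An EC node seeing that could prefer chains that extend `t1`.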

jennijuju commented 5 hours ago

EC should maybe consider switching chains based on GPBFT.

If I understand correctly, the issue is that F3 participants need to be notified if the blocks on the longest EC chain differ from what they are currently finalizing. So shouldn't this be the other way around: F3 should maybe consider switching chains based on EC?

jennijuju commented 5 hours ago

What happens today if the chain receives a finalized set of blocks that doesn't match the blocks on EC's longest chain?

Stebalien commented 5 hours ago

If I understand correctly, the issue is that F3 participants need to be notified if the blocks on the longest EC chain differ from what they are currently finalizing. So shouldn't this be the other way around: F3 should maybe consider switching chains based on EC?

Both.

  1. GPBFT should switch if it's taking too long and trying to finalize something EC isn't building on.
  2. EC should try to build on what GPBFT is likely to finalize to reduce the chances of (1) being an issue (and to increase the chances of building on the right chain).

Stebalien commented 5 hours ago

What happens today if the chain receives a finalized set of blocks that doesn't match the blocks on EC's longest chain?

We switch to the F3 finalized chain no matter what.
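
In fork-choice terms, that behavior amounts to something like the sketch below: only chains containing the F3-finalized tipset are eligible, and weight decides among those. This is an illustration under assumed types, not lotus's actual fork-choice code. `candidate` and `selectHead` are hypothetical names.

```go
package main

import "fmt"

// candidate is a hypothetical view of a competing EC chain.
type candidate struct {
	tipsets []string // tipset keys, oldest first
	weight  int64    // EC chain weight
}

// contains reports whether key appears in tipsets.
func contains(tipsets []string, key string) bool {
	for _, t := range tipsets {
		if t == key {
			return true
		}
	}
	return false
}

// selectHead picks the heaviest candidate that includes the finalized tipset.
// Chains that do not include it are ignored, however heavy they are.
// The second return value is false if no candidate includes the tipset.
func selectHead(candidates []candidate, finalized string) (candidate, bool) {
	var best candidate
	found := false
	for _, c := range candidates {
		if !contains(c.tipsets, finalized) {
			continue
		}
		if !found || c.weight > best.weight {
			best = c
			found = true
		}
	}
	return best, found
}

func main() {
	candidates := []candidate{
		{tipsets: []string{"a", "b", "c"}, weight: 100},
		{tipsets: []string{"a", "b2", "c2"}, weight: 150},
	}
	head, ok := selectHead(candidates, "b")
	fmt.Println(head.tipsets, ok)
}
```

Here the heavier chain loses because it does not contain the finalized tipset `b`, which is the "no matter what" part of the answer.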