Open Stebalien opened 6 days ago
An alternative is to wait one instance. That is, always consider the latest finality certificate as "pending" until another one has been built on top of it. We can do this safely due to the power table lookback. The network would have to be willing to "switch" decisions while the latest instance is still pending.
This lookback won't be completely transparent to the client, but shouldn't be that hard to implement....
We discussed this in person. The alternative is not a good solution because we don't have a hash link (and even the existence of that additional finality certificate has serious consequences).
We've implemented limited rebroadcasting to help lagging/restarting nodes catch up. However, if 66%+ of the network crashes after starting an instance but before sending a single decide message, the restarted nodes will have forgotten what they already sent, and the network could decide on two different values for the same instance.
A simple solution here is write-ahead logging. That is:
Of course, nothing will help if the actual disks die. But this will at least help us recover in case someone finds a way to crash the entire network all at once.
The specific attack I'm worried about is as follows: