Open fulltimemike opened 6 months ago
Flat lines in this chart indicate the issue occurring
network topology:
c6a.8xlarge
Reproduce with some automation to reset the ledger of the same 2 every 30 minutes.
We frequently run into this issue within the first 500 blocks, on either or both of the 2 reset validators, after they reach the tip.
notes:
We have a tentative fix for this issue for validators - https://github.com/AleoHQ/snarkOS/pull/3232. The fix is currently undergoing burn-in testing and internal verification.
Bug Report
Sometimes, after a client node is restarted, the following error message appears:
The next block (X) is invalid - Failed to speculate on transactions - Failed to post-ratify - Next round Y must be greater than current round Y
This error causes the client to stop syncing, and restarting the client again does not fix it. To let the client continue syncing, the client ledger must be modified: either the ledger is reset so the client resyncs from genesis, or a snapshot is loaded into the client so it can continue from there.
I'm uncertain whether this bug is in snarkOS itself or in snarkVM. The specific error is thrown here.
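To make the failure mode concrete, here is a minimal sketch of the kind of invariant the error message describes: the round of the next block must be strictly greater than the round the ledger currently records. The function name `check_next_round` and its signature are hypothetical and chosen only for illustration; this is not the actual snarkVM code, just an assumption about the shape of the check inferred from the error text.

```rust
/// Hypothetical stand-in for the round check implied by the error message.
/// Returns Ok(()) only if `next_round` is strictly greater than `current_round`.
fn check_next_round(current_round: u64, next_round: u64) -> Result<(), String> {
    if next_round > current_round {
        Ok(())
    } else {
        // Mirrors the wording of the reported error; the real message is
        // produced inside the node, not by this sketch.
        Err(format!(
            "Next round {next_round} must be greater than current round {current_round}"
        ))
    }
}
```

Under this assumption, `check_next_round(252_153, 252_154)` succeeds, while `check_next_round(252_154, 252_154)` fails with the same "Next round Y must be greater than current round Y" wording seen in the report. A strict check like this would reject any block whose round the ledger believes it has already recorded, which would be consistent with the node staying stuck across further restarts.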
Logs directly before the bug is thrown.
Interestingly, in this example the logs show blocks and rounds much further ahead (block: 185032, round: 412383) being added to the ledger than the block and round named in the error (block: 111196, round: 252154). I'm not sure why the store appears to be adding earlier rounds and blocks when it has already passed that point.
Steps to Reproduce
Across multiple canary net client nodes, we have observed that restarting the node can cause syncing to fail. The bug is nondeterministic, but restarting a client node enough times eventually triggers the error. The client may need to be actively syncing during the restarts for the bug to occur, but I can't be certain.
Expected Behavior
Restarting a client node should not cause the client to get stuck permanently when syncing.
Your Environment
The node runs on an EC2 Linux machine, using a fork of snarkOS with commits up to https://github.com/AleoNet/snarkOS/commit/6aba25d9193c30c82c9762130499554f5c9fea1a.