AleoNet / snarkOS

A Decentralized Operating System for ZK Applications
http://snarkos.org
Apache License 2.0
4.1k stars 2.57k forks source link

[Fix] Coupling block sync to DAG state #3268

Open mdelle1 opened 1 month ago

mdelle1 commented 1 month ago

Motivation

This PR focuses on coupling block sync to DAG state replication. When a node is syncing via block responses, it will sync its storage and DAG with the certificates contained in the block and attempt to update its ledger. Previously, there were scenarios where a node would commit certificates in its DAG without advancing blocks. Instead, the committal of certificates and advancement of blocks during sync should be coupled. This PR commits certificates in the DAG only when blocks are advanced to in the sync module and creates a channel to the BFT to ensure that the leader certificate of the block being added was recently committed in the BFT.

vicsn commented 1 month ago

@raychu86 is this PR ready to go (assuming all tests pass)?

vicsn commented 1 month ago

I did some light testing w/ a 4 validator devnet with these changes merged into the latest mainnet-staging and every time I restarted a validator or resynced a validator from genesis I encountered various errors:

~Interesting findings. I'm not able to reproduce after 5 tries on an M2 Max. Can you share the machine specs used? I assume a slower machine may induce these conditions.~ Correction, I can reproduce on M2 Max.

apruden2008 commented 1 month ago

Since reported errors have not been addressed, holding off on merge for now, and therefore holding off on adding this prior to code freeze. If we want this, it will have to be after launch.

raychu86 commented 2 weeks ago

@mdelle1 Have you had a chance to take a look at the issues being observed with the change?