unit-e consumes too much memory on sync

Ruteri commented 5 years ago

When syncing a node from the testnet, the client quickly eats up all of the RAM and is killed by the OOM killer.

To reproduce, it is enough to try to sync a node. It is especially easy to reproduce with a good connection to the node used for syncing.

I've identified the cause as the finalization state, which is stored for every header+commits message received during header+commits sync. The state repository is not pruned fast enough because validation lags far behind the sync, so the node consumes enormous amounts of memory (30 GB during the first 40k blocks).

@frolosofsky further identified two issues:

The commits are synced too far ahead
Individual finalization states consume too much memory

Limiting the allowed offset between the tip and the synced headers should be enough to ensure proper resource usage during sync.
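A minimal sketch of such a limit, to make the idea concrete. The function name, the constant, and its value are all hypothetical and are not taken from the unit-e codebase; the only assumption is that the node knows the height of its validated tip and the height of the last synced header.

```cpp
#include <cstdint>

// Hypothetical cap on how far header+commits sync may run ahead of the
// validated tip. The value is illustrative only.
constexpr std::uint32_t kMaxSyncOffset = 2000;

// Returns true if the node may request another header+commits message.
// Requests are paused once the synced headers are `max_offset` or more
// blocks ahead of the validated tip, which bounds the number of
// finalization states kept in memory.
bool CanRequestMoreCommits(std::uint32_t validated_tip_height,
                           std::uint32_t synced_header_height,
                           std::uint32_t max_offset = kMaxSyncOffset) {
  if (synced_header_height <= validated_tip_height) {
    return true;  // Headers are not ahead of validation at all.
  }
  return synced_header_height - validated_tip_height < max_offset;
}
```

With a check like this, the number of states awaiting validation is bounded by the offset rather than by how fast the peer can serve commits.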

The state repository data could use some optimization too, as most of the states don't differ in any way (differences only arise when checkpoints are reached or there are commits on the chain - usually once or twice per epoch). In the attached Massif output one can see that an individual finalization state takes up to 4 MB of memory (at around height 40k).
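One possible way to exploit the fact that most states are identical is to share a single immutable state object between consecutive blocks. The sketch below assumes the caller can tell whether a block changed the state; `FinalizationState` here is a tiny hypothetical stand-in, and `SharedStateRepository` and its methods are illustrative names, not the actual unit-e types.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a finalization state; the real one is far larger.
struct FinalizationState {
  std::uint32_t current_epoch = 0;
  std::vector<std::uint64_t> deposits;
};

// Illustrative repository that stores one shared, immutable state object
// for every run of blocks that did not modify the state.
class SharedStateRepository {
 public:
  using StatePtr = std::shared_ptr<const FinalizationState>;

  // The block changed the state (checkpoint or commits): store a new object.
  StatePtr AddChanged(std::uint32_t height, FinalizationState state) {
    StatePtr ptr = std::make_shared<const FinalizationState>(std::move(state));
    states_[height] = ptr;
    return ptr;
  }

  // The block left the state untouched: reuse the parent's object.
  StatePtr AddUnchanged(std::uint32_t height, const StatePtr &parent) {
    states_[height] = parent;
    return parent;
  }

  // Returns the state stored for `height`, or nullptr if there is none.
  StatePtr Find(std::uint32_t height) const {
    const auto it = states_.find(height);
    if (it == states_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Drops all entries below `height`. A shared state is freed only when
  // the last entry (or outside holder) referencing it is gone.
  void PruneBelow(std::uint32_t height) {
    states_.erase(states_.begin(), states_.lower_bound(height));
  }

  std::size_t Size() const { return states_.size(); }

 private:
  std::map<std::uint32_t, StatePtr> states_;
};
```

Under this scheme the per-block cost for an unchanged state is one map entry and a pointer instead of a full copy, so memory grows with the number of state changes rather than with the number of synced headers.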

Ruteri commented 5 years ago

massif.out.txt.zip

thothd commented 5 years ago

Good catch. In general, and after syncing a few times, I'd strongly consider limiting the commits-based sync to the fast sync use case only. There it's essential to iterate over the commits only, while in full sync the node needs to download the full blocks for validation anyway. Commits should, of course, still be validated epoch by epoch, but by taking them from the blocks, eliminating the duplicate download of the full commit history.

dtr-org / unit-e

unit-e consumes too much memory on sync #1073