Expensify / Bedrock

Rock solid distributed database specializing in active/active automatic failover and WAN replication
https://bedrockdb.com
GNU Lesser General Public License v3.0

Use correct cancelAfter broadcast when resources are exhausted #1767

Closed: danieldoglas closed this 2 months ago

danieldoglas commented 2 months ago

Details

We're currently seeing replication timing issues. As a result, the number of replication threads grows too large and exhausts the available resources.

Considering that all replication messages can arrive in parallel and not necessarily in order, followers could get stuck in the following case:

Since no thread was created for commit 2, it is never applied. Thread 2, which depends on commit 2 being completed, gets stuck in an infinite loop. The server then never goes back to a searching state, and can only rejoin the pool after a restart.
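
To make the failure mode concrete, here is a minimal single-threaded simulation. It is only a sketch and not Bedrock's code: the `Follower` class, its fields, the `replay` helper, and the choice to cancel after `commit - 1` are all hypothetical and picked for illustration. It models a follower with a fixed number of worker slots, where each worker waits for the previous commit to be applied.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Follower:
    """Toy model of a follower (hypothetical, not Bedrock's implementation)."""
    max_workers: int                 # worker slots available for replication
    cancel_on_exhaustion: bool       # whether to broadcast a cancel when out of slots
    applied: int = 0                 # highest commit applied so far
    waiting: Set[int] = field(default_factory=set)  # commits holding a worker slot
    cancel_after: Optional[int] = None
    state: str = "FOLLOWING"

    def receive(self, commit: int) -> None:
        # Commits past the cancel point are ignored once a cancel was broadcast.
        if self.cancel_after is not None and commit > self.cancel_after:
            return
        if len(self.waiting) >= self.max_workers:
            # Resources exhausted: this commit gets no worker and is never applied.
            if self.cancel_on_exhaustion:
                # Assumption of this sketch: cancel everything after the last
                # commit that can still be applied.
                self.cancel(commit - 1)
            return
        self.waiting.add(commit)
        self._drain()

    def _drain(self) -> None:
        # Apply commits strictly in order; each one frees its worker slot.
        while self.applied + 1 in self.waiting:
            self.applied += 1
            self.waiting.remove(self.applied)

    def cancel(self, cancel_after: int) -> None:
        # Release every worker waiting on a commit past the cancel point,
        # then leave the following state so the node can recover.
        self.cancel_after = cancel_after
        self.waiting = {c for c in self.waiting if c <= cancel_after}
        self._drain()
        self.state = "SEARCHING"

    def stuck(self) -> bool:
        # Still following, but workers are waiting on a commit that never comes.
        return self.state == "FOLLOWING" and bool(self.waiting)


def replay(arrivals, cancel_on_exhaustion: bool) -> Follower:
    # Commit 1 is already applied and only two worker slots exist.
    follower = Follower(max_workers=2,
                        cancel_on_exhaustion=cancel_on_exhaustion,
                        applied=1)
    for commit in arrivals:
        follower.receive(commit)
    return follower


if __name__ == "__main__":
    # Commits 3 and 4 arrive before commit 2 and take both worker slots.
    for cancel in (False, True):
        f = replay([3, 4, 2], cancel_on_exhaustion=cancel)
        print(cancel, f.state, sorted(f.waiting), f.stuck())
```

With the cancel disabled, the workers for commits 3 and 4 wait forever on commit 2. With it enabled, they are released and the follower leaves the following state instead of hanging.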

Fixed Issues

Fixes GH_LINK

Tests


Internal Testing Reminder: when changing Bedrock, please compile Auth against your new changes.