Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.34k stars 572 forks source link

[BUG] race condition on startup between file loading and replication #811

Open keithchew opened 6 months ago

keithchew commented 6 months ago

Testing on v6.3.4. This scenario will only happen if you have:

On startup, there are multiple places where replicationCacheMasterUsingMyself() is called:

To prevent the race condition, we should check if the data is loaded from disk (ie there is already check there if flash folder is empty), and only if not loaded from disk, it runs the snippet from InitServer() which restores the replication info from DB.

Without the fix, you will notice this log will appear multiple times (ignore first one as that comes from config loading at the start):

Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.

And you can verify if the race condition happens by adding this log in replication.cpp:

    bool validState = mi->master != NULL && mi->cached_master == NULL;
    serverLog(LL_NOTICE, "MASTER <-> REPLICA sync: Finished with success, validState[%d]", validState);

validState will be 0 if the race condition happens.

Note that the slave will continue to run even with the race condition above. But it will crash the next time the server reboots, as the state check above is done when master disconnects, ie after this log:

"Connection with master lost."