Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.02k stars 564 forks source link

[BUG] Overlapping replication writes during full sync #808

Open keithchew opened 3 months ago

keithchew commented 3 months ago

Testing on v6.3.4, when a slave starts up, it performs a full sync with the master. After receiving the RDB file from the master, it attempts to load the dataset (with count logging):

Slave:

7:56:S 27 Mar 2024 09:19:32.288 * MASTER <-> REPLICA sync: Finished with success
7:56:S 27 Mar 2024 09:19:33.794 # DB 0: keys [5009608] volatile [419883]

Master:

7:71:M 27 Mar 2024 09:15:47.685 * Background saving terminated with success
7:71:M 27 Mar 2024 09:15:58.772 * Synchronization with replica 192.168.6.96:6396 succeeded
7:71:M 27 Mar 2024 09:16:00.733 # DB 0: keys [5009608] volatile [419883]

You can observe counts (btw keys count is what is stored in Flash) on both sides match up, all good. However, if there is a client writing to the master during the sync, the count will get out of sync DURING the rdb loading stage!:

Slave:

7:56:S 27 Mar 2024 09:33:24.998 * MASTER <-> REPLICA sync: Finished with success
7:56:S 27 Mar 2024 09:33:25.902 # DB 0: keys [5010314] volatile [419883]

Master:

7:71:M 27 Mar 2024 09:29:45.628 * Background saving terminated with success
7:71:M 27 Mar 2024 09:29:56.683 * Synchronization with replica 192.168.6.96:6396 succeeded
7:71:M 27 Mar 2024 09:29:59.680 # DB 0: keys [5009608] volatile [419883]

Logs show that client has count of 5010314 instead of the expected 5009608, which is very odd. I would have expected the count to match up, then after RDB loading, the slave then continues to process incoming replication writes. I will try to trace this through, but if anyone has any suggestions on where to look, that'll be great!

keithchew commented 3 months ago

I have tracked it down. It looks like this snippet in rdb.cpp is causing KeyDB to process both RDB records as well as in flight replication commands from master at the same time, in particular the call to processChangesAsync(). Is there a way to only process RDB records first, and then followed by replication commands after?

            if (fHighMemory || f1024thKey)
            {
                for (int idb = 0; idb < cserver.dbnum; ++idb)
                {
                    if (g_pserver->m_pstorageFactory) {
                        g_pserver->db[idb]->processChangesAsync(this->cstorageWritesInFlight);
                        fHighMemory = false;
                    }
                }
                if (fHighMemory)
                    performEvictions(false /* fPreSnapshot*/);
            }
keithchew commented 3 months ago

I have traced through this a bit more, and found the root cause of the count mismatch. I am testing on the async_flash branch and it looks like that is causing the this count mismatch error. As a temporary workaround, I have disabled the async flash prefetch when KeyDB is in loading mode. Apologies for initially assuming the issue was in the main branch. My local workaround in db.cpp:

...
                if (spexpire != nullptr) {
                    if (spexpire->when() < mstime()) {
                        goto LUnneeded;
                    }
                }
// -------------------------------------
// ADDED snippet here
// -------------------------------------
                if (g_pserver->loading) {
                    goto LUnneeded;
                }
// -------------------------------------
                dictAdd(db->m_pdict, key, o);
...