Shopify / ghostferry

The swiss army knife of live data migrations
https://shopify.github.io/ghostferry
MIT License

Ghostferry binlog streamer lag with large source write volume due to misconfigured BinlogEventBatchSize? #332

Open shuhaowu opened 2 years ago

shuhaowu commented 2 years ago

When the source experiences high load, Ghostferry can sometimes fail to keep up, and the binlog streamer lags. This blocks the migration, as binlog streamer lag must be zero during cutover.

Thinking carefully about how Ghostferry works shows why lag may occur. Ghostferry streams the binlog via the BinlogStreamer component. Once a binlog event is read, it is put into a channel whose capacity is BinlogEventBatchSize (set in the config). This channel is read by the BinlogWriter, a separate goroutine, which follows this pseudo code (a rough Go sketch follows the list):

  1. An event is read from the channel and appended to a buffer.
  2. If no more events are immediately available, or if the buffer reaches BinlogEventBatchSize, the buffered events are written to the target.
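
As a rough illustration, here is a minimal sketch of that loop in Go. This is not Ghostferry's actual implementation; the names (`binlogEvent`, `writerLoop`, `writeBatch`) are hypothetical stand-ins, but the batching behavior matches the pseudo code above:

```go
package main

import "fmt"

// binlogEvent is a hypothetical stand-in for a parsed binlog event.
type binlogEvent struct{}

// writerLoop drains events from the channel into a buffer and flushes the
// buffer to the target when it is full or when no more events are
// immediately available.
func writerLoop(events <-chan binlogEvent, batchSize int, writeBatch func([]binlogEvent) error) error {
	buffer := make([]binlogEvent, 0, batchSize)
	for ev := range events {
		buffer = append(buffer, ev)

	drain:
		// Step 1: keep reading without blocking until the channel is
		// momentarily empty or the buffer is full.
		for len(buffer) < batchSize {
			select {
			case ev, ok := <-events:
				if !ok {
					break drain
				}
				buffer = append(buffer, ev)
			default:
				break drain // no more events immediately available
			}
		}

		// Step 2: while this write is in flight, nothing drains the
		// channel, so the streamer can fill it up and then block.
		if err := writeBatch(buffer); err != nil {
			return err
		}
		buffer = buffer[:0]
	}
	return nil
}

func main() {
	// The channel capacity plays the role of BinlogEventBatchSize.
	events := make(chan binlogEvent, 100)
	go func() {
		for i := 0; i < 250; i++ {
			events <- binlogEvent{}
		}
		close(events)
	}()
	_ = writerLoop(events, 100, func(batch []binlogEvent) error {
		fmt.Println("writing batch of", len(batch))
		return nil
	})
}
```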

Note that during step 2 (while writing to MySQL), the BinlogWriter is not draining the channel, but the BinlogStreamer continues to fill it. If a write takes long enough, the channel fills up, and once it is full the BinlogStreamer blocks on its send, since the channel effectively serves as a fixed-size buffer. In a recent move where we had lag, I took a 5-second profile via pprof and graphed the synchronization blocking profile, which showed that the BinlogStreamer was blocked for 4.84s of those 5 seconds.

[image: pprof synchronization blocking profile]
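
For anyone who wants to reproduce this kind of measurement, here is one way to expose a blocking profile from a Go process using the standard net/http/pprof package. This is generic Go tooling, not anything specific to Ghostferry, and the port is arbitrary:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"runtime"
)

func main() {
	// Record every blocking event on the "block" profile. A rate of 1 has
	// measurable overhead; use a larger sampling rate in production.
	runtime.SetBlockProfileRate(1)

	// Serve the pprof endpoints; fetch and graph the profile with e.g.
	//   go tool pprof http://localhost:6060/debug/pprof/block
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```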

Math

There is a possible solution to this: if the buffer is larger, then the write (step 2) may finish before the buffer is full, which means the BinlogStreamer never blocks. We can analyze this condition mathematically:

  1. Let the time taken by a write be denoted T.
  2. The BinlogEventBatchSize is denoted B.
  3. The rate at which binlog events are generated on the source is R.
  4. The "optimal" batch size is then B = T * R (or B = TR): the buffer must be large enough to absorb every event generated while one write is in flight.
  5. The write time T is itself a function of B, as larger batch sizes take longer to write.
  6. We can assume a simple linear model, T = mB + o, where the parameter m is the (marginal) time needed to write a single binlog event on the target, and o is the fixed overhead of a single write operation (think network round trips and other per-write costs).

Combining the equations from 6 and 4:

B = TR
T = mB + o

B = (mB + o)R
B = mBR + oR
B - mBR = oR
(1 - mR)B = oR
B = oR / (1 - mR)

So the optimal batch size is B = oR / (1 - mR). Unfortunately, we don't have values for any of these terms. Perhaps in the future Ghostferry can measure them.
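
If those terms were ever measured, computing the optimal batch size from them would be straightforward. A minimal sketch, assuming hypothetical measured values for m, o, and R (the function name and everything about it are illustrative, not part of Ghostferry):

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// optimalBatchSize computes B = oR / (1 - mR), where m is the marginal
// write time per event (s/event), o is the per-write overhead (s), and
// r is the source event rate (events/s). It fails when mR >= 1, i.e.
// the target cannot keep up regardless of batch size.
func optimalBatchSize(m, o, r float64) (int, error) {
	if m*r >= 1 {
		return 0, errors.New("target too slow: mR >= 1, no batch size can keep up")
	}
	return int(math.Ceil(o * r / (1 - m*r))), nil
}

func main() {
	// Feasible example (made-up values): m = 1ms/event, o = 5ms/write,
	// R = 500 events/s => mR = 0.5, B = 2.5 / 0.5 = 5.
	if b, err := optimalBatchSize(0.001, 0.005, 500); err == nil {
		fmt.Println("optimal batch size:", b)
	}

	// Infeasible example from the text: m = 0.01 s/event, R = 200 events/s
	// => mR = 2, so no optimal batch size exists.
	if _, err := optimalBatchSize(0.01, 0.005, 200); err != nil {
		fmt.Println(err)
	}
}
```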

However, if MySQL is too slow, Ghostferry cannot keep up at all, and this also falls out of the equations: if mR >= 1, then B is either undefined (mR = 1) or negative. A negative batch size is non-physical and thus doesn't work. This is clearer with an example. Recall m is the time required to write one event; suppose m = 0.01 seconds / event, so the target can write at most 100 events / second. Recall R is the number of events / second generated on the source; suppose R = 200 events / second. This obviously cannot work, because the source generates events twice as fast as the target can write them. Indeed mR = 2, which means the optimal batch size doesn't exist.

Recommendations

owensmallwood commented 12 months ago

How many writes/second is considered "high load" on the source database for Ghostferry to handle?