I've seen a few cases of missed rows that I believe can be tied to resuming, but please correct me if I'm wrong:
With state tracking, the binlog processing algorithm looks roughly like this:
1. Streamer is a MySQL replication client that retrieves events.
2. By default, events are passed to defaultEventHandler.
3. defaultEventHandler calls handleRowsEvent.
4. handleRowsEvent filters the events and creates batches.
5. handleRowsEvent hands the batches off to event listeners (more on this below).
6. The binlog position is recorded in state.
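To make the ordering concrete, here is a minimal Go sketch of those steps. It is not Ghostferry's code: RowsEvent, EventListener, StateTracker and stream are hypothetical simplifications, and the real handleRowsEvent has a different signature. What it shows is that the position is recorded (step 6) as soon as every listener has returned from step 5.

```go
package main

import "fmt"

// RowsEvent is a hypothetical, simplified row event: the table it touches
// and the binlog position immediately after it.
type RowsEvent struct {
	Table string
	Pos   uint64
}

// EventListener is a hypothetical stand-in for the callbacks that
// handleRowsEvent hands batches to (step 5).
type EventListener func(batch []RowsEvent) error

// StateTracker (hypothetical) holds the binlog position recorded in step 6.
type StateTracker struct {
	LastPos uint64
}

// handleRowsEvent sketches steps 4-6: filter, batch, hand off, record.
func handleRowsEvent(ev RowsEvent, tables map[string]bool, listeners []EventListener, state *StateTracker) error {
	var batch []RowsEvent
	if tables[ev.Table] { // step 4: keep only tables being copied
		batch = append(batch, ev)
	}
	if len(batch) > 0 {
		for _, l := range listeners { // step 5: hand off to listeners
			if err := l(batch); err != nil {
				return err
			}
		}
	}
	state.LastPos = ev.Pos // step 6: record the position once listeners return
	return nil
}

// stream plays the role of steps 1-3: each event received from the
// replication client goes through the default handler, in order.
func stream(events []RowsEvent, tables map[string]bool, listeners []EventListener, state *StateTracker) error {
	for _, ev := range events {
		if err := handleRowsEvent(ev, tables, listeners, state); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var applied []RowsEvent
	listener := func(batch []RowsEvent) error {
		applied = append(applied, batch...)
		return nil
	}
	state := &StateTracker{}
	events := []RowsEvent{{"users", 100}, {"logs", 200}, {"users", 300}}
	if err := stream(events, map[string]bool{"users": true}, []EventListener{listener}, state); err != nil {
		panic(err)
	}
	fmt.Println(len(applied), state.LastPos) // 2 300
}
```

With a synchronous listener like this one, the recorded position never gets ahead of what has been applied.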
So the default event listener is BinlogWriter.BufferBinlogEvents, which pushes events onto a channel, while a separate thread pulls events off the channel and processes them.
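Here's the problem: for an applicable event, step 5 only blocks until the event has been pushed onto the channel. The actual processing happens on a separate thread that does not block the streamer (unless it falls so far behind that the channel fills up), so step 6 can record a binlog position for events that have not been applied yet. In practice, when write load is high and a ferry run is interrupted, it is fairly common for the events still sitting in the channel to be lost: the resumed run starts from the recorded position, which is already past them.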
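The failure mode can be reproduced with a small, single-threaded simulation. This is hypothetical code, not Ghostferry's: the streamer records position 300 after buffering three events, the consumer applies only one of them before the interruption, and the resumed run starts after position 300, so events 2 and 3 are never applied.

```go
package main

import "fmt"

// RowsEvent is a hypothetical event with an ID and the binlog position after it.
type RowsEvent struct {
	ID  int
	Pos uint64
}

// runUntilInterrupt is a hypothetical simulation. The streamer buffers every
// event into ch and records its position (steps 5 and 6); the consumer has
// only applied `consumed` events when the run is interrupted, so whatever
// is still in ch is discarded with the process.
func runUntilInterrupt(binlog []RowsEvent, consumed int, applied map[int]bool) (recordedPos uint64) {
	ch := make(chan RowsEvent, len(binlog))
	for _, ev := range binlog {
		ch <- ev             // step 5: the listener only enqueues
		recordedPos = ev.Pos // step 6: position recorded right away
	}
	for i := 0; i < consumed; i++ {
		ev := <-ch
		applied[ev.ID] = true
	}
	return recordedPos
}

// resume replays only events after the recorded position, as a resumed run would.
func resume(binlog []RowsEvent, from uint64, applied map[int]bool) {
	for _, ev := range binlog {
		if ev.Pos > from {
			applied[ev.ID] = true
		}
	}
}

// missing lists the IDs of events that were never applied.
func missing(binlog []RowsEvent, applied map[int]bool) []int {
	var out []int
	for _, ev := range binlog {
		if !applied[ev.ID] {
			out = append(out, ev.ID)
		}
	}
	return out
}

func main() {
	binlog := []RowsEvent{{1, 100}, {2, 200}, {3, 300}, {4, 400}}
	applied := map[int]bool{}
	// Interrupted after 3 events were streamed but only 1 was applied.
	pos := runUntilInterrupt(binlog[:3], 1, applied)
	resume(binlog, pos, applied)
	fmt.Println(pos, missing(binlog, applied)) // 300 [2 3]
}
```

If the consumer had drained the channel before the interruption (consumed = 3), nothing would be missing, which matches the observation that losses show up mainly under high write load.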