kvigen closed this 10 years ago
Assigned to @jefff
Is there some benefit to batching this way vs accumulating a set # of operations? This seems more complex.
I did it because I thought it would simulate our test workloads a bit better. An admittedly contrived example: the oplog replay sees 20 requests / second and the batch size is 20. Accumulating all of them at once means you replay all 20 operations once a second rather than 1 operation every 50ms. In other words, you only batch when you're bottlenecked on the round trip time.
Not sure if that's an important goal.
That makes a lot of sense. Can we clarify that it works that way in the comments? Specifically, this was the part I didn't realize: "you only batch when you're bottlenecked on the round trip time."
(now I'll step out and leave this to @jefff to review)
Yes, good call. I'll add some more detail to the comments.
lgtm.
Before this change, replaying the oplog maxed out at ~150 requests per second. We think the bottleneck was the round trip time between the database and the oplog-replay script. To address that, this change adds support for batching oplog requests.
Conceptually we still have one goroutine responsible for putting the ops in the "channel" to be processed, but now the goroutine that actually calls mongo applies everything currently in the queue at once.
It should be noted that oplog requests will only be batched if the replay is bottlenecked on round trip time to the db. Otherwise each request is sent as soon as it's due.
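For illustration, here's a minimal sketch of that pattern (not the actual replay code; `Op`, `applyBatch`, and the buffer size are hypothetical stand-ins): the apply goroutine blocks for the next op, then non-blockingly drains whatever else has already queued up, so batches only form when ops arrive faster than they can be applied.

```go
package main

import (
	"fmt"
	"time"
)

// Op is a hypothetical stand-in for a single oplog operation.
type Op struct{ ID int }

// applyBatch is a hypothetical stand-in for the call that sends a batch of
// ops to mongo in a single round trip.
func applyBatch(batch []Op) {
	fmt.Printf("applying %d op(s)\n", len(batch))
	time.Sleep(10 * time.Millisecond) // simulate the round trip
}

// applyLoop blocks for the next op, then non-blockingly drains anything else
// already queued. A batch larger than one only forms when ops arrive faster
// than they can be applied, i.e. when we're bottlenecked on round trip time.
func applyLoop(ops <-chan Op, done chan<- struct{}) {
	defer close(done)
	for op := range ops {
		batch := []Op{op}
	drain:
		for {
			select {
			case next, ok := <-ops:
				if !ok {
					break drain // channel closed and empty
				}
				batch = append(batch, next)
			default:
				break drain // nothing else queued right now
			}
		}
		applyBatch(batch)
	}
}

func main() {
	ops := make(chan Op, 100)
	done := make(chan struct{})
	go applyLoop(ops, done)

	// Queue ops faster than they can be applied so batching kicks in.
	for i := 0; i < 50; i++ {
		ops <- Op{ID: i}
	}
	close(ops)
	<-done
}
```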