Closed azylman closed 10 years ago
Interesting, I don't think that's something we'd considered. One thing to note is that channels guarantee that items come off the channel in the same order they were put on, so if addToSet("a") happened first in the oplog, it will come off the channel first. HOWEVER, that doesn't guarantee that it will make it to the database first - it just makes it likely.
So we probably want to revert that part of this PR and just stick to the streaming.
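To make the ordering point concrete, here is a minimal sketch, not the code in this PR. `Op`, `Set`, `ApplySerial`, `ApplyConcurrent` and `feed` are hypothetical names, and an in-memory set stands in for the database. A single consumer applies ops in channel order; one goroutine per op receives them in order but may apply them in either order.

```go
package main

import (
	"fmt"
	"sync"
)

// Op is a hypothetical, simplified oplog entry.
type Op struct {
	Kind  string // "addToSet" or "pull"
	Value string
}

// Set stands in for the database. The mutex makes concurrent writes safe,
// but it does not make their order deterministic.
type Set struct {
	mu    sync.Mutex
	items map[string]bool
}

func NewSet() *Set { return &Set{items: map[string]bool{}} }

// Apply executes a single op against the set.
func (s *Set) Apply(op Op) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch op.Kind {
	case "addToSet":
		s.items[op.Value] = true
	case "pull":
		delete(s.items, op.Value)
	}
}

// Has reports whether v is currently in the set.
func (s *Set) Has(v string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.items[v]
}

// ApplySerial drains the channel with a single consumer, so ops are applied
// in exactly the order they were put on the channel.
func ApplySerial(ops <-chan Op, s *Set) {
	for op := range ops {
		s.Apply(op)
	}
}

// ApplyConcurrent receives ops in channel order but hands each one to its own
// goroutine, so the order in which they reach the set is up to the scheduler.
func ApplyConcurrent(ops <-chan Op, s *Set) {
	var wg sync.WaitGroup
	for op := range ops {
		wg.Add(1)
		go func(op Op) {
			defer wg.Done()
			s.Apply(op)
		}(op)
	}
	wg.Wait()
}

// feed puts ops on a buffered channel in order and closes it.
func feed(ops ...Op) <-chan Op {
	ch := make(chan Op, len(ops))
	for _, op := range ops {
		ch <- op
	}
	close(ch)
	return ch
}

func main() {
	ops := []Op{
		{Kind: "addToSet", Value: "a"},
		{Kind: "pull", Value: "a"},
	}

	serial := NewSet()
	ApplySerial(feed(ops...), serial)
	fmt.Println("serial, set contains a:", serial.Has("a"))

	concurrent := NewSet()
	ApplyConcurrent(feed(ops...), concurrent)
	// May print true or false depending on goroutine scheduling.
	fmt.Println("concurrent, set contains a:", concurrent.Has("a"))
}
```

In the serial case the set always ends up without "a". In the concurrent case pull("a") can run before addToSet("a"), which leaves "a" in the set.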
Summarizing a conversation that happened in hipchat.
I think whether to use concurrency here is a question of the goals of oplog-replay: whether we want a more accurate load simulation at the expense of the accuracy of the data being entered. Concurrency keeps the replayed load closer to the original and keeps the replay from falling behind. If we later want to use this for replaying production data accurately, there are other ways we can do that (for example, sending the operations as a list of operations that are guaranteed to execute in order).
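A rough sketch of the "send the operations as a list" idea, assuming a simplified op type and an in-memory set in place of the database. `Op`, `Batch` and `ApplyBatch` are hypothetical names, not part of oplog-replay. Ops are grouped into ordered slices, and each slice is applied in order by a single caller.

```go
package main

import "fmt"

// Op is a hypothetical, simplified oplog entry.
type Op struct {
	Kind  string // "addToSet" or "pull"
	Value string
}

// Batch splits ops into consecutive slices of at most size elements,
// preserving the original order within and across batches.
func Batch(ops []Op, size int) [][]Op {
	var batches [][]Op
	for size > 0 && len(ops) > 0 {
		n := size
		if len(ops) < n {
			n = len(ops)
		}
		batches = append(batches, ops[:n])
		ops = ops[n:]
	}
	return batches
}

// ApplyBatch applies every op in the batch, in order, to an in-memory set
// that stands in for the database.
func ApplyBatch(set map[string]bool, batch []Op) {
	for _, op := range batch {
		switch op.Kind {
		case "addToSet":
			set[op.Value] = true
		case "pull":
			delete(set, op.Value)
		}
	}
}

func main() {
	ops := []Op{
		{Kind: "addToSet", Value: "a"},
		{Kind: "pull", Value: "a"},
		{Kind: "addToSet", Value: "b"},
	}
	set := map[string]bool{}
	// Batches are applied one after another, so oplog order is preserved.
	for _, b := range Batch(ops, 2) {
		ApplyBatch(set, b)
	}
	fmt.Println(set["a"], set["b"])
}
```

Ordering holds here only because the batches are applied one after another. Sending batches concurrently would bring back the same reordering problem between batches.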
Let's close out this conversation so we can get this merged in - I think there are good arguments here to make it concurrent, and also good arguments here to make it not concurrent.
Since there hasn't been any activity on this, my proposal is to leave it concurrent for now - it's already written that way, and that's more in line with the initial goals of this project.
If we later need to revisit that assumption for any reason, we can do so then.
I dunno...the current requirement is replaying production data accurately, not load testing, so I'd err on the side of accuracy.
fb1ad3b7a9ee35a5144fa94f6c016992609674cc
lgtm.
Doesn't order matter? E.g., if op 1 is addToSet("a") and op 2 is pull("a"), then I wouldn't want to apply them concurrently.