Closed saguziel closed 6 years ago
@plypaul @aoen
Can you post some benchmarks from before and after?
Also, the build seems to be failing, possibly due to missing comments on classes / methods.
The build errors are checkstyle failures, which are WIP. I'll do some higher-quality benchmarking, but so far this doesn't seem to change the time per query; it reduces the number of queries x-fold.
If the approach looks generally okay, I will start adding docs and fixing checkstyles
Comments will expedite and aid the review process as we can figure out what classes / methods are supposed to do and also better check assumptions.
The overall approach looks good, but I may have missed the clear benefit of using futures here.
Cool, was mainly looking for a review on overall approach.
I used CompletableFutures because I think the abstraction is cleaner: the deferredCreates return a future of their result, rather than a builder whose results depend on the order in which things were added.
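To make the futures-versus-builder point concrete, here is a minimal sketch. It is not the code in this PR: `DeferredCreateBatch`, `flush`, and the in-memory ID counter are hypothetical stand-ins, and a real version would issue one batched database write where the comment says so. The idea is that each caller holds a future for its own result, so nothing depends on the position of an entry in the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: queues creates and resolves their futures on one flush. */
public class DeferredCreateBatch {
  /** A queued row plus the future that will receive its generated ID. */
  private static final class Pending {
    final String row;
    final CompletableFuture<Long> id = new CompletableFuture<>();

    Pending(String row) {
      this.row = row;
    }
  }

  private final List<Pending> pending = new ArrayList<>();
  private long nextId = 1; // stand-in for IDs a database would generate
  private int flushCount = 0; // number of (simulated) batched writes

  /** Queues a row and immediately returns a future of its eventual ID. */
  public CompletableFuture<Long> deferredCreate(String row) {
    Pending p = new Pending(row);
    pending.add(p);
    return p.id;
  }

  /** Simulates one batched write, then completes each caller's future. */
  public void flush() {
    if (pending.isEmpty()) {
      return;
    }
    // Copy first so callbacks that queue new creates do not disturb this loop.
    List<Pending> batch = new ArrayList<>(pending);
    pending.clear();
    flushCount++; // a real implementation would execute one batch statement here
    for (Pending p : batch) {
      p.id.complete(nextId++);
    }
  }

  public int getFlushCount() {
    return flushCount;
  }
}
```

A caller can chain follow-up work with `thenApply` or `thenAccept` on the returned future, and that work runs once the batch is flushed, regardless of how many other creates were queued before or after it.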
The benchmarked numbers refer to non-filtered, non-noop entries (i.e., entries that create a replication job). The case for noop or filtered entries probably isn't changed much.
Before: 30-40 jobs per second
After: 600-1200 jobs per second
Benchmark setup:
Create 2400 identical audit log entries (type THRIFT_ALTER_TABLE, creates a COPY_UNPARTITIONED_TABLE operation which ends up being NOT_COMPLETABLE after execution), with corresponding INPUT and OUTPUT objects. Run Reair until it says "Sleeping for 10000ms because no more entries". Clear replication_jobs table and audit log counter, repeat.
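As a rough sanity check on the figures above, the sketch below works out what they imply for one 2400-entry run, assuming throughput stays constant over the run (an assumption, not something measured here). `BenchmarkEstimate` and `secondsFor` are illustrative names only.

```java
/** Back-of-the-envelope run times implied by the reported throughput. */
public class BenchmarkEstimate {
  /** Seconds to process the given number of entries at a constant rate. */
  public static double secondsFor(int entries, double jobsPerSecond) {
    return entries / jobsPerSecond;
  }

  public static void main(String[] args) {
    int entries = 2400; // entries per run, from the benchmark setup
    System.out.printf("before: %.0f-%.0f s%n",
        secondsFor(entries, 40), secondsFor(entries, 30));
    System.out.printf("after: %.0f-%.0f s%n",
        secondsFor(entries, 1200), secondsFor(entries, 600));
    // Most conservative and most optimistic ratios of the reported ranges.
    System.out.printf("speedup: %.0fx-%.0fx%n", 600.0 / 40, 1200.0 / 30);
  }
}
```

Under that assumption a run takes roughly 60-80 seconds before the change and 2-4 seconds after, which is a speedup somewhere between 15x and 40x.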
For context, we could process all of last month's events (all converted to non-noop operations) in a few hours
@plypaul ptal
LGTM aside from last comment.
This publishes create statements in the main loop as batch statements. Tests and checkstyle are WIP, but all the existing tests pass locally.
It does indeed benchmark well.