RWSet performance fixes, LockFreeArray performance fixes and tests

mbutrovich commented 6 years ago

LockFreeArray:

Changed functions that unconditionally returned true to be void.
Rewrote FindValid so it no longer performs O(n) lookups. It is used in DataTable::GetTileGroup() and should be fast.
Added Doxygen comments.
Added more tests.

TimestampOrderingTransactionManager:

Keep track of the last accessed tile_group_id and, if the next access uses the same one, avoid going to the StorageManager’s CuckooHashMap for the TileGroup.
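To illustrate the last-accessed idea, here is a minimal sketch rather than the code in this PR. TileGroupDirectory and CachedTileGroupLookup are hypothetical names, and a std::unordered_map stands in for the StorageManager's CuckooHashMap. The point is only that a run of accesses to the same tile_group_id costs one map lookup instead of one per access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

struct TileGroup {
  std::uint32_t id;
};

// Hypothetical stand-in for the StorageManager's tile group map.
class TileGroupDirectory {
 public:
  void Add(std::uint32_t id) {
    groups_[id] = std::make_shared<TileGroup>(TileGroup{id});
  }

  // Every call here counts as one hash-map lookup.
  std::shared_ptr<TileGroup> Get(std::uint32_t id) {
    ++lookups_;
    auto it = groups_.find(id);
    return it == groups_.end() ? nullptr : it->second;
  }

  std::size_t lookups() const { return lookups_; }

 private:
  std::unordered_map<std::uint32_t, std::shared_ptr<TileGroup>> groups_;
  std::size_t lookups_ = 0;
};

// Remembers the most recent id and pointer, and only goes back to the
// directory when the requested id changes (or nothing is cached yet).
class CachedTileGroupLookup {
 public:
  explicit CachedTileGroupLookup(TileGroupDirectory &dir) : dir_(dir) {}

  std::shared_ptr<TileGroup> Get(std::uint32_t id) {
    if (last_ == nullptr || id != last_id_) {
      last_ = dir_.Get(id);
      last_id_ = id;
    }
    // Debug-only sanity check: a cached pointer must match the request.
    assert(last_ == nullptr || last_->id == id);
    return last_;
  }

 private:
  TileGroupDirectory &dir_;
  std::uint32_t last_id_ = 0;
  std::shared_ptr<TileGroup> last_;
};
```

With the access pattern 1, 1, 1, 2, 2, 1 the directory in this sketch is consulted three times instead of six. The benefit depends entirely on how often consecutive accesses hit the same tile group.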

TransactionContext:

Simplified conditionals to what should be equivalent logic. Reduced lookups in Release mode and increased lookups in Debug mode, by moving lookups that only served PELOTON_ASSERTs into individual lookups inside the ASSERTs.
Removed the iterator hints since Intel docs show they're not used: https://software.intel.com/en-us/node/506175
Removed iswritten and insertcount members and related logic since they were not used for anything.
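The Release/Debug trade-off in the TransactionContext change can be sketched as follows. This is an assumption-laden illustration, not the PR's code: RWSetSketch, RWType, and the lookup counter are hypothetical, and the standard assert macro stands in for PELOTON_ASSERT. The membership check exists only to validate the caller, so it lives inside the assert and compiles away when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

enum class RWType { READ, UPDATE };  // hypothetical subset of access types

class RWSetSketch {
 public:
  void RecordRead(std::uint64_t tuple_id) {
    ++lookups_;
    set_.emplace(tuple_id, RWType::READ);
  }

  void RecordUpdate(std::uint64_t tuple_id) {
    // Validation-only lookup: present in Debug builds, removed by the
    // preprocessor in Release builds (NDEBUG defined).
    assert(Contains(tuple_id) && "update requires a prior read");
    // The lookup that does the real work runs in every build.
    ++lookups_;
    set_[tuple_id] = RWType::UPDATE;
  }

  RWType TypeOf(std::uint64_t tuple_id) const { return set_.at(tuple_id); }
  std::size_t lookups() const { return lookups_; }

 private:
  // Counts as one lookup; only ever called from inside an assert.
  bool Contains(std::uint64_t tuple_id) {
    ++lookups_;
    return set_.find(tuple_id) != set_.end();
  }

  std::unordered_map<std::uint64_t, RWType> set_;
  std::size_t lookups_ = 0;
};
```

In this sketch a read followed by an update costs three lookups with asserts enabled and two without. The only side effect inside the assert is the instrumentation counter; real code should keep assert expressions free of side effects that the program depends on.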

@pervazea is going to help me out and grab callgrind numbers when he has time to demonstrate these improvements.

mbutrovich commented 6 years ago

@pervazea ran a microbenchmark of this branch against master that showed a drop in calls to StorageManager::GetTileGroup (and thereby CuckooMap::Find) from 1229 to 1083.

mbutrovich commented 6 years ago

@poojanilangekar Regarding comment 2, there are no tests for transaction_context and for TimestampOrderingTransactionManager, well... :(

Maybe I can adapt the tests I wrote for GC fixes to have EXPECTs that test TOTM. It would be helpful to have some sort of baselines if we're making changes to TOTM's guts.

For performance, our numbers still seem way too variable to take away anything meaningful, but I'll see if a bunch of runs will smooth that out.

coveralls commented 6 years ago

Coverage increased (+0.07%) to 77.029% when pulling 21253c468d6a5854e7a48e5e9b0d79b6662f05f4 on mbutrovich:friday_night into 308a6691b7a939e5391d5648e79d9c987ca848c2 on cmu-db:master.

mbutrovich commented 6 years ago

Like I mentioned, hard to take away too much from oltpbench runs right now due to variability. I ran master and the friday_night branch with the attached configs on my laptop:

TPC-C (scale factor 4, 4 terminals, 60 seconds, repeated 10 times):
master: mu: 334.63, sigma: 15.48
friday_night: mu: 347.39, sigma: 40.44

YCSB (scale factor 1000, 4 terminals, read only, 60 seconds, repeated 10 times):
master: mu: 16671.82, sigma: 67.89
friday_night: mu: 16669.69, sigma: 92.82

oltpbench_configs.zip
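One way to put a number on "hard to take away too much" is Welch's t statistic computed from the reported summaries. The sketch below assumes the sigma values are sample standard deviations over the 10 runs, which the comment does not state; RunSummary and WelchT are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct RunSummary {
  double mean;    // reported mu
  double stddev;  // reported sigma, assumed to be a sample standard deviation
  int runs;       // number of repetitions
};

// Welch's t statistic for the difference between two means with
// possibly unequal variances: (mean_b - mean_a) / sqrt(s_a^2/n_a + s_b^2/n_b).
inline double WelchT(const RunSummary &a, const RunSummary &b) {
  assert(a.runs > 1 && b.runs > 1);
  const double var_a = a.stddev * a.stddev / a.runs;
  const double var_b = b.stddev * b.stddev / b.runs;
  return (b.mean - a.mean) / std::sqrt(var_a + var_b);
}
```

Under that assumption, WelchT({334.63, 15.48, 10}, {347.39, 40.44, 10}) comes out at about 0.93 for TPC-C and WelchT({16671.82, 67.89, 10}, {16669.69, 92.82, 10}) at about -0.06 for YCSB. Both are well below the magnitude of roughly 2.1 to 2.3 that a two-sided 5% test needs with samples of this size, so neither difference is distinguishable from noise.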

mbutrovich commented 6 years ago

I also did some sampling with dtrace, configs from the previous comment:

TPC-C

Sampled calls to StorageManager::GetTileGroup (cuckoo hash lookups) from CommitTransaction:
master: 7317
friday_night: 6880

Sampled calls to tbb::internal_find:
master: 8032
friday_night: 7815

YCSB read-only

Sampled calls to StorageManager::GetTileGroup (cuckoo hash lookups) from CommitTransaction:
master: 2475
friday_night: 2266

Sampled calls to tbb::internal_find:
master: 6331
friday_night: 6070
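For scale, the sample counts above can be turned into relative reductions. PercentReduction is a hypothetical helper, and because these are dtrace samples rather than exact call counts, the percentages are only approximate.

```cpp
#include <cassert>

// Relative drop from `before` to `after`, as a percentage of `before`.
inline double PercentReduction(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}
```

Applied to the numbers in this comment, that gives roughly 6.0% fewer sampled GetTileGroup calls and 2.7% fewer tbb::internal_find calls for TPC-C, and roughly 8.4% and 4.1% for YCSB read-only.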

poojanilangekar commented 6 years ago

@mbutrovich Do you have an idea about why the performance of this branch has higher variance?

Yes, it would be great if you could add a couple of tests for the TimestampOrderingTransactionManager.

mbutrovich commented 6 years ago

@poojanilangekar Luck of the draw with Peloton and oltpbench, really. Also this is on my laptop where I don't have a ton of control over background tasks. I ran them again today:

TPC-C (scale factor 4, 4 terminals, 60 seconds, repeated 10 times):
master: mu: 332.59, sigma: 11.98
friday_night: mu: 333.06, sigma: 13.33

YCSB (scale factor 1000, 4 terminals, read only, 60 seconds, repeated 10 times):
master: mu: 16689.99, sigma: 127.87
friday_night: mu: 16706.10, sigma: 122.95

Again, still tough to take too much away from oltpbench right now. Regarding tests for TOTM, not sure this is the PR for it.

I'll put the iswritten flag back.

tomasic commented 6 years ago

might help to get the abort rate reported also ...

-- Anthony Tomasic Language Technologies Institute Carnegie Mellon University http://www.tiramisutransit.com http://mcds.cs.cmu.edu http://mcds.cs.cmu.edu http://www.cs.cmu.edu/~tomasic

poojanilangekar commented 6 years ago

LGTM. I think this should be merged in once the build passes.

mbutrovich commented 6 years ago

@tomasic I just ran dtrace to approximate abort rates (never tried Peloton's internal stats, and not sure how much they slow the system down). dtrace dropped throughput by ~10%, but:

TPC-C:
CommitTransaction() samples: 27261
AbortTransaction() samples: 1

YCSB:
CommitTransaction() samples: 17997
AbortTransaction() samples: 0

It doesn't seem like aborts are an issue, at least under these oltpbench configs on my laptop.

tomasic commented 6 years ago

Thanks. I just wondered because we are off by a factor of 100 for TPC-C and the function tracing isn’t revealing obvious holes, so now I suspect locks and latches. Anthony

cmu-db / peloton

RWSet performance fixes, LockFreeArray performance fixes and tests #1401