Main Idea
Test whether disk-oriented concurrency control algorithms would scale on extreme many-core chips that enable running hundreds of threads in parallel.
Main Finding
TL;DR: All current concurrency control schemes suffer from different issues under extreme parallelism.
OCC has a high abort cost because it can only abort after doing all the work.
The main bottlenecks are: (1) lock thrashing, i.e. waiting for locks, (2) preemptive aborts, (3) deadlocks, (4) timestamp allocation, (5) memory-to-memory copying.
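The timestamp allocation bottleneck (4) is easy to picture: the textbook allocator is a single shared counter that every transaction increments atomically, so hundreds of threads serialize on one memory location. A minimal sketch (not the paper's implementation; the class and sizes here are made up for illustration):

```python
import threading

# Hypothetical sketch of a centralized timestamp allocator: correct,
# but every thread contends on the same lock (and cache line), which
# becomes the serialization point at high core counts.
class TimestampAllocator:
    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def allocate(self):
        # All threads funnel through this one critical section.
        with self._lock:
            ts = self._next
            self._next += 1
            return ts

allocator = TimestampAllocator()
timestamps = []  # list.append is atomic under CPython's GIL

def worker(n):
    for _ in range(n):
        timestamps.append(allocator.allocate())

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Every timestamp is unique -- the correctness is trivial; the cost is
# that all 4 threads serialized on a single counter.
```

The paper's point is that this serialization, not the counter logic itself, is what caps throughput as core counts grow.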
Read-Only Workload
Timestamp allocation becomes the bottleneck for WAIT_DIE and MVCC.
OCC and TIMESTAMP suffer from needing to copy data into a private workspace.
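The private-workspace overhead can be sketched as follows (a hypothetical illustration, not the engine's actual code; the table and keys are invented): each tuple a transaction touches is copied into local memory first, which is pure memory-to-memory copying cost for a read-only transaction.

```python
import copy

# Toy "database": key -> tuple (dict of columns). Invented for illustration.
database = {1: {"name": "alice", "balance": 100},
            2: {"name": "bob", "balance": 50}}

def run_read_only_transaction(keys):
    # Copy phase: one full tuple copy per access, even though this
    # transaction only reads. This is the overhead the notes refer to.
    workspace = {k: copy.deepcopy(database[k]) for k in keys}
    # Compute phase operates solely on the private copies.
    return sum(row["balance"] for row in workspace.values())

total = run_read_only_transaction([1, 2])  # copies both tuples just to read
```

For writers the copies buy isolation; for pure readers they buy nothing, which is why this shows up in the read-only results.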
Write-Intensive Workload
Findings under medium and high contention:
2PL schemes die of lock thrashing in high-contention workloads.
DL_DETECT succumbs to waiting for the deadlock detection thread to come by and clean up.
NO_WAIT and WAIT_DIE have high abort rates but perform best under medium contention, because restarting transactions has low overhead in these experiments (stored procedures, and each transaction touches only one table).
Lock-free algorithms generally perform better in high-contention scenarios.
Although OCC performs worst with a single core, it performs best under high contention (when execution degenerates into serial order) because its validation phase always allows one transaction to pass.
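Both OCC properties above fall out of its structure: all the work happens before validation (so aborts are expensive), and validation is a short critical section that admits one transaction at a time (so someone always commits). A minimal single-key sketch, with invented names and a versioned tuple standing in for read-set validation:

```python
import threading

# Toy versioned store: key -> (value, version). Invented for illustration.
db = {"x": (0, 0)}
validation_lock = threading.Lock()

def occ_transaction(key, delta):
    # Read phase: no locks taken; remember the version we read.
    value, version_read = db[key]
    # All of the transaction's real work happens here, before any
    # conflict check -- which is why an abort wastes the full execution.
    result = value + delta
    # Validation + write phase: a short critical section. Whoever gets
    # here first wins, so at least one transaction always commits.
    with validation_lock:
        _, current_version = db[key]
        if current_version != version_read:
            return "abort"   # someone overwrote our read; work is wasted
        db[key] = (result, current_version + 1)
        return "commit"
```

Under near-serial execution the validation lock is uncontended and each transaction sails through, matching the high-contention result in the notes.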
System Used
A custom main-memory OLTP engine with a pluggable lock manager, running on Graphite, a CPU simulator (single-socket tile-based CPU with shared L2 cache communicating over a 2D mesh network). All transactions use stored procedures to eliminate client network latency.
Seven concurrency control schemes are analyzed (three 2PL-based, four timestamp-ordering-based):
2PL with Deadlock Detection (DL_DETECT)
2PL with Non-waiting Deadlock Prevention (NO_WAIT)
2PL with Waiting Deadlock Prevention (WAIT_DIE)
Basic T/O (TIMESTAMP)
Multi-version Concurrency Control (MVCC)
Optimistic Concurrency Control (OCC)
T/O with Partition-level Locking (H-STORE)
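The two 2PL deadlock-prevention variants differ only in how a lock conflict is resolved, which can be captured in a couple of lines (a hypothetical sketch of the policy rules, not the engine's lock manager):

```python
# On a lock conflict between a requesting transaction and the holder:
#   NO_WAIT:  the requester always aborts immediately. No transaction
#             ever waits, so no deadlock can form.
#   WAIT_DIE: an older requester (smaller timestamp) is allowed to
#             wait; a younger one "dies" (aborts). Waits-for edges can
#             only point from older to younger, so no cycle can form.

def no_wait(requester_ts, holder_ts):
    return "abort"

def wait_die(requester_ts, holder_ts):
    return "wait" if requester_ts < holder_ts else "abort"
```

Both rules trade extra aborts for never having to run deadlock detection, which is why their restart overhead dominates their performance in the results above.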
Workloads Evaluated
YCSB (Yahoo! Cloud Serving Benchmark) with variation in contention:
Uniform access (low contention)
Medium contention: ~40% of all transactions access 10% of the tuples
High contention: ~60% of all transactions access 10% of the tuples.
TPC-C using only Payment and NewOrder transactions.