Closed JigaoLuo closed 2 years ago
Add `std::random_shuffle(keys, keys + n);` after insertion but before lookup.
$ ./ART 1000000 0
insert,1000000,34.385460
lookup,1000000,36.277478
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
78.35, 133.04, 3.28, 0.00, 0.61, 0.00, 0.00, 27.57, 10000000, 1.70, 1.00, 2.84
erase,1000000,26.128014
$ ./ART 1000000 0
insert,1000000,40.746716
lookup,1000000,40.463219
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
79.78, 133.06, 3.27, 0.00, 0.61, 0.00, 0.00, 24.72, 10000000, 1.67, 1.00, 3.23
erase,1000000,26.143812
$ ./ART 1000000 0
insert,1000000,41.699515
lookup,1000000,40.594295
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
80.67, 133.12, 3.27, 0.00, 0.61, 0.00, 0.00, 24.64, 10000000, 1.65, 1.00, 3.27
erase,1000000,26.021354
$ ./ART 1000000 0
insert,1000000,40.159170
lookup,1000000,40.660329
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
80.56, 133.12, 3.27, 0.00, 0.61, 0.00, 0.00, 24.60, 10000000, 1.65, 1.00, 3.28
erase,1000000,25.355023
$ ./ART 1000000 0
insert,1000000,41.596953
lookup,1000000,38.945339
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
82.90, 132.95, 3.25, 0.00, 0.61, 0.00, 0.00, 25.68, 10000000, 1.60, 1.00, 3.23
erase,1000000,26.452305
$ ./ART 10000000 0
insert,10000000,45.747144
lookup,10000000,15.015195
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
218.29, 123.23, 4.69, 0.85, 1.28, 0.00, 0.00, 66.60, 10000000, 0.56, 1.00, 3.28
erase,10000000,11.620084
$ ./ART 10000000 0
insert,10000000,48.795942
lookup,10000000,14.929829
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
219.55, 123.23, 4.69, 0.85, 1.28, 0.00, 0.00, 66.99, 10000000, 0.56, 1.00, 3.28
erase,10000000,12.821383
$ ./ART 10000000 0
insert,10000000,49.360548
lookup,10000000,15.160818
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
216.15, 123.26, 4.70, 0.85, 1.28, 0.00, 0.00, 65.96, 10000000, 0.57, 1.00, 3.28
erase,10000000,12.787384
$ ./ART 10000000 0
insert,10000000,49.634678
lookup,10000000,14.925770
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
218.62, 123.23, 4.67, 0.85, 1.28, 0.00, 0.00, 67.00, 10000000, 0.56, 1.00, 3.26
erase,10000000,12.768698
Baseline
This is Prof. Leis' original ART implementation: https://github.com/cakebytheoceanLuo/duckdb/blob/cb5c74236b731ce73ff90af9b213d5cd04b40171/third_party/ART/unchanged/ART.cpp
The only change made is benchmarking the `lookup` function with the `perfevent` header.

Runtime of Dense Sorted Version
Dense Sorted Version: `1,2,3,..., N` is inserted into the ART, then `1,2,3,..., N` is looked up.
Measured with N = 1M and N = 10M.
Issue
I think the insertion with a dense sorted sequence such as `1,2,3,..., N` is not a problem. However, the lookup with the same sequence is cheating on caching: `1,2,3,4` are four `uint64_t` values differing only in the last byte. After the lookup with `1`, the lookups with `2,3,4` could benefit from cache hits on the data fetched during the function call with `1`.