JigaoLuo / duckdb

DuckDB is an in-process SQL OLAP Database Management System
http://www.duckdb.org
MIT License
2 stars 0 forks source link

Benchmark Issue: A Randomized OR Sorted Dataset for Lookup Function #41

Closed JigaoLuo closed 2 years ago

JigaoLuo commented 2 years ago

Baseline

This is the original Prof. Leis' ART implementation: https://github.com/cakebytheoceanLuo/duckdb/blob/cb5c74236b731ce73ff90af9b213d5cd04b40171/third_party/ART/unchanged/ART.cpp

The only changed taken is benchmarking the lookup function around the perfevent header.

Runtime of Dense Sorted Version

Dense Sorted Version:

1M

$ g++ -O3 -o ART ART.cpp -march=native
$ [zilliz@zqm10603ph05 unchanged]$ ./ART 1000000 0
insert,1000000,34.943215
lookup,1000000,67.938034
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 36.04,       135.23,      0.26,       0.00,             0.00,              0.00,          0.00,      14.72, 10000000, 3.75, 1.00, 2.45 
erase,1000000,39.652700
$ ./ART 1000000 0
insert,1000000,33.556043
lookup,1000000,68.217036
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 35.87,       135.05,      0.26,       0.00,             0.00,              0.00,          0.00,      14.66, 10000000, 3.76, 1.00, 2.45 
erase,1000000,39.373154
$ ./ART 1000000 0
insert,1000000,41.628743
lookup,1000000,91.032297
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 35.95,       134.98,      0.25,       0.00,             0.00,              0.00,          0.00,      10.99, 10000000, 3.75, 1.00, 3.27 
erase,1000000,48.367725
$ ./ART 1000000 0
insert,1000000,41.417862
lookup,1000000,91.094576
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 35.92,       134.93,      0.25,       0.00,             0.00,              0.00,          0.00,      10.98, 10000000, 3.76, 1.00, 3.27 
erase,1000000,48.894920
$ ./ART 1000000 0
insert,1000000,41.194535
lookup,1000000,90.985299
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 35.94,       134.94,      0.25,       0.00,             0.00,              0.00,          0.00,      10.99, 10000000, 3.76, 1.00, 3.27 
erase,1000000,48.520476

10M

$ ./ART 10000000 0
insert,10000000,45.641459
lookup,10000000,101.238085
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 32.39,       124.95,      0.25,       0.01,             0.00,              0.00,          0.00,       9.88, 10000000, 3.86, 1.00, 3.28 
erase,10000000,45.149999
$ ./ART 10000000 0
insert,10000000,49.900706
lookup,10000000,101.443518
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 32.28,       124.93,      0.25,       0.01,             0.00,              0.00,          0.00,       9.86, 10000000, 3.87, 1.00, 3.27 
erase,10000000,49.652893
$ ./ART 10000000 0
insert,10000000,49.645018
lookup,10000000,101.324173
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 32.31,       124.87,      0.25,       0.01,             0.00,              0.00,          0.00,       9.88, 10000000, 3.86, 1.00, 3.27 
erase,10000000,45.305428
$ ./ART 10000000 0
insert,10000000,49.613952
lookup,10000000,98.136478
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 32.48,       125.20,      0.26,       0.01,             0.00,              0.00,          0.00,      10.20, 10000000, 3.85, 1.00, 3.19 
erase,10000000,45.345298
$ ./ART 10000000 0
insert,10000000,49.255976
lookup,10000000,101.340575
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 32.37,       125.00,      0.25,       0.01,             0.00,              0.00,          0.00,       9.87, 10000000, 3.86, 1.00, 3.28 
erase,10000000,47.789731

Issue

I think the insertion with a dense sorted sequence as 1,2,3,..., N is not a problem. However, the lookup with the same sequence is cheering on caching:

JigaoLuo commented 2 years ago

Randomized Lookup Dataset

Add std::random_shuffle(keys,keys+n); after insertion but before lookup.

Runtime of Randomized Version

1M

$ ./ART 1000000 0
insert,1000000,34.385460
lookup,1000000,36.277478
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 78.35,       133.04,      3.28,       0.00,             0.61,              0.00,          0.00,      27.57, 10000000, 1.70, 1.00, 2.84 
erase,1000000,26.128014
$ ./ART 1000000 0
insert,1000000,40.746716
lookup,1000000,40.463219
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 79.78,       133.06,      3.27,       0.00,             0.61,              0.00,          0.00,      24.72, 10000000, 1.67, 1.00, 3.23 
erase,1000000,26.143812
$ ./ART 1000000 0
insert,1000000,41.699515
lookup,1000000,40.594295
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 80.67,       133.12,      3.27,       0.00,             0.61,              0.00,          0.00,      24.64, 10000000, 1.65, 1.00, 3.27 
erase,1000000,26.021354
$ ./ART 1000000 0
insert,1000000,40.159170
lookup,1000000,40.660329
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 80.56,       133.12,      3.27,       0.00,             0.61,              0.00,          0.00,      24.60, 10000000, 1.65, 1.00, 3.28 
erase,1000000,25.355023
$ ./ART 1000000 0
insert,1000000,41.596953
lookup,1000000,38.945339
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
 82.90,       132.95,      3.25,       0.00,             0.61,              0.00,          0.00,      25.68, 10000000, 1.60, 1.00, 3.23 
erase,1000000,26.452305

10M

$ ./ART 10000000 0
insert,10000000,45.747144
lookup,10000000,15.015195
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
218.29,       123.23,      4.69,       0.85,             1.28,              0.00,          0.00,      66.60, 10000000, 0.56, 1.00, 3.28 
erase,10000000,11.620084
$ ./ART 10000000 0
insert,10000000,48.795942
lookup,10000000,14.929829
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
219.55,       123.23,      4.69,       0.85,             1.28,              0.00,          0.00,      66.99, 10000000, 0.56, 1.00, 3.28 
erase,10000000,12.821383
$ ./ART 10000000 0
insert,10000000,49.360548
lookup,10000000,15.160818
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
216.15,       123.26,      4.70,       0.85,             1.28,              0.00,          0.00,      65.96, 10000000, 0.57, 1.00, 3.28 
erase,10000000,12.787384
$ ./ART 10000000 0
insert,10000000,49.634678
lookup,10000000,14.925770
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock,    scale,  IPC, CPUs,  GHz 
218.62,       123.23,      4.67,       0.85,             1.28,              0.00,          0.00,      67.00, 10000000, 0.56, 1.00, 3.26 
erase,10000000,12.768698
JigaoLuo commented 2 years ago

Conclusion