JigaoLuo closed 2 years ago
std::allocator
~/jigao/duckdb3/third_party/ART$ g++ -O3 -o ART ART.cpp
~/jigao/duckdb3/third_party/ART$ ./ART 10000000 0
Node4 Size: 56
Node16 Size: 160
Node48 Size: 656
Node256 Size: 2064
insert,10000000,46.728975
cycles, instructions, L1-misses, LLC-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
68.68, 200.81, 0.32, 0.15, 0.03, 19.12, 10000000, 2.92, 1.00, 3.59
lookup,10000000,94.083193
MallocAllocator
~/jigao/duckdb3/third_party/ART$ g++ -O3 -o ART ART.cpp
~/jigao/duckdb3/third_party/ART$ ./ART 10000000 0
Node4 Size: 56
Node16 Size: 160
Node48 Size: 656
Node256 Size: 2064
insert,10000000,45.377185
cycles, instructions, L1-misses, LLC-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
70.60, 201.07, 0.32, 0.15, 0.03, 19.66, 10000000, 2.85, 1.00, 3.59
lookup,10000000,94.054711
PoolAllocator
~/jigao/duckdb3/third_party/ART$ g++ -O3 -o ART ART.cpp
~/jigao/duckdb3/third_party/ART$ ./ART 10000000 0
Node4 Size: 56
Node16 Size: 160
Node48 Size: 656
Node256 Size: 2064
insert,10000000,46.660719
cycles, instructions, L1-misses, LLC-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
69.38, 196.73, 0.32, 0.15, 0.02, 19.32, 10000000, 2.84, 1.00, 3.59
lookup,10000000,87.280597
MemoryPool
~/jigao/duckdb3/third_party/ART$ g++ -O3 -o ART ART.cpp
~/jigao/duckdb3/third_party/ART$ ./ART 10000000 0
Node4 Size: 56
Node16 Size: 160
Node48 Size: 656
Node256 Size: 2064
insert,10000000,45.493587
cycles, instructions, L1-misses, LLC-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
71.29, 195.52, 0.37, 0.15, 0.03, 19.85, 10000000, 2.74, 1.00, 3.59
lookup,10000000,91.222563
Valuable Readings:
The success or failure of huge page allocation depends on the amount of physically contiguous memory that is present in system at the time of the allocation attempt. If the kernel is unable to allocate huge pages from some nodes in a NUMA system, it will attempt to make up the difference by allocating extra pages on other nodes with sufficient available contiguous memory, if any.
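Since the allocation can fail when too few huge pages are free, it can help to check the counters before starting a benchmark run. The following is a minimal sketch, not part of ART.cpp: the helper name `meminfo_value` is hypothetical, and it only assumes the `Key: value` line layout that `/proc/meminfo` prints.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the numeric value stored under `key`
// (e.g. "HugePages_Free") in text formatted like /proc/meminfo,
// or -1 if the key is missing or has no numeric value.
long long meminfo_value(std::istream& in, const std::string& key) {
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;          // not a "Key: value" line
        if (line.compare(0, colon, key) != 0) continue;    // a different key
        std::istringstream rest(line.substr(colon + 1));
        long long value = 0;
        if (rest >> value) return value;                   // a trailing unit such as "kB" is ignored
        return -1;
    }
    return -1;
}
```

To use it, open `/proc/meminfo` with `std::ifstream` and pass the stream in. A result of 0 or -1 for `HugePages_Free` suggests that a huge-page mapping is likely to fail, so the allocator should fall back to regular pages.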
Other helpful posts:
Example Programs:
mmap_allocator
mmap_allocator<uint8_t, page_type::huge_2mb, 0> allocator;
with dummy bookkeeping: just allocate a new page if the previously allocated page is used up or not large enough for the new node.
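To make the "dummy bookkeeping" idea concrete, here is a minimal sketch of a bump arena over 2 MiB mappings. It is an illustration, not the `mmap_allocator` used above: the class `BumpPageArena` and all of its members are hypothetical names. It assumes Linux and falls back to regular pages when no huge pages can be reserved.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical bump arena: hands out memory from the current chunk and maps a
// fresh chunk when the request does not fit. Nothing is freed individually.
class BumpPageArena {
public:
    static constexpr std::size_t kChunkSize = 2u << 20;  // 2 MiB, one huge page

    BumpPageArena() = default;
    BumpPageArena(const BumpPageArena&) = delete;
    BumpPageArena& operator=(const BumpPageArena&) = delete;
    ~BumpPageArena() {
        for (const Chunk& c : chunks_) munmap(c.base, c.size);
    }

    // Returns `bytes` of storage aligned to `align` (a power of two, at most 4096).
    void* allocate(std::size_t bytes, std::size_t align = 8) {
        assert(align != 0 && (align & (align - 1)) == 0 && align <= 4096);
        if (bytes == 0) bytes = 1;
        std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || start + bytes > chunks_.back().size) {
            // The current chunk is used up or too small: abandon its tail and
            // map a new chunk, rounded up to a multiple of the chunk size.
            map_chunk((bytes + kChunkSize - 1) / kChunkSize * kChunkSize);
            start = 0;
        }
        offset_ = start + bytes;
        return static_cast<std::uint8_t*>(chunks_.back().base) + start;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    struct Chunk { void* base; std::size_t size; };

    void map_chunk(std::size_t size) {
        void* p = MAP_FAILED;
#ifdef MAP_HUGETLB
        // Try explicit huge pages first; this fails if none can be reserved.
        p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
        if (p == MAP_FAILED) {
            // Fallback: regular pages, so the sketch still works without huge pages.
            p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        }
        if (p == MAP_FAILED) throw std::bad_alloc();
        chunks_.push_back(Chunk{p, size});
    }

    std::vector<Chunk> chunks_;
    std::size_t offset_ = 0;  // bump pointer within the newest chunk
};
```

A node would then be constructed with placement new into the pointer returned by `allocate(sizeof(NodeType), alignof(NodeType))`. The trade-off of this scheme is that the unused tail of each abandoned chunk is wasted and individual nodes are never returned to the arena.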
$ echo 100 | sudo tee /proc/sys/vm/nr_hugepages
100
$ grep Huge /proc/meminfo
AnonHugePages: 2048 kB
ShmemHugePages: 0 kB
HugePages_Total: 100000
HugePages_Free: 100000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
$ g++ ART.cpp -o ARTnew -O3 -lnuma
$ ./ARTnew 10000000 0
Node4 Size: 56
Node16 Size: 160
Node48 Size: 656
Node256 Size: 2064
insert,10000000,41.141602
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
64.51, 185.81, 0.48, 0.01, 0.00, 0.00, 0.03, 23.76, 10000000, 2.88, 1.00, 2.71
lookup,10000000,102.682025
cycles, instructions, L1-misses, LLC-misses, dTLB-load-misses, dTLB-store-misses, branch-misses, task-clock, scale, IPC, CPUs, GHz
31.82, 124.97, 0.26, 0.01, 0.00, 0.00, 0.00, 9.73, 10000000, 3.93, 1.00, 3.27
Insertion performance improves, but the overall speed-up is less than 10%.
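One way to read the "less than 10%" figure is to compare the cycles per insert across runs. This sketch rests on two assumptions: that the `cycles` and `instructions` columns are per-operation averages, and that cycles are the relevant metric. The struct and function names are hypothetical. With the numbers above, the huge-page run (64.51 cycles) needs roughly 6% to 9.5% fewer cycles than the four earlier runs (68.68 to 71.29 cycles), and the reported IPC values match instructions divided by cycles.

```cpp
#include <cassert>

// Hypothetical container for two columns of one counter line in the logs above.
struct PerfSample {
    double cycles;        // "cycles" column
    double instructions;  // "instructions" column
};

// Instructions per cycle, e.g. 200.81 / 68.68 for the std::allocator insert run.
double ipc(const PerfSample& s) {
    assert(s.cycles > 0.0);
    return s.instructions / s.cycles;
}

// Relative reduction in cycles of `candidate` versus `baseline`, in percent.
double cycle_reduction_percent(const PerfSample& baseline, const PerfSample& candidate) {
    assert(baseline.cycles > 0.0);
    return 100.0 * (baseline.cycles - candidate.cycles) / baseline.cycles;
}
```

For example, `cycle_reduction_percent(PerfSample{68.68, 200.81}, PerfSample{64.51, 185.81})` compares the std::allocator run with the huge-page run. Note that the two runs report different clock rates (3.59 vs. 2.71 GHz), so cycle counts and wall-clock time need not move together.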
In this benchmark, I have tested the ART with 10M insertions.
Original Prof. Leis's ART