Closed stephenswat closed 1 month ago
Performance drops a fair bit for low allocation sizes and increases for larger ones:
Before:
BenchmarkBinaryPage/1 12.6 ns 12.5 ns 54232577
BenchmarkBinaryPage/2 12.4 ns 12.4 ns 56375264
BenchmarkBinaryPage/4 12.8 ns 12.8 ns 54276274
BenchmarkBinaryPage/8 13.2 ns 13.2 ns 53022963
BenchmarkBinaryPage/16 13.1 ns 13.1 ns 52296277
BenchmarkBinaryPage/32 12.8 ns 12.8 ns 54326859
BenchmarkBinaryPage/64 13.5 ns 13.5 ns 51564964
BenchmarkBinaryPage/128 13.7 ns 13.7 ns 50968182
BenchmarkBinaryPage/256 14.1 ns 14.1 ns 49778321
BenchmarkBinaryPage/512 14.5 ns 14.5 ns 48230900
BenchmarkBinaryPage/1024 14.9 ns 14.9 ns 46971943
BenchmarkBinaryPage/2048 14.9 ns 14.9 ns 46973955
BenchmarkBinaryPage/4096 15.0 ns 15.0 ns 46759439
BenchmarkBinaryPage/8192 15.0 ns 15.0 ns 46865474
BenchmarkBinaryPage/16384 14.9 ns 14.9 ns 47017818
BenchmarkBinaryPage/32768 14.9 ns 14.9 ns 47075989
BenchmarkBinaryPage/65536 13.4 ns 13.4 ns 46950947
BenchmarkBinaryPage/131072 12.8 ns 12.8 ns 54544558
BenchmarkBinaryPage/262144 12.8 ns 12.8 ns 54600546
BenchmarkBinaryPage/524288 12.8 ns 12.8 ns 54622645
BenchmarkBinaryPage/1048576 13.2 ns 13.2 ns 53119672
BenchmarkBinaryPage/2097152 13.6 ns 13.6 ns 51571160
BenchmarkBinaryPage/4194304 13.6 ns 13.6 ns 51437233
BenchmarkBinaryPage/8388608 14.0 ns 14.0 ns 50019719
BenchmarkBinaryPage/16777216 13.9 ns 13.9 ns 50611850
BenchmarkBinaryPage/33554432 13.6 ns 13.6 ns 52128892
BenchmarkBinaryPage/67108864 11.4 ns 11.4 ns 50334402
BenchmarkBinaryPage/134217728 11.1 ns 11.1 ns 63165162
BenchmarkBinaryPage/268435456 11.5 ns 11.5 ns 60714333
BenchmarkBinaryPage/536870912 11.5 ns 11.5 ns 60770764
BenchmarkBinaryPage/1073741824 11.9 ns 11.9 ns 58579526
BenchmarkBinaryPage/2147483648 12.2 ns 12.2 ns 55850644
BenchmarkBinaryPage/4294967296 12.4 ns 12.4 ns 54779783
After:
-------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
BenchmarkBinaryPage/1 167 ns 167 ns 4104743
BenchmarkBinaryPage/2 169 ns 169 ns 4009150
BenchmarkBinaryPage/4 170 ns 170 ns 3971936
BenchmarkBinaryPage/8 169 ns 169 ns 4094620
BenchmarkBinaryPage/16 171 ns 171 ns 3965085
BenchmarkBinaryPage/32 170 ns 170 ns 4066688
BenchmarkBinaryPage/64 170 ns 170 ns 3978674
BenchmarkBinaryPage/128 176 ns 176 ns 4061772
BenchmarkBinaryPage/256 178 ns 178 ns 4004071
BenchmarkBinaryPage/512 186 ns 186 ns 3645495
BenchmarkBinaryPage/1024 181 ns 181 ns 3873270
BenchmarkBinaryPage/2048 180 ns 180 ns 3828041
BenchmarkBinaryPage/4096 185 ns 185 ns 3845810
BenchmarkBinaryPage/8192 111 ns 110 ns 6328662
BenchmarkBinaryPage/16384 66.1 ns 66.1 ns 10167867
BenchmarkBinaryPage/32768 45.7 ns 45.7 ns 15481878
BenchmarkBinaryPage/65536 33.4 ns 33.4 ns 20748067
BenchmarkBinaryPage/131072 24.8 ns 24.8 ns 28547759
BenchmarkBinaryPage/262144 18.4 ns 18.4 ns 38280071
BenchmarkBinaryPage/524288 14.2 ns 14.2 ns 48774345
BenchmarkBinaryPage/1048576 10.2 ns 10.2 ns 69163315
BenchmarkBinaryPage/2097152 10.4 ns 10.4 ns 67143905
BenchmarkBinaryPage/4194304 10.5 ns 10.5 ns 66245786
BenchmarkBinaryPage/8388608 10.7 ns 10.7 ns 64700648
BenchmarkBinaryPage/16777216 11.1 ns 11.1 ns 64294237
BenchmarkBinaryPage/33554432 11.3 ns 11.3 ns 61304237
BenchmarkBinaryPage/67108864 12.2 ns 12.2 ns 49836666
Note that this benchmark is very antithetical to how normal allocations go, because it constantly allocates a piece of memory and then deallocates it; i.e. no more than 1 allocation ever lives at the same time. Real world performance should be much less impacted.
This commit makes two main changes to improve the performance of the binary page memory resource.
Firstly, it makes the allocator more eager to unsplit (i.e. merge) pages so that they can be reused for new allocation, which should reduce the memory footprint of when using very large allocations.
Secondly, it sets a minimum size of allocations per superpage, which depends on the size of the superpage. This prevents very small allocations from ending up in massive blocks (increasing fragmentation) and also decreases the total number of pages (increasing performance).