Kleidukos / popcount-benchmark

This benchmark suite compares two popcount implementations, one in Haskell and one in C
MIT License

Results #1

Open bgamari opened 5 months ago

bgamari commented 5 months ago

All results were gathered using GHC 9.6.5 on Linux with my branch (in particular bgamari/popcount-benchmark@46f47df6d40eb4121784dfc4a8f2585c7567d5b4), which has a few important fixes and a more efficient, word-wise implementation (popcount2). All C code was compiled with nixpkgs' default gcc configuration. Invoked via:

nix run github:bgamari/popcount-benchmark# -- --csv out; cat out

Note that this configuration does not enable use of the native x86-64 popcnt instruction (introduced alongside SSE4.2); see #2 for such results.
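
For readers unfamiliar with the byte-wise versus word-wise distinction, here is a minimal C sketch of the idea. It is not the code from the branch: the names `popcount_bytewise` and `popcount_wordwise` are hypothetical, and the portable SWAR bit trick is a stand-in chosen because it does not depend on any compiler flag or hardware popcount support. The word-wise variant assumes only that processing 8 bytes per iteration, with a byte-wise tail, is what "word-wise" means here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the set bits of one byte by clearing the lowest set bit
 * until nothing is left. */
static unsigned popcount_byte(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        b &= (uint8_t)(b - 1);
        n++;
    }
    return n;
}

/* Hypothetical byte-wise baseline: one byte per loop iteration. */
size_t popcount_bytewise(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += popcount_byte(buf[i]);
    return total;
}

/* Portable SWAR popcount of a 64-bit word: sum bits in pairs, then
 * nibbles, then bytes, and add the eight byte sums with a multiply. */
static unsigned popcount_word(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical word-wise variant: 8 bytes per iteration, then a
 * byte-wise tail for the remaining 0..7 bytes. */
size_t popcount_wordwise(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t total = 0;
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w); /* safe for unaligned input */
        total += popcount_word(w);
    }
    for (; i < len; i++)
        total += popcount_byte(buf[i]);
    return total;
}
```

Both functions should return the same count for any buffer; the word-wise one simply does far fewer loop iterations, which is one plausible reason for the gap between popcount and popcount2 at larger input sizes.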

On an older Devil's Canyon (i7-4790K)

Name,Mean (ps),2*Stdev (ps)
All.Benchmark 16.ByteString.foldl,33310,1322
All.Benchmark 16.FFI popcount (capi),94983,5178
All.Benchmark 16.FFI popcount (ccall),94202,5258
All.Benchmark 16.FFI popcount2,78357,5156
All.Benchmark 256.ByteString.foldl,494165,41122
All.Benchmark 256.FFI popcount (capi),648616,43400
All.Benchmark 256.FFI popcount (ccall),647108,41392
All.Benchmark 256.FFI popcount2,147526,10534
All.Benchmark 1024.ByteString.foldl,1948046,163776
All.Benchmark 1024.FFI popcount (capi),2408375,163802
All.Benchmark 1024.FFI popcount (ccall),2407413,164572
All.Benchmark 1024.FFI popcount2,370689,21054
All.Benchmark 16384.ByteString.foldl,30915076,2708396
All.Benchmark 16384.FFI popcount (capi),37605988,2656072
All.Benchmark 16384.FFI popcount (ccall),37560786,2644448
All.Benchmark 16384.FFI popcount2,4773387,333758
All.Benchmark 1048576.ByteString.foldl,1984600175,174262198
All.Benchmark 1048576.FFI popcount (capi),2398511900,169998800
All.Benchmark 1048576.FFI popcount (ccall),2398251275,172648428
All.Benchmark 1048576.FFI popcount2,300430825,24504306
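
The means are reported in picoseconds, so the large numbers are easier to compare once normalized. The small C helpers below are a sketch for doing that; the function names are hypothetical, and they assume that the number in each benchmark name (16, 256, ...) is the input length in bytes, which the output itself does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Mean time per input byte, in picoseconds.
 * Assumes n_bytes is the input size of the benchmark row. */
double ps_per_byte(double mean_ps, size_t n_bytes)
{
    assert(n_bytes > 0);
    return mean_ps / (double)n_bytes;
}

/* How many times faster the candidate is than the baseline,
 * given both mean times in the same unit. */
double speedup(double baseline_ps, double candidate_ps)
{
    assert(candidate_ps > 0.0);
    return baseline_ps / candidate_ps;
}
```

Applied to the 1048576 rows above, `speedup(1984600175.0, 300430825.0)` comes out at roughly 6.6 (ByteString.foldl versus popcount2), and `ps_per_byte(300430825.0, 1048576)` at roughly 287 ps per byte for popcount2.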

On a newish Ryzen 5900X

Name,Mean (ps),2*Stdev (ps)
All.Benchmark 16.ByteString.foldl,24884,2048
All.Benchmark 16.FFI popcount (capi),65476,3416
All.Benchmark 16.FFI popcount (ccall),69588,5796
All.Benchmark 16.FFI popcount2,59960,5120
All.Benchmark 256.ByteString.foldl,339072,26844
All.Benchmark 256.FFI popcount (capi),449081,11434
All.Benchmark 256.FFI popcount (ccall),448942,41158
All.Benchmark 256.FFI popcount2,104780,6554
All.Benchmark 1024.ByteString.foldl,1364193,100470
All.Benchmark 1024.FFI popcount (capi),1660103,85260
All.Benchmark 1024.FFI popcount (ccall),1656702,91594
All.Benchmark 1024.FFI popcount2,259237,22396
All.Benchmark 16384.ByteString.foldl,21603074,1874374
All.Benchmark 16384.FFI popcount (capi),25740900,1386094
All.Benchmark 16384.FFI popcount (ccall),25709884,1456920
All.Benchmark 16384.FFI popcount2,3280318,165842
All.Benchmark 1048576.ByteString.foldl,1369632831,84915966
All.Benchmark 1048576.FFI popcount (capi),1645607600,107445246
All.Benchmark 1048576.FFI popcount (ccall),1649083025,85288300
All.Benchmark 1048576.FFI popcount2,206759292,10763200

On an older Sandy Bridge EP (Xeon E5-2690)

Name,Mean (ps),2*Stdev (ps)
All.Benchmark 16.ByteString.foldl,42285,3750
All.Benchmark 16.FFI popcount (capi),123091,11722
All.Benchmark 16.FFI popcount (ccall),117173,10434
All.Benchmark 16.FFI popcount2,99464,8774
All.Benchmark 256.ByteString.foldl,604927,43382
All.Benchmark 256.FFI popcount (capi),816863,42232
All.Benchmark 256.FFI popcount (ccall),813070,47250
All.Benchmark 256.FFI popcount2,183942,11860
All.Benchmark 1024.ByteString.foldl,2400691,169226
All.Benchmark 1024.FFI popcount (capi),3027307,173278
All.Benchmark 1024.FFI popcount (ccall),3022326,176642
All.Benchmark 1024.FFI popcount2,463366,40944
All.Benchmark 16384.ByteString.foldl,40930518,1521714
All.Benchmark 16384.FFI popcount (capi),47223733,2667580
All.Benchmark 16384.FFI popcount (ccall),47431973,3481574
All.Benchmark 16384.FFI popcount2,6035714,415498
All.Benchmark 1048576.ByteString.foldl,2465708362,186583828
All.Benchmark 1048576.FFI popcount (capi),3015374062,171832040
All.Benchmark 1048576.FFI popcount (ccall),3011936450,174357232
All.Benchmark 1048576.FFI popcount2,378817585,25954390

On a rather noisy Ryzen 7 7840U

Name,Mean (ps),2*Stdev (ps)
All.Benchmark 16.ByteString.foldl,24201,228
All.Benchmark 16.FFI popcount (capi),64189,6160
All.Benchmark 16.FFI popcount (ccall),63033,5640
All.Benchmark 16.FFI popcount2,56593,5208
All.Benchmark 256.ByteString.foldl,341058,33290
All.Benchmark 256.FFI popcount (capi),421699,31634
All.Benchmark 256.FFI popcount (ccall),414042,20932
All.Benchmark 256.FFI popcount2,95236,5196
All.Benchmark 1024.ByteString.foldl,1332139,67838
All.Benchmark 1024.FFI popcount (capi),1565694,87146
All.Benchmark 1024.FFI popcount (ccall),1563094,82818
All.Benchmark 1024.FFI popcount2,216357,16792
All.Benchmark 16384.ByteString.foldl,21515713,1940358
All.Benchmark 16384.FFI popcount (capi),24446676,1335834
All.Benchmark 16384.FFI popcount (ccall),24803205,1833290
All.Benchmark 16384.FFI popcount2,2701783,148042
All.Benchmark 1048576.ByteString.foldl,1367534931,86622952
All.Benchmark 1048576.FFI popcount (capi),1621758865,59621900
All.Benchmark 1048576.FFI popcount (ccall),1597307831,150902192
All.Benchmark 1048576.FFI popcount2,171874521,13946010

bgamari commented 5 months ago

Conclusions: