Kleidukos / popcount-benchmark

This benchmark suite compares two popcount implementations, one in Haskell and one in C
MIT License

Clang x86-64-v2 results #3

Open bgamari opened 4 months ago

bgamari commented 4 months ago

Invoked using

nix run github:bgamari/popcount-benchmark# -- --csv out; cat out

after adding -pgmc $CLANG (with clang-17 from nixpkgs) to the cabal file.

On an older Devil's Canyon (i7-4790K):

Name,Mean (ps),2*Stdev (ps)
All.Benchmark 16.ByteString.foldl,24796,1280
All.Benchmark 16.FFI popcount (capi),65685,3120
All.Benchmark 16.FFI popcount (ccall),65599,6376
All.Benchmark 16.FFI popcount2,64623,6042
All.Benchmark 256.ByteString.foldl,429439,42784
All.Benchmark 256.FFI popcount (capi),130775,11182
All.Benchmark 256.FFI popcount (ccall),132118,13182
All.Benchmark 256.FFI popcount2,80525,5294
All.Benchmark 1024.ByteString.foldl,1718567,171170
All.Benchmark 1024.FFI popcount (capi),319458,11102
All.Benchmark 1024.FFI popcount (ccall),317096,24974
All.Benchmark 1024.FFI popcount2,139502,10400
All.Benchmark 16384.ByteString.foldl,27378187,1332238
All.Benchmark 16384.FFI popcount (capi),3843825,337048
All.Benchmark 16384.FFI popcount (ccall),3848607,353040
All.Benchmark 16384.FFI popcount2,1202500,89318
All.Benchmark 1048576.ByteString.foldl,1758438587,90281020
All.Benchmark 1048576.FFI popcount (capi),242734953,10673338
All.Benchmark 1048576.FFI popcount (ccall),245438195,21492910
All.Benchmark 1048576.FFI popcount2,72355975,5351204
All.Benchmark 16777216.ByteString.foldl,28489363200,1482779608
All.Benchmark 16777216.FFI popcount (capi),4207487425,375623194
All.Benchmark 16777216.FFI popcount (ccall),4165947175,403993866
All.Benchmark 16777216.FFI popcount2,1414648293,120358902
bgamari commented 4 months ago

FWIW, I am only able to get clang-17 to use Mula-Kurz-Lemire for the aggregate popcount with -O3 -march=x86-64-v3 -mavx2. The throughput of this code generation strategy appears to be worse than popcnt:

Name,Mean (ps),2*Stdev (ps)
All.Benchmark 16.ByteString.foldl,24816,1874
All.Benchmark 16.FFI popcount (capi),71280,5192
All.Benchmark 16.FFI popcount (ccall),70564,6326
All.Benchmark 16.FFI popcount2,69767,6430
All.Benchmark 256.ByteString.foldl,429555,41026
All.Benchmark 256.FFI popcount (capi),175486,10656
All.Benchmark 256.FFI popcount (ccall),176354,11926
All.Benchmark 256.FFI popcount2,78172,5246
All.Benchmark 1024.ByteString.foldl,1706307,164376
All.Benchmark 1024.FFI popcount (capi),508996,41728
All.Benchmark 1024.FFI popcount (ccall),512089,41914
All.Benchmark 1024.FFI popcount2,96883,5804
All.Benchmark 16384.ByteString.foldl,27272354,2627134
All.Benchmark 16384.FFI popcount (capi),7188186,658314
All.Benchmark 16384.FFI popcount (ccall),7236059,660950
All.Benchmark 16384.FFI popcount2,482025,41398
All.Benchmark 1048576.ByteString.foldl,1747127775,170061690
All.Benchmark 1048576.FFI popcount (capi),456720887,43348930
All.Benchmark 1048576.FFI popcount (ccall),457968112,42259056
All.Benchmark 1048576.FFI popcount2,29489553,2730716
All.Benchmark 16777216.ByteString.foldl,28217068000,2699019018
All.Benchmark 16777216.FFI popcount (capi),7580001150,682131766
All.Benchmark 16777216.FFI popcount (ccall),7557506100,744643952
All.Benchmark 16777216.FFI popcount2,993599056,84933770

This is an interesting reference.
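
For readers who want to reproduce the codegen difference outside the benchmark harness, here is a minimal sketch of the kind of aggregate popcount loop under discussion. The function name `popcount_buffer` and its shape are assumptions made for illustration; they are not taken from this repository's C sources. Compiling it with clang at -O3 -march=x86-64-v2 versus -O3 -march=x86-64-v3 -mavx2 and comparing the assembly should show whether the loop is lowered to scalar popcnt or to a vectorized sequence, though the exact output may vary between compiler versions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the benchmark's source): counts the set
 * bits in a byte buffer of the given length. */
size_t popcount_buffer(const uint8_t *buf, size_t len) {
    size_t total = 0;
    size_t i = 0;

    /* Main loop: consume eight bytes per iteration. memcpy is used for
     * the load so that unaligned buffers are handled without undefined
     * behaviour; compilers normally turn it into a plain 64-bit load. */
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word);
        total += (size_t)__builtin_popcountll(word);
    }

    /* Tail loop: count the remaining (fewer than eight) bytes. */
    for (; i < len; i++) {
        total += (size_t)__builtin_popcount(buf[i]);
    }

    return total;
}
```

`__builtin_popcountll` and `__builtin_popcount` are GCC/Clang builtins, so this sketch is not portable to compilers that lack them. Running `clang -O3 -march=x86-64-v3 -mavx2 -S` on this file and looking for `popcnt` versus `vpshufb` in the loop body is one way to check which strategy was chosen.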

Kleidukos commented 4 months ago

Interesting. I get my own results with Clang 18.