All results use GHC 9.6.5 on Linux with my branch (specifically bgamari/popcount-benchmark@46f47df6d40eb4121784dfc4a8f2585c7567d5b4), which has a few important fixes and a more efficient, word-wise implementation (popcount2). All C code was compiled with nixpkgs' default gcc configuration. Invoked via:
nix run github:bgamari/popcount-benchmark# -- --csv out; cat out
Note that this configuration does not enable use of the native x86-64 popcount instructions provided by SSE4.2; see #2 for such results.
Overall, the capi overhead is happily quite small.
Any codegen benefits that the naive C implementation might have aren't enough to compensate for the FFI call overhead.
The word-granular implementation is considerably faster for non-trivial input sizes but still fairly slow for such a simple operation (e.g. 3 GB/s on a 1 MB input on the Devil's Canyon).
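To make the byte-wise versus word-wise distinction concrete, here is a minimal sketch in C. It is not the code from the benchmark repository: the names `popcount_bytewise`, `popcount_wordwise` and `popcount64` are hypothetical, and the per-word count uses the well-known SWAR bit trick rather than a hardware instruction, matching a build where the native popcount instruction is not enabled.

```c
#include <assert.h>  /* only needed for the checks in the usage note */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time count: clear the lowest set bit until the byte is zero.
 * (Hypothetical name; an illustration, not the repository's implementation.) */
size_t popcount_bytewise(const uint8_t *buf, size_t len) {
    size_t total = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = buf[i];
        while (b) {
            b &= (uint8_t)(b - 1);  /* drops exactly one set bit */
            total++;
        }
    }
    return total;
}

/* SWAR popcount of one 64-bit word, using no popcount instruction. */
static uint64_t popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);              /* 2-bit sums */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);                   /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;               /* 8-bit sums */
    return (x * 0x0101010101010101ULL) >> 56;                 /* add all bytes */
}

/* Word-at-a-time count: process 8 bytes per iteration, then the tail. */
size_t popcount_wordwise(const uint8_t *buf, size_t len) {
    size_t total = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);  /* safe for unaligned input */
        total += (size_t)popcount64(w);
    }
    /* Fewer than 8 bytes remain; fall back to the byte-wise loop. */
    return total + popcount_bytewise(buf + i, len - i);
}
```

Both functions return the same count for any buffer, e.g. `assert(popcount_wordwise(buf, n) == popcount_bytewise(buf, n))`. With GCC, `popcount64` could presumably be swapped for `__builtin_popcountll`, which only compiles to the native instruction when the target flags allow it.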
On an older Devil's Canyon (i7-4790K)
On a newish Ryzen 5900X
On an older Sandy Bridge EP (Xeon E5-2690)
On a rather noisy Ryzen 7 7840U