This benchmark suite compares the code generation of Clang and GCC, with a native Haskell implementation as a baseline. The point is not to compare the FFI with native Haskell, but rather to see how Clang handles loop unrolling and AVX2 compared to GCC.
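To make "loop unrolling" concrete for a byte-wise population count, here is a hand-written sketch of the idea. It is only an illustration under stated assumptions: the name `popcount_unrolled` is hypothetical and not part of this suite, and the code that Clang or GCC actually emits (for example AVX2 vector instructions) may look quite different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not part of the suite: consume 8 bytes per
   iteration, then finish the remaining tail one byte at a time. */
size_t popcount_unrolled(char const *src, size_t len) {
  size_t result = 0;
  size_t i = 0;

  /* Main loop: one 64-bit population count per 8 input bytes. */
  for (; i + 8 <= len; i += 8) {
    uint64_t word;
    memcpy(&word, src + i, sizeof word); /* alignment-safe load */
    result += (size_t)__builtin_popcountll(word);
  }

  /* Tail: fewer than 8 bytes remain. */
  for (; i < len; i++) {
    result += (size_t)__builtin_popcount((unsigned char)src[i]);
  }
  return result;
}
```

The bit count does not depend on byte order, so loading eight bytes into one integer gives the same total as counting them individually.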
Please read the following files:
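The native Haskell baseline on Word8 is a combination of popCnt8# and ByteString's foldl'.
The C implementation called through the FFI:
#include <stddef.h>
/* Count the set bits in the first len bytes of src. */
size_t popcount (char const * src, size_t len) {
  size_t result = 0;
  for (size_t i = 0; i < len; i++) {
    /* Cast to unsigned char: where char is signed, a byte >= 0x80 would
       otherwise be sign-extended and over-counted. */
    result += __builtin_popcount((unsigned char) src[i]);
  }
  return result;
}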
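A quick way to sanity-check the C routine is to compare it against a bit-by-bit count. The sketch below is an assumption and not part of the suite: `popcount_bytes` mirrors the loop above but reads each byte as `unsigned char`, so that bytes of 0x80 and above are not sign-extended on platforms where `char` is signed. `popcount_reference` and `popcount_self_check` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise loop using the compiler builtin (GCC and Clang). */
size_t popcount_bytes(char const *src, size_t len) {
  size_t result = 0;
  for (size_t i = 0; i < len; i++) {
    result += (size_t)__builtin_popcount((unsigned char)src[i]);
  }
  return result;
}

/* Hypothetical reference: count bits one at a time, without builtins. */
size_t popcount_reference(char const *src, size_t len) {
  size_t result = 0;
  for (size_t i = 0; i < len; i++) {
    unsigned char byte = (unsigned char)src[i];
    while (byte != 0) {
      result += byte & 1u;
      byte >>= 1;
    }
  }
  return result;
}

/* Small self-check on a buffer that includes bytes >= 0x80. */
void popcount_self_check(void) {
  char const data[] = {0x00, 0x01, 0x0F, (char)0x80, (char)0xFF};
  /* Set bits per byte: 0 + 1 + 4 + 1 + 8 = 14. */
  assert(popcount_bytes(data, sizeof data) == 14);
  assert(popcount_bytes(data, sizeof data) ==
         popcount_reference(data, sizeof data));
}
```

Without the cast, a signed `char` holding 0xFF is promoted to the int -1, and `__builtin_popcount` would then report 32 set bits for that byte instead of 8.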
With a Clang-enabled GHC 9.6 (Mo = megabytes):
  1Mo
    ByteString.foldl: OK
      1.32 ms ± 121 μs
    FFI popcount: OK
      50.0 ns ± 1.8 ns
  10Mo
    ByteString.foldl: OK
      13.2 ms ± 723 μs
    FFI popcount: OK
      61.2 ns ± 5.9 ns
  100Mo
    ByteString.foldl: OK
      131 ms ± 3.2 ms
    FFI popcount: OK
      38.8 ns ± 2.7 ns
With a GCC-enabled GHC 9.4.6:
  1Mo
    ByteString.foldl: OK
      1.38 ms ± 106 μs
    FFI popcount: OK
      63.6 ns ± 1.5 ns
  10Mo
    ByteString.foldl: OK
      13.7 ms ± 1.4 ms
    FFI popcount: OK
      88.5 ns ± 5.3 ns
  100Mo
    ByteString.foldl: OK
      136 ms ± 2.8 ms
    FFI popcount: OK
      39.5 ns ± 3.3 ns
Use ghcup to install multiple toolchains:
Install a GCC-enabled GHC (find the URL for your system here):
$ ghcup install ghc -u 'https://downloads.haskell.org/~ghc/9.6.4/ghc-9.6.4-x86_64-fedora33-linux.tar.xz' 9.6.4-gcc
Then run:
$ cabal bench -w ghc-9.6.4
$ cabal clean # important!
$ cabal bench -w ghc-9.6.4-gcc