JuliaPerf / BenchmarksGame.jl

make fannkuchredux-fast slightly faster #16

Closed hycakir closed 5 years ago

hycakir commented 5 years ago

This should (hopefully) run slightly faster than the latest Julia and Java implementations (it still needs testing, though). It makes count_flips (the bottleneck) slightly faster. The parallelization now follows Jeremy Zerfas' C implementation: the block sizes are changed, the atomic operations are removed, and the per-thread results are reduced after the threads join.
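
For readers who want to see the shape of that scheme, here is a minimal sketch, not the code from this PR. It splits the permutation index range into blocks, lets each task keep its own local checksum and max flips (so no atomics are needed), and combines the partial results only after all tasks have been fetched. The names `nth_permutation!`, `process_block`, `fannkuch_blocks` and the `nblocks` parameter are hypothetical, and the permutations are enumerated in plain lexicographic order, which is an assumption of this sketch and is not claimed to match the ordering the Benchmarks Game checksum expects.

```julia
using Base.Threads: @spawn

# Write the idx-th permutation (0-based, lexicographic order) of 1:n into `perm`.
# Hypothetical helper: simple and allocation-heavy, chosen for clarity, not speed.
function nth_permutation!(perm::Vector{Int}, idx::Int)
    n = length(perm)
    pool = collect(1:n)                  # elements not yet placed
    for i in 1:n
        q, idx = divrem(idx, factorial(n - i))
        perm[i] = splice!(pool, q + 1)   # take the q-th remaining element
    end
    return perm
end

# Count prefix reversals until the first element is 1 (mutates `perm`).
function count_flips!(perm::Vector{Int})
    flips = 0
    k = perm[1]
    while k != 1
        reverse!(perm, 1, k)
        flips += 1
        k = perm[1]
    end
    return flips
end

# Process one contiguous block of permutation indices using only local state.
function process_block(n::Int, first_idx::Int, last_idx::Int)
    perm = Vector{Int}(undef, n)
    checksum = 0
    maxflips = 0
    for idx in first_idx:last_idx
        nth_permutation!(perm, idx)
        flips = count_flips!(perm)
        checksum += iseven(idx) ? flips : -flips
        maxflips = max(maxflips, flips)
    end
    return checksum, maxflips
end

# Spawn one task per block, then reduce the partial results after the join.
function fannkuch_blocks(n::Int, nblocks::Int)
    total = factorial(n)
    blocksize = cld(total, nblocks)
    tasks = map(0:blocksize:total-1) do start
        @spawn process_block(n, start, min(start + blocksize, total) - 1)
    end
    results = fetch.(tasks)              # join: wait for every block
    return sum(first, results), maximum(last, results)
end
```

Calling `fannkuch_blocks(7, 12)` returns a `(checksum, maxflips)` tuple. Because each block only touches its own locals, the result does not depend on how many blocks are used or on the order in which the tasks finish.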

KristofferC commented 5 years ago

For me:

Before:

   fannkuchredux-fast.jl        1    8.62s  22.1%

After:

   fannkuchredux-fast.jl        1    7.91s  21.3%

hycakir commented 5 years ago

> fannkuchredux-fast.jl        1    7.91s  21.3%

If you are running with 8 threads or more, changing the block size to 16, 24 or 32 should also help. The machine they use for benchmarking appears to run 4 threads (judging from the results), which is why I set it to 12.

KristofferC commented 5 years ago

The script sets 4 threads: https://github.com/KristofferC/BenchmarksGame.jl/blob/4d91325c86e0c8b04dbb7dc289080c0c568a0b4a/run_benchmarks.jl#L5