BoostGSoC13 / odeint-v2

odeint - parallelization GSoC project
http://headmyshoulder.github.com/odeint-v2/

OpenMP benchmark #3

Open neapel opened 11 years ago

neapel commented 11 years ago

Measure speedup

mariomulansky commented 11 years ago

I suggest using nonlinear oscillator lattices. I already have some tuned code for that which we can check against.

neapel commented 11 years ago

1725e45ac5a41b14ae0e has a trivial benchmark to check whether there is any kind of speedup, but the values vary wildly with GCC's OpenMP library; with Intel's they are more stable...

mariomulansky commented 11 years ago

Maybe 1024 Lorenz systems are still not enough to benefit from parallelization? For a proper benchmark I suggest looking at the scaling with the number of cores. For such a Lorenz example, which is completely uncoupled, the runtime should scale almost perfectly with the number of cores (or be limited by memory bandwidth, for that matter).
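To make the scaling argument concrete, here is a minimal sketch of such an uncoupled ensemble in plain C++. It does not use the odeint API: the names `State`, `lorenz_rhs`, `integrate_ensemble` and `time_run` are hypothetical, the stepper is a simple explicit Euler step chosen only to keep the example short, and the step size and initial condition are arbitrary. Without `-fopenmp` the pragma is ignored and the code runs serially.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

using State = std::array<double, 3>;

// Right-hand side of the Lorenz system with the usual parameters.
inline State lorenz_rhs(const State& x) {
    const double sigma = 10.0, R = 28.0, b = 8.0 / 3.0;
    return State{{ sigma * (x[1] - x[0]),
                   R * x[0] - x[1] - x[0] * x[2],
                   x[0] * x[1] - b * x[2] }};
}

// Advance every system by `steps` explicit Euler steps of size dt.
// The systems are uncoupled, so the outer loop has no dependencies
// between iterations and can be split across threads directly.
void integrate_ensemble(std::vector<State>& ensemble, int steps, double dt) {
    assert(steps >= 0);
    const long n = static_cast<long>(ensemble.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        State x = ensemble[i];          // work on a thread-local copy
        for (int s = 0; s < steps; ++s) {
            const State dx = lorenz_rhs(x);
            for (int k = 0; k < 3; ++k) x[k] += dt * dx[k];
        }
        ensemble[i] = x;                // write the result back once
    }
}

// Wall-clock seconds for one run with the requested number of threads.
// The thread count is ignored when compiled without OpenMP.
double time_run(std::size_t n, int steps, int threads) {
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;
#endif
    std::vector<State> ensemble(n, State{{10.0, 10.0, 10.0}});
    const auto t0 = std::chrono::steady_clock::now();
    integrate_ensemble(ensemble, steps, 1e-4);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

The speedup for `p` threads would then be `time_run(n, steps, 1) / time_run(n, steps, p)`, ideally averaged over several repetitions, since single timings of short runs tend to be noisy.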

neapel commented 11 years ago

Benchmark results for GCC 4.7.3 and ICC 13.1.1 on an i7-3770 with (short) n=4096, 1024 steps; (long) n=4194304, 1 step; using (split) openmp_algebra=openmp_nested_algebra<range_algebra> with openmp_state, or (simple) openmp_range_algebra with vector. All cases use schedule(runtime) and OMP_SCHEDULE=static.
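For readers unfamiliar with `schedule(runtime)`: it defers the choice of loop schedule to the `OMP_SCHEDULE` environment variable, so one binary can be benchmarked with different schedules. The sketch below shows the mechanism on a generic vector operation; `scaled_add` and `runtime_schedule_name` are hypothetical names and not part of odeint's algebras.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// y[i] += a * x[i] over the common length of x and y.
// The loop schedule is not fixed at compile time: the OpenMP runtime
// picks it from OMP_SCHEDULE (e.g. OMP_SCHEDULE=static).
void scaled_add(std::vector<double>& y, double a, const std::vector<double>& x) {
    const long n = static_cast<long>(y.size() < x.size() ? y.size() : x.size());
    #pragma omp parallel for schedule(runtime)
    for (long i = 0; i < n; ++i) y[i] += a * x[i];
}

// Name of the schedule that schedule(runtime) loops will currently use.
// Returns "serial" when compiled without OpenMP.
std::string runtime_schedule_name() {
#ifdef _OPENMP
    omp_sched_t kind;
    int chunk = 0;
    omp_get_schedule(&kind, &chunk);
    // Newer runtimes may set a modifier in the top bit; mask it off.
    const unsigned k = static_cast<unsigned>(kind) & 0x7fffffffu;
    if (k == static_cast<unsigned>(omp_sched_static))  return "static";
    if (k == static_cast<unsigned>(omp_sched_dynamic)) return "dynamic";
    if (k == static_cast<unsigned>(omp_sched_guided))  return "guided";
    if (k == static_cast<unsigned>(omp_sched_auto))    return "auto";
    return "other";
#else
    return "serial";
#endif
}
```

Running the same binary as `OMP_SCHEDULE=static ./bench` and `OMP_SCHEDULE=dynamic ./bench` (binary name assumed) would then compare schedules without recompiling.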

[Figure: osc_chain_speedup]

neapel commented 11 years ago

Times for the split and simple cases are not comparable, because the simple case doesn't store values between cycles of the loop (see here). All results are from release builds; speedups with debug builds tend to be much larger, but the times are longer.