FourierFlows / FourierFlows.jl

Tools for building fast, hackable, pseudospectral partial differential equation solvers on periodic domains
https://bit.ly/FourierFlows
MIT License

Benchmarks are needed #213

Open navidcy opened 4 years ago

navidcy commented 4 years ago

Sometimes benchmarks will catch a bug that does not cause any test to fail but does cause a considerable slowdown.

For example, while playing around I found that the ETDRK4 time stepper is often faster than RK4.

julia> using FourierFlows, BenchmarkTools
[ Info: FourierFlows will use 12 threads

julia> prob_ForwardEuler = FourierFlows.Diffusion.Problem(stepper="ForwardEuler");

julia> prob_AB3 = FourierFlows.Diffusion.Problem(stepper="AB3");

julia> prob_RK4 = FourierFlows.Diffusion.Problem(stepper="RK4");

julia> prob_ETDRK4 = FourierFlows.Diffusion.Problem(stepper="ETDRK4");

julia> @btime stepforward!(prob_ForwardEuler, 1)
  243.669 ns (5 allocations: 304 bytes)

julia> @btime stepforward!(prob_AB3, 1)
  460.903 ns (5 allocations: 304 bytes)

julia> @btime stepforward!(prob_RK4, 1)
  1.362 μs (17 allocations: 1.05 KiB)

julia> @btime stepforward!(prob_ETDRK4, 1)
  1.306 μs (17 allocations: 1.05 KiB)

julia> using GeophysicalFlows

julia> prob_ForwardEuler = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="ForwardEuler");

julia> prob_AB3 = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="AB3");

julia> prob_RK4 = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="RK4");

julia> prob_ETDRK4 = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="ETDRK4");

julia> @btime stepforward!(prob_ForwardEuler, 1)
  1.455 ms (989 allocations: 95.92 KiB)

julia> @btime stepforward!(prob_AB3, 1)
  1.420 ms (990 allocations: 95.95 KiB)

julia> @btime stepforward!(prob_RK4, 1)
  6.539 ms (3957 allocations: 383.67 KiB)

julia> @btime stepforward!(prob_ETDRK4, 1)
  4.606 ms (3957 allocations: 383.67 KiB)

I'm not sure if this is a bug or if this is indeed how it's supposed to be. But if it's the latter, then this would argue that you should always prefer ETDRK4 over RK4 when your timestep is fixed.

navidcy commented 4 years ago

I'm pretty sure that RK4 should be faster. Both RK4 and ETDRK4 involve 4 calls of calcN!...

glwagner commented 4 years ago

Benchmarks are definitely a good idea, and this script is probably good enough. It may be better to use the native Diffusion model for benchmarks of the timestepping methods?
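As a rough sketch of how the REPL session above could become a reusable benchmark, something like the following might work. It uses only the `FourierFlows.Diffusion.Problem(stepper=...)` and `stepforward!(prob, 1)` calls already shown in this thread, plus the standard BenchmarkTools suite API. The suite layout and the printed summary are assumptions for illustration, not an existing FourierFlows script.

```julia
using FourierFlows, BenchmarkTools, Statistics

# Steppers compared earlier in this thread.
steppers = ["ForwardEuler", "AB3", "RK4", "ETDRK4"]

# One benchmark per stepper, all on the native Diffusion problem.
suite = BenchmarkGroup()
for stepper in steppers
    prob = FourierFlows.Diffusion.Problem(stepper=stepper)
    # Interpolating with `$` keeps global-variable lookup out of the timing.
    suite[stepper] = @benchmarkable stepforward!($prob, 1)
end

tune!(suite)          # choose evaluations per sample for each benchmark
results = run(suite)  # returns a BenchmarkGroup of Trials

# Report minimum and median times in nanoseconds. The gap between the two
# gives a crude feel for how noisy the measurement is.
for stepper in steppers
    trial = results[stepper]
    println(rpad(stepper, 14),
            "min: ", time(minimum(trial)), " ns   ",
            "median: ", time(median(trial)), " ns")
end
```

Saving `results` from a known-good commit and comparing later runs against it (for example with BenchmarkTools' `judge`) would be one way to flag slowdowns that the tests miss.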

The difference between RK4 and ETDRK4 is that linear terms are explicitly calculated in RK4. This may involve a few extra arithmetic operations that account for the 5% difference in timing? 5% may be close to the accuracy of the benchmark, by the way, so it's hard to tell if this is a real difference. I don't think many users would notice this difference. I'm happy to see that they are within 5% and that memory consumption is low. This is a good result in my opinion.
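To make the point about linear terms concrete, here is a small sketch in plain Julia. It is not FourierFlows code: `rk4_step`, `etd1_step`, and the parameter values are hypothetical names chosen for illustration. It also uses first-order exponential time differencing (ETD1) as the simplest stand-in for ETDRK4, whose stage coefficients are more involved. The model problem is du/dt = L*u + N(u) per Fourier mode, with L = -κk² for diffusion.

```julia
# Classical RK4: the linear term L .* v is recomputed at each of the four stages.
function rk4_step(u, L, N, dt)
    rhs(v) = L .* v .+ N(v)
    k1 = rhs(u)
    k2 = rhs(u .+ (dt / 2) .* k1)
    k3 = rhs(u .+ (dt / 2) .* k2)
    k4 = rhs(u .+ dt .* k3)
    return u .+ (dt / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end

# ETD1: the linear term is integrated exactly through a precomputed
# exponential, so the step itself never evaluates L .* u.
#   u_{n+1} = exp(L*dt) * u_n + (exp(L*dt) - 1) / L * N(u_n)
function etd1_step(u, expLdt, ϕ, N)
    return expLdt .* u .+ ϕ .* N(u)
end

# Pure diffusion on a few nonzero wavenumbers (assumed values).
κ, dt = 0.1, 0.01
k = collect(1.0:4.0)
L = -κ .* k .^ 2
N(v) = zero(v)                  # no nonlinear term for diffusion

# Coefficients computed once, outside the time-stepping loop.
expLdt = exp.(L .* dt)
ϕ = expm1.(L .* dt) ./ L        # valid here because every L is nonzero

u0 = ones(length(k))
exact = exp.(L .* dt) .* u0

u_rk4 = rk4_step(u0, L, N, dt)
u_etd = etd1_step(u0, expLdt, ϕ, N)

println("max RK4 error:  ", maximum(abs.(u_rk4 .- exact)))
println("max ETD1 error: ", maximum(abs.(u_etd .- exact)))
```

With the same number of `N` evaluations per step, the extra work in the RK4 path is the repeated `L .* v` products, which is consistent with only a small timing difference when `N` dominates the cost.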

A trickier question is whether multithreading or manually written kernels might speed up these time-stepping routines, and whether we should implement them via KernelAbstractions. This benchmark is a good start.