FourierFlows / FourierFlows.jl

Tools for building fast, hackable, pseudospectral partial differential equation solvers on periodic domains
https://bit.ly/FourierFlows
MIT License
207 stars 29 forks source link

[New Feature] Support for low-storage RK4 method #334

Closed doraemonho closed 1 year ago

doraemonho commented 1 year ago

Hi,

I modify a bit of FourierFlows for adding a new time-stepper to support low-storage RK4 scheme. (The code is in below) This is a classical 5 stages 2 registers process to trade off 25~30% performance but saving 50% of storge space for RK4 method. [1]. I wonder could we make FourierFlows support this feature in the next update.

[1] M.H. Carpenter, C.A. Kennedy, Fourth-order 2N-storage Runge–Kutta schemes, Technical Report NASA TM-109112, NASA Langley Research Center, VA, June 1994.

# ------
# LSRK(5)4
# ------
"""
    struct LSRK54TimeStepper{T} <: AbstractTimeStepper{T}

A 5 stage 2 storage 4th-order Runge-Kutta timestepper for time-stepping. `∂u/∂t = F(u, t)` via:

Define S² = 0 at the first stage.
for i = 1:5
  S²   = Aᵢ*S² + dt*Fⁱ(t₀+ Cᵢdt, uⁿ);
  uⁿ⁺¹ = uⁿ + Bᵢ*S²
end
where
# dt -> time interal 
# Aᵢ  -> A coefficients from LSRK tableau table in stage i 
# Bᵢ  -> B coefficients from LSRK tableau table in stage i 
# Cᵢ  -> C coefficients from LSRK tableau table in stage i 
#  i  -> stage i 

for detials, please refer to M.H. Carpenter, C.A. Kennedy, Fourth-order 2N-storage Runge–Kutta schemes, Technical Report NASA TM-109112, NASA Langley Research Center, VA, June 1994
"""
struct LSRK54TimeStepper{T,Vec} <: AbstractTimeStepper{T}
    S² :: T
    Fⁱ :: T
    A  :: Vec
    B  :: Vec
    C  :: Vec
end

"""
    LSRK54TimeStepper(equation::Equation, dev::Device=CPU())

Construct a 4th-order 5 stages low storage Runge-Kutta timestepper for `equation` on device `dev`.
"""
function LSRK54TimeStepper(equation::Equation, dev::Device=CPU())
  @devzeros typeof(dev) equation.T equation.dims S² Fⁱ

  T = equation.T;
  A = T[0.0, -0.417890474499852, -1.19215169464268, -1.69778469247153, -1.51418344425716];
  B = T[0.149659021999229, 0.379210312999627, 0.822955029386982, 0.699450455949122, 0.153057247968152];
  C = T[ 0.0, 0.149659021999229, 0.3704009573642045, 0.6222557631344415, 0.95828213067469];

  return LSRK54TimeStepper(S², Fⁱ, A, B, C);
end

function LSRK54!(sol, clock, ts, equation, vars, params, grid, t, dt)

    # Get the A,B,C ceof. for LSRK54 method
    T = equation.T;
    A = ts.A::Array{T,1};
    B = ts.B::Array{T,1};
    C = ts.C::Array{T,1};

    # Init. the S² term
    @. ts.S² *= 0;

    for i = 1:5
        equation.calcN!(ts.Fⁱ, sol, t + C[i]*dt , clock, vars, params, grid)
        addlinearterm!(ts.Fⁱ, equation.L, sol);
        @. ts.S² = A[i]*ts.S² + dt*ts.Fⁱ;
        @.   sol =   sol + B[i]*ts.S²;
    end

    return nothing;
end

function  stepforward!(sol, clock, ts::LSRK54TimeStepper, equation, vars, params, grid)
    LSRK54!(sol, clock, ts, equation, vars, params, grid, clock.t, clock.dt)
    #Finishing the time step
    clock.t    += clock.dt
    clock.step += 1;
    return nothing;
end
navidcy commented 1 year ago

Feel free to open a PR.

Do you have/find cases where memory is the bottleneck compared to efficiency?

doraemonho commented 1 year ago

Feel free to open a PR.

Do you have/find cases where memory is the bottleneck compared to efficiency?

Definitely 3D simulation on GPU. With a high-end gaming/data center GPU, the speed is fine but we already approaching to the resolution limit of using a single GPU. Reducing 50% memory can further make the Cubes 1.25 times larger in length.

navidcy commented 1 year ago

Cool! Go for it. Open a PR and I can help if needed.

doraemonho commented 1 year ago

Cool! Go for it. Open a PR and I can help if needed.

Thanks!I submitted the PR. I will need your help to review it and make the merge if the code is fine.