Yixiao-Zhang opened this issue 6 months ago
Let me add a bit of context here:

@Yixiao-Zhang is running a non-hydrostatic simulation with an immersed boundary and finds that the code crashes with the PCG solver. The crash is not necessarily connected to the PCG solver; it might be caused by the simulation setup or other issues. Since the pressure solver used is experimental (from this branch), as a way to assess where the crash comes from, I suggested using another method to see whether the crash also happens, which would either validate the experimental pressure solver or not.

@Yixiao-Zhang, optimizing GPU preconditioners is quite a difficult task, as demonstrated by the preconditioners slowing down the simulation, and it is probably not a good use of time to try to speed up solvers that we are not sure we want to use. Since this simple attempt to get a simulation that runs (with another, correct solver) up to the crashing point does not work, I would suggest just trying a (slightly non-correct) FFT pressure solver to see if the simulation still crashes. If it does not, we can assume the crash comes from the PCG solver and try to catch the bug. This is probably a better way to "optimize" the solver, since we know that the PCG solver preconditioned with the FFT is faster than these other methods.
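To make the suggestion concrete, here is a minimal sketch of what running with the (slightly non-correct) FFT solver could look like; it reuses the grid and parameter values from the script posted further down in this thread and simply omits the `pressure_solver` keyword, so `NonhydrostaticModel` falls back to the default FFT-based Poisson solver that ignores the immersed boundary.

```julia
# Sketch: same setup as the script below, but without passing a pressure_solver,
# so the default FFT-based Poisson solver (which ignores the immersed boundary) is used.
model_fft = NonhydrostaticModel(;
    grid = ib_grid,                 # the ImmersedBoundaryGrid built in the script below
    timestepper = :RungeKutta3,
    advection = WENO(),
    tracers = (:b,),
    coriolis = FPlane(f = 1e-4),
    buoyancy = BuoyancyTracer(),
    closure = ScalarDiffusivity(ν = 1e-4, κ = 1e-4),
    # no pressure_solver keyword here
)
```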
BTW, we can also use this issue to track the speed of the immersed PCG solver, if you have some results.
Shouldn't we try the new PCG solver with the FFT-based preconditioner? This works pretty well. You can limit max_iterations to 2 or 3 if you want.
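For illustration only, a sketch of what capping the iteration count might look like with the branch's ImmersedPoissonSolver; the solver_method value and the maximum_iterations keyword below are guesses at the experimental interface (the script later in this thread only uses solver_method, reltol, and verbose), so the actual names may differ.

```julia
# Hypothetical sketch: the solver_method value and the maximum_iterations keyword are
# assumptions about the experimental branch's ImmersedPoissonSolver interface.
pressure_solver = ImmersedPoissonSolver(
    ib_grid,
    solver_method = :PreconditionedConjugateGradient,  # PCG with the FFT-based preconditioner (assumed name)
    maximum_iterations = 3,                            # cap the iterations, as suggested above (assumed name)
    reltol = 1e-8,
)
```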
Thank you for your replies!
The FFT-based preconditioner was the first one that I tried. It produced suspicious zonal jets in my simulations. This figure shows the difference between the two solvers. It seems to me that these zonal jets are numerical artifacts.
I think the FFT-based preconditioner is the simulation on the right, which, if I remember correctly, crashes after a while. The FFT solver on the right is just an FFT-based Poisson solver that does not account for the immersed boundary but runs stably (am I correct?)

The reason to try another method was indeed to check whether the crash is caused by the PCG solver or by another setting.
> Thank you for your replies!
> The FFT-based preconditioner was the first one that I tried. It produced suspicious zonal jets in my simulations. This figure shows the difference between the two solvers. It seems to me that these zonal jets are numerical artifacts.
Can you clarify --- is the simulation on the right with the FFT-based direct solver, or is it with a preconditioned conjugate gradient solver that uses an FFT as a preconditioner?
My suggestion is to use a preconditioned conjugate gradient solver, with the FFT-based solver as a preconditioner (not as the direct solver).
As for the blow up, I think the problem happens for very small time-steps? Perhaps try it without adaptive time-stepping for now, and also cap max_iterations at a small number.
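For concreteness, a minimal sketch of running without adaptive time-stepping: build the Simulation with a fixed Δt and do not attach a TimeStepWizard callback (the numbers are placeholders).

```julia
# Fixed time step: no TimeStepWizard callback is attached, so Δt never changes.
simulation = Simulation(model; Δt = 1800.0, stop_iteration = 1000)

# (Adaptive stepping would instead add something like
#  simulation.callbacks[:wizard] = Callback(TimeStepWizard(cfl = 0.7), IterationInterval(10)),
#  which is what we are avoiding here.)

run!(simulation)
```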
> The FFT solver on the right is just an FFT-based Poisson solver that does not account for the immersed boundary but runs stably (am I correct?)
You mean left
I made a simple script for testing, and it takes 3 minutes to run on my PC (on either CPU or GPU). This is a 2D simulation initialized with a lateral buoyancy gradient; the top is tilted. The figure shows the comparison between the default solver and the HeptadiagonalIterativeSolver. The default FFT solver produces pixelated patterns near the top boundary and deep zonal jets in the ocean interior.

Besides, I tried the HeptadiagonalIterativeSolver with the FFT-based solver as a preconditioner. It did not crash for this script and produced almost the same result as the HeptadiagonalIterativeSolver with no preconditioner.
```julia
using Printf
using Oceananigans
using Oceananigans.Models.NonhydrostaticModels: ImmersedPoissonSolver

# ---------------------------------------------------------------------- #
# Define Parameters

# Numerical technique
const arch = CPU()
const time_stepper = :RungeKutta3
const advection = WENO()

# Grid
const Nx = 1
const Ny = 200
const Nz = 50
const Lx = 100.0e3
const Ly = 200.0e3
const Lz = 50.0e3
const Δz = Lz / 2  # elevation difference at the top

# Time Stepping
const Δt = 1800.0

# Physical Parameters
const diffusivity = 1.0e-4
const Pr = 1.0
const f₀ = 1.0e-4
const Δb = 1.0e-6  # buoyancy difference at the top

# Output
const output_interval = 1
const deflatelevel = 4

# ---------------------------------------------------------------------- #
# Define Utils

# Height at Top
@inline function z_top(y::R) where {R<:Real}
    return Lz - (Δz / Ly) * y
end

# Viscosity
const viscosity = Pr * diffusivity

# Initial Fields
@inline function b_initial(x::R, y::R, z::R) where {R<:Real}
    ϵ = 100 * eps(R)
    return (Δb / Ly) * y + randn() * ϵ
end

# ---------------------------------------------------------------------- #
# Define the Simulation

# Grid
ib_grid = begin
    underlying_grid = RectilinearGrid(
        arch,
        size = (Nx, Ny, Nz),
        x = (-Lx / 2, Lx / 2),
        y = (0.0, Ly),
        z = (0.0, Lz),
        topology = (Periodic, Bounded, Bounded),
        halo = (4, 4, 4),
    )

    @inline function is_ib(x::R, y::R, z::R) where {R<:Real}
        return z > z_top(y)
    end

    ImmersedBoundaryGrid(
        underlying_grid,
        GridFittedBoundary(is_ib)
    )
end

# Coriolis
coriolis = FPlane(f₀)

# Buoyancy
buoyancy = BuoyancyTracer()

# Closure
closure = ScalarDiffusivity(ν = viscosity, κ = diffusivity)

# Pressure Solver
pressure_solver = ImmersedPoissonSolver(
    ib_grid,
    solver_method = :HeptadiagonalIterativeSolver,
    reltol = 1e-8,
    verbose = false
)

# Model
model = NonhydrostaticModel(;
    grid = ib_grid,
    timestepper = time_stepper,
    advection = advection,
    tracers = (:b, ),
    coriolis = coriolis,
    buoyancy = buoyancy,
    closure = closure,
    pressure_solver = pressure_solver,
)

# Initial Value
set!(model, b = b_initial)

# Simulation
simulation = Simulation(model; Δt = Δt, stop_iteration = 1000)

# Set Output
output_fields = merge(model.velocities, model.tracers)
simulation.output_writers[:output_3d] = NetCDFOutputWriter(
    model,
    output_fields,
    filename = "output.nc",
    schedule = IterationInterval(output_interval),
    overwrite_existing = true,
    deflatelevel = deflatelevel,
)

# Set Progress Function
function print_simulation_progress(simulation)
    model = simulation.model
    i, t = model.clock.iteration, model.clock.time
    @info Printf.@sprintf("Iteration: %d, Time: %.2f Earth Days", i, t / 86400.0)
    return nothing
end

simulation.callbacks[:progress] = Callback(
    print_simulation_progress,
    IterationInterval(100),
)

# ---------------------------------------------------------------------- #
# Run the Simulation
run!(simulation)
```
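In case it helps with comparing the two runs, here is a minimal sketch (assuming the NCDatasets.jl package) of loading the NetCDF output from two runs of the script above and measuring the difference in the meridional velocity; the filenames are hypothetical and assume the two runs were saved under different names.

```julia
using NCDatasets

# Load the final snapshots of the meridional velocity from two runs of the script
# above, assuming their outputs were saved as "output_fft.nc" and "output_pcg.nc"
# (hypothetical filenames).
ds_fft = NCDataset("output_fft.nc")
ds_pcg = NCDataset("output_pcg.nc")

v_fft = ds_fft["v"][:, :, :, end]
v_pcg = ds_pcg["v"][:, :, :, end]

# A crude measure of how much the two solutions differ
@show maximum(abs, skipmissing(v_fft .- v_pcg))

close(ds_fft)
close(ds_pcg)
```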
Thank you for your reply, @glwagner!
> Can you clarify --- is the simulation on the right with the FFT-based direct solver, or is it with a preconditioned conjugate gradient solver that uses an FFT as a preconditioner?
On the left is the FFT-based direct solver; on the right is the PCG solver with the FFT-based solver as a preconditioner.
> My suggestion is to use a preconditioned conjugate gradient solver, with the FFT-based solver as a preconditioner (not as the direct solver).
That is what the right panel shows, if I am not mistaken. But the simulation crashed after thousands of iterations. I heard that the PCG solver in Oceananigans has not been widely tested, so that is why I turned to the HeptadiagonalIterativeSolver.
> As for the blow up, I think the problem happens for very small time-steps? Perhaps try it without adaptive time-stepping for now, and also cap max_iterations at a small number.
I am doing more testing on this. It is a different issue though. I will open a new issue if I can find a simple way to reproduce the blow up.
> That is what the right panel shows, if I am not mistaken. But the simulation crashed after thousands of iterations. I heard that the PCG solver in Oceananigans has not been widely tested, so that is why I turned to the HeptadiagonalIterativeSolver.
Both of those solvers actually use the preconditioned conjugate gradient method. It's also not true --- the PreconditionedConjugateGradientSolver has been validated. I'm not even sure it's possible to use the FFT-based preconditioner with the heptadiagonal solver; they have different interfaces. Maybe you worked on that.
It's not obvious how to generalize the HeptadiagonalIterativeSolver to support the Distributed architecture, and it's also likely more difficult to optimize for immersed boundary methods using an active cells map.
We shouldn't waste our time with the HeptadiagonalIterativeSolver. If the PreconditionedConjugateGradientSolver has issues, we should fix them. It's a waste of energy to work on both.
I am trying to run simulations with the ImmersedPoissonSolver in this branch, as suggested by @simone-silvestri. It turned out that using the ImmersedPoissonSolver makes the simulation tens of times slower than the original simulation without an immersed boundary condition. Besides, using the AsymptoticInverse or ILUFactorization preconditioner makes the simulation even slower.

Here are the details of my runs. I did three runs; the domain size is 170 x 1320 x 50. In the first run I used the ImmersedPoissonSolver with no preconditioner, in the second run the AsymptoticInverse preconditioner, and in the third run the ILUFactorization preconditioner. I ran each simulation on an NVIDIA V100 GPU (on MIT Satori) for 4 hours and calculated the number of iterations per unit time. As a result, the speeds of these three simulations are 134, 62, and 29 iterations per hour, respectively. The full scripts for the three runs and the corresponding output logs are given here:
- AsymptoticInverse preconditioner: run_a2.tar.gz
- ILUFactorization preconditioner: run_a3.tar.gz

Version info:
- dev/Oceananigans: this branch