CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
https://clima.github.io/OceananigansDocumentation/stable
MIT License

Performance of Preconditioned Conjugate Gradient solver #2728

Open navidcy opened 2 years ago

navidcy commented 2 years ago

Working with @elise-palethorpe, we see that the preconditioned conjugate gradient solver is much slower than expected. See #2654.

In particular, benchmarks on solving the Poisson equation on a doubly bounded domain on the ep/pcg-with-multigrid branch give:

julia> include("validation/elliptic_solvers/doubly_bounded_poisson.jl")

julia> include("doubly_bounded_poisson.jl")
[ Info: Solving the Poisson equation with an FFT-based solver...
  123.083 μs (93 allocations: 17.56 KiB)
[ Info: Solving the Poisson equation with a conjugate gradient iterative solver...
  64.748 ms (80482 allocations: 25.93 MiB)
[ Info: Solving the Poisson equation with the Algebraic Multigrid solver...
  9.491 ms (498 allocations: 8.46 MiB)
[ Info: Solving the Poisson equation with a conjugate gradient preconditioned iterative solver w/ AMG as preconditioner...
  47.891 ms (12771 allocations: 111.97 MiB)

We'd expect the PCG to perform similarly to MG, and the MG-preconditioned PCG to perform better. There is definitely some issue with memory allocations but, possibly, something else as well?

simone-silvestri commented 2 years ago

I am fairly convinced that it is because of the fill_halo calls, which are absent in matrix-based solvers.

navidcy commented 2 years ago

About 40% of the memory allocations are due to fill halo and the rest are due to dot(). But those are memory allocations; I don't know how they affect speed.
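One way to check whether a given call allocates is to measure it with `@allocated` after a warm-up call. The snippet below is only a sketch on plain `Array`s: the arrays and the helper names `dot_temporary`, `dot_blas` and `measure_bytes` are hypothetical stand-ins, not Oceananigans fields or functions, so it says nothing about how the solver's own dot() or fill halo behave.

```julia
using LinearAlgebra

# Hypothetical stand-ins for solver work arrays (plain Arrays, not Oceananigans fields).
a = rand(64, 64)
b = rand(64, 64)

# Two ways of computing the same inner product.
dot_temporary(x, y) = sum(x .* y)   # materializes a temporary array for x .* y
dot_blas(x, y) = dot(x, y)          # LinearAlgebra.dot, no temporary array

# Measure inside a function so that global-scope overhead is not counted.
function measure_bytes(f, x, y)
    f(x, y)                         # warm-up call, excludes compilation
    return @allocated f(x, y)
end

bytes_temporary = measure_bytes(dot_temporary, a, b)
bytes_blas = measure_bytes(dot_blas, a, b)

println("sum(x .* y): ", bytes_temporary, " bytes")
println("dot(x, y):   ", bytes_blas, " bytes")
```

Applying the same pattern to the individual calls inside one solver iteration would show which of them contributes the allocations.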

simone-silvestri commented 2 years ago

I'll try to produce a profile of validation/elliptic_solvers/doubly_bounded_poisson.jl with nsys and post it here.

simone-silvestri commented 2 years ago

I just noticed that this is on the CPU, so profiling might be a little more difficult.

glwagner commented 2 years ago

Should we also expect the MG preconditioner to perform similarly to the FFT-based preconditioner? Those are relatively similar algorithms. How many CG iterations are we performing for either?
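To make the iteration-count question concrete, here is a minimal sketch of a preconditioned conjugate gradient loop that returns the number of iterations it took. It is not the Oceananigans solver: the function `pcg`, the test matrix, and the Jacobi (diagonal) preconditioner are all assumptions chosen for illustration, with Jacobi standing in for the MG or FFT-based preconditioners discussed here.

```julia
using LinearAlgebra

# Textbook preconditioned CG for a symmetric positive definite A.
# `precondition(r)` should return an approximation of A \ r.
function pcg(A, b; precondition = identity, tol = 1e-8, maxiter = 10 * length(b))
    x = zeros(length(b))
    r = copy(b)                      # residual b - A*x for the initial guess x = 0
    z = precondition(r)
    p = copy(z)                      # first search direction
    rho = dot(r, z)
    for k in 1:maxiter
        q = A * p
        alpha = rho / dot(p, q)      # step length along p
        x .+= alpha .* p
        r .-= alpha .* q
        norm(r) <= tol * norm(b) && return x, k
        z = precondition(r)
        rho_new = dot(r, z)
        p .= z .+ (rho_new / rho) .* p   # next search direction
        rho = rho_new
    end
    return x, maxiter
end

# Hypothetical test problem: a diagonally dominant tridiagonal matrix
# whose diagonal varies strongly from row to row.
n = 100
d = 2 .+ collect(range(0.1, stop = 1000.0, length = n))
A = SymTridiagonal(d, fill(-1.0, n - 1))
b = ones(n)

jacobi(r) = r ./ d                   # diagonal (Jacobi) preconditioner

x_plain, iters_plain = pcg(A, b)
x_jacobi, iters_jacobi = pcg(A, b; precondition = jacobi)

println("CG iterations without preconditioner: ", iters_plain)
println("CG iterations with Jacobi preconditioner: ", iters_jacobi)
```

Reporting the iteration count alongside the timings for each preconditioner would separate the cost per iteration from the number of iterations.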

navidcy commented 2 years ago

Is this allocating?

https://github.com/CliMA/Oceananigans.jl/blob/085969cf25456d0ff456158ddaaf4e6f49c141da/src/Solvers/preconditioned_conjugate_gradient_solver.jl#L197

navidcy commented 2 years ago

No, they don't. I had a little look and it's the fill halos that bring the allocations.