firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org

Parallel issues using a custom multigrid method with static condensation and hybridization #1492

Closed thomasgibson closed 5 years ago

thomasgibson commented 5 years ago

Dear all,

I am truly mystified by this apparent bug I am observing using the TSFC branch interpolation-operator and the Firedrake branch gtmg-rebased. I will do my best to summarize the problem here.

I have been working with @eikehmueller and Jack Betteridge on a custom multigrid method using the procedure outlined by Gopalakrishnan and Tan (https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.636) and Cockburn et al. (http://web.pdx.edu/~gjay/pub/mgHDG.pdf).

The TL;DR version: for hybridizable systems, you can condense problems to a reduced system for the unknowns on a trace space (functions living only on cell facets). This reduced system is spectrally equivalent to an operator appearing in an H1 discretization of a primal problem (Poisson operator for mixed Poisson, SPD Helmholtz for mixed Helmholtz, etc.).

The multigrid algorithm

  1. After arriving at a statically condensed problem for the trace variables (this is achieved by using the Firedrake preconditioners firedrake.HybridizationPC or firedrake.SCPC, both of which use Slate), we "restrict" the trace system to a P1 discretization of the primal operator.
  2. The P1 problem can then be solved directly (making this just a 2 level method) or using another multigrid method on the P1 problem.
  3. The P1 solution is prolonged to the trace space and the solver algorithm for the trace system continues.
  4. Pre-/post-smoothing is applied as usual in a standard V-cycle MG method (a schematic sketch of the cycle follows below).
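
Written out schematically (plain NumPy with hypothetical dense operators and a smoother passed in as a callable; this is not the actual GTMGPC code), one cycle is:

import numpy as np

def two_level_vcycle(S, A_p1, P, b, x, smooth, nu=3):
    """One V-cycle for the trace system S x = b.

    S      -- condensed (trace) operator
    A_p1   -- P1 discretization of the primal operator
    P      -- prolongation from P1 dofs to trace dofs
    smooth -- smoother, e.g. a few damped Richardson/Jacobi sweeps
    """
    x = smooth(S, b, x, nu)            # pre-smooth on the trace system
    r_c = P.T @ (b - S @ x)            # restrict the fine residual to P1
    e_c = np.linalg.solve(A_p1, r_c)   # solve the P1 problem (directly, or with another MG)
    x = x + P @ e_c                    # prolong the coarse correction to the trace space
    return smooth(S, b, x, nu)         # post-smooth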

Implementation

I wrote a Python preconditioner for this multigrid method on the branch gtmg-rebased, called firedrake.GTMGPC. The intention is that it can be used via the standard way of composing solver options.

GTMGPC creates a PCMG with 2 levels. The "fine problem" is just the trace system and the "coarse problem" is the P1 discretization (provided through the application context). The meshes are the same for the fine and coarse problems; this makes the algorithm a non-nested function space method. You can see the details in firedrake/preconditioners/gtmg.py.

Here is a functional mixed Poisson example using this new PC:

from firedrake import *

N = 5
mesh = UnitSquareMesh(N**2, N**2)
x = SpatialCoordinate(mesh)

def get_p1_space():
    return FunctionSpace(mesh, "CG", 1)

def get_p1_prb_bcs():
    return DirichletBC(get_p1_space(), Constant(0.0), "on_boundary")

def p1_callback():
    P1 = get_p1_space()
    p = TrialFunction(P1)
    q = TestFunction(P1)
    return inner(grad(p), grad(q))*dx

degree = 1
RT = FunctionSpace(mesh, "RT", degree)
DG = FunctionSpace(mesh, "DG", degree - 1)
W = RT * DG
sigma, u = TrialFunctions(W)
tau, v = TestFunctions(W)
n = FacetNormal(mesh)
f = Function(DG)
f.interpolate(2*pi**2 * sin(pi*x[0]) * sin(pi*x[1]))
a = (dot(sigma, tau) - div(tau)*u + v*div(sigma)) * dx
L = f * v * dx
w = Function(W)

params = {'mat_type': 'matfree',
          'ksp_view': None,
          'ksp_type': 'preonly',
          'pc_type': 'python',
          'pc_python_type': 'firedrake.HybridizationPC',
          'hybridization': {'ksp_type': 'cg',
                            'mat_type': 'matfree',
                            'ksp_rtol': 1e-8,
                            'ksp_monitor_true_residual': None,
                            'pc_type': 'python',
                            'pc_python_type': 'firedrake.GTMGPC',
                            'gt': {'mg_levels': {'ksp_type': 'richardson',
                                                 'pc_type': 'bjacobi',
                                                 'sub_pc_type': 'ilu',
                                                 'ksp_max_it': 3},
                                   'mg_coarse': {'ksp_type': 'cg',
                                                 'ksp_monitor_true_residual': None,
                                                 'ksp_rtol': 1e-8,
                                                 'pc_type': 'gamg',
                                                 'mg_levels': {'ksp_type': 'chebyshev',
                                                               'pc_type': 'bjacobi',
                                                               'sub_pc_type': 'ilu',
                                                               'ksp_max_it': 4}}}}}
appctx = {'get_coarse_operator': p1_callback,
          'get_coarse_space': get_p1_space,
          'coarse_space_bcs': get_p1_prb_bcs()}
solve(a == L, w, solver_parameters=params, appctx=appctx)

This composes firedrake.HybridizationPC with firedrake.GTMGPC and works great in serial and parallel. But there is an issue.

The parallel problem

The following is another mixed Poisson example. This time, however, rather than using Firedrake's automagical hybridization PC, I manually write out the hybridizable system and opt to use firedrake.SCPC:

from firedrake import *

mesh = UnitSquareMesh(10, 10)

def get_p1_space():
    return FunctionSpace(mesh, "CG", 1)

def get_p1_prb_bcs():
    return DirichletBC(get_p1_space(), Constant(0.0), "on_boundary")

def p1_callback():
    P1 = get_p1_space()
    p = TrialFunction(P1)
    q = TestFunction(P1)
    return inner(grad(p), grad(q))*dx

x = SpatialCoordinate(mesh)
degree = 1
n = FacetNormal(mesh)
U = FunctionSpace(mesh, "DRT", degree)
V = FunctionSpace(mesh, "DG", degree - 1)
T = FunctionSpace(mesh, "DGT", degree - 1)
W = U * V * T
u, p, lambdar = TrialFunctions(W)
w, q, gammar = TestFunctions(W)
f = Function(V)
f.interpolate(-2*(x[0]-1)*x[0] - 2*(x[1]-1)*x[1])
a = (dot(u, w)*dx - div(w)*p*dx
     + lambdar('+')*jump(w, n=n)*dS
     + div(u)*q*dx
     + gammar('+')*jump(u, n=n)*dS)
L = q*f*dx
s = Function(W)

params = {'ksp_type': 'preonly',
          'ksp_max_it': 10,
          'ksp_view': None,
          # 'ksp_monitor_true_residual': None,
          'mat_type': 'matfree',
          'pmat_type': 'matfree',
          'pc_type': 'python',
          'pc_python_type': 'firedrake.SCPC',
          'pc_sc_eliminate_fields': '0, 1',
          'condensed_field': {'ksp_type': 'cg',
                              'mat_type': 'matfree',
                              'ksp_rtol': 1e-8,
                              'ksp_monitor_true_residual': None,
                              'pc_type': 'python',
                              'pc_python_type': 'firedrake.GTMGPC',
                              'gt': {'mat_type': 'aij',
                                     'mg_levels': {'ksp_type': 'richardson',
                                                   'pc_type': 'bjacobi',
                                                   'sub_pc_type': 'ilu',
                                                   'ksp_max_it': 3},
                                     'mg_coarse': {'ksp_type': 'preonly',
                                                   'pc_type': 'lu',
                                                   'pc_factor_mat_solver_type': 'mumps'}}}}
appctx = {'get_coarse_operator': p1_callback,
          'get_coarse_space': get_p1_space,
          'coarse_space_bcs': get_p1_prb_bcs()}
bcs = DirichletBC(W.sub(2), Constant(0.0), "on_boundary")
problem = LinearVariationalProblem(a, L, s, bcs=bcs)
solver = LinearVariationalSolver(problem, solver_parameters=params, appctx=appctx)
solver.solve()

This problem works in serial but does not appear to converge in parallel (based on the output of ksp_monitor_true_residual). Note that I'm just solving the P1 problem directly here.

Here are some things I have already checked with local debugging: how the communicators are passed, the restriction/prolongation operators, and (via ksp_view) that the preconditioners are actually composing. I am really lost here and I'm wondering if anyone else can reproduce this and perhaps knows where the issue might be. If anyone has any input, I would greatly appreciate it.

thomasgibson commented 5 years ago

To demonstrate the problem, here is the KSP monitor output for the second example (Mixed Poisson using SCPC with GTMG) in serial:

      Residual norms for firedrake_0_condensed_field_ solve.
      0 KSP preconditioned resid norm 2.608527993563e-01 true resid norm 3.988191692570e-02 ||r(i)||/||b|| 1.000000000000e+00
      1 KSP preconditioned resid norm 3.830328509318e-02 true resid norm 3.816913134420e-02 ||r(i)||/||b|| 9.570535793282e-01
      2 KSP preconditioned resid norm 4.343369606752e-04 true resid norm 3.467268641914e-04 ||r(i)||/||b|| 8.693836478257e-03
      3 KSP preconditioned resid norm 1.426126860648e-05 true resid norm 5.421700616717e-06 ||r(i)||/||b|| 1.359438320585e-04
      4 KSP preconditioned resid norm 4.066643734775e-07 true resid norm 2.520100603594e-07 ||r(i)||/||b|| 6.318905403392e-06
      5 KSP preconditioned resid norm 1.539995204885e-08 true resid norm 5.022509180942e-09 ||r(i)||/||b|| 1.259344978402e-07
      6 KSP preconditioned resid norm 1.475626028393e-10 true resid norm 8.241467982731e-11 ||r(i)||/||b|| 2.066467366172e-09

Now here is the same example run on 4 MPI processes:

      Residual norms for firedrake_0_condensed_field_ solve.
      0 KSP preconditioned resid norm 2.550759937340e-01 true resid norm 3.988191692570e-02 ||r(i)||/||b|| 1.000000000000e+00
      1 KSP preconditioned resid norm 1.053102815308e-01 true resid norm 9.579242651267e-02 ||r(i)||/||b|| 2.401901259940e+00

I am not sure how to track down the cause of this.

wence- commented 5 years ago

Possible cause: the restriction/prolongation operators are not correct in parallel (perhaps not forcing halo exchanges?).

wence- commented 5 years ago

Notice how the initial preconditioned residual is already wrong.

thomasgibson commented 5 years ago

Don't know if this reveals anything, but I just tried the following options for the condensed KSP:

          'condensed_field': {'ksp_type': 'fcg',
                              'mat_type': 'matfree',
                              'ksp_rtol': 1e-8,
                              'ksp_monitor_true_residual': None,
                              'pc_type': 'python',
                              'pc_python_type': 'firedrake.GTMGPC',
                              'gt': {'mg_levels': {'ksp_type': 'gmres',
                                                   'pc_type': 'bjacobi',
                                                   'sub_pc_type': 'ilu',
                                                   'ksp_max_it': 5},
                                     'mg_coarse': {'ksp_type': 'preonly',
                                                   'pc_type': 'lu',
                                                   'pc_factor_mat_solver_type': 'mumps'}}}}

And this works in serial and in parallel. But maybe this is masking the problem since I'm using GMRES to clean up the mess when restricting from Trace to P1?

Possible cause: the restriction/prolongation operators are not correct in parallel (perhaps not forcing halo exchanges?).

Is there a way that I can check/fix this? I'm using the interpolation matrix provided by the Interpolator object (this is using the stuff in #1453).

wence- commented 5 years ago

To check the interpolation, what are the invariants you expect? Check those.
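
For example, one concrete check (a sketch, untested, using the Interpolator machinery from your branch): assemble the prolongation, apply it to a fixed P1 function, and compare the resulting vector norm between a serial and a parallel run; the numbers should agree to roundoff.

from firedrake import *

mesh = UnitSquareMesh(10, 10)
P1 = FunctionSpace(mesh, "CG", 1)
T = FunctionSpace(mesh, "DGT", 0)
x = SpatialCoordinate(mesh)

f = Function(P1)
f.interpolate(x[0] + 2*x[1])
g = Function(T)

interpolator = Interpolator(TestFunction(P1), T)
interp_mat = interpolator.callable()
interp_mat._force_evaluation()
P = interp_mat.handle  # PETSc Mat mapping P1 coefficients to trace coefficients

with f.dat.vec_ro as fv, g.dat.vec as gv:
    P.mult(fv, gv)
    print("||P f|| =", gv.norm())  # should be identical for any number of processes

Another quick invariant (if the trace element's dofs are point evaluations): prolongating the constant-one P1 function should give the constant-one trace function.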

One question: you're using unscaled Richardson as a smoother, is that safe? Are you getting lucky in serial?

thomasgibson commented 5 years ago

Okay, thanks!

One question: you're using unscaled Richardson as a smoother, is that safe? Are you getting lucky in serial?

Probably, now that I think about it. Though I don't know of an intuitive way to compute the scaling factor without just heuristically determining it.
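
If we do end up hand-tuning it, the damping factor can at least be set from the options database; something like the following (the 0.7 is purely illustrative and would need tuning):

          'mg_levels': {'ksp_type': 'richardson',
                        'ksp_richardson_scale': 0.7,
                        'pc_type': 'bjacobi',
                        'sub_pc_type': 'ilu',
                        'ksp_max_it': 3}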

colinjcotter commented 5 years ago

Try running a parallel version of some of the interpolator tests.


thomasgibson commented 5 years ago

I ran the tests using the new interpolation functionality and they all pass in parallel. But I want to reiterate the strange thing I'm observing.

firedrake.HybridizationPC + firedrake.GTMGPC works in serial and parallel (see the first example above). But when I manually write out the hybridizable problem and use firedrake.SCPC with firedrake.GTMGPC, it has problems in parallel. SCPC has been tested in serial and parallel and has been functioning just fine for a long while. But something in the composition of the two Python PCs is messing things up.

colinjcotter commented 5 years ago

So question 1 is: is the solution to the manual hybridizable problem equivalent in practice to the solution from the hybridizable mixed problem?

Then, is the system HybridizationPC builds equal to that system?

Is the same set of options and the same appctx reaching the solver for that system as for your manually hybridized system?

cheers

--cjc


thomasgibson commented 5 years ago

So question 1 is: is the solution to the manual hybridizable problem equivalent in practice to the solution from the hybridizable mixed problem?

Yes, they are mechanically doing the same thing.

Then, is the system HybridizationPC builds equal to that system?

Mathematically, yes. Though SCPC does a complete factorization. Some of the blocks in the hybridizable mixed method are always 0, but SCPC isn't implemented to assume that. It also doesn't assume any off diagonal block matrices are transposes of each other. IOW: SCPC always builds a condensed operator of the form: S = D - C * A.inv * B. In the hybridizable mixed method, D = 0, and C = B.T, so HybridizationPC performs those simplifications when it builds the reduced system.
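
Spelling out where that operator comes from (standard block elimination of the condensed fields):

$$
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
\begin{pmatrix} x \\ \lambda \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix}
\quad\Longrightarrow\quad
x = A^{-1}(f - B\lambda),
\qquad
(D - C A^{-1} B)\,\lambda = g - C A^{-1} f.
$$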

Is the same set of options and the same appctx reaching the solver for that system as for your manually hybridized system?

Yes it does, because GTMGPC is designed to raise exceptions when it doesn't get the necessary information from the appctx. I also just checked that it does.

Here is a question for @wence-: In the implementation of GTMGPC, I do the following:

        pcmg = PETSc.PC().create(comm=pc.comm)
        pcmg.incrementTabLevel(1, parent=pc)
        pcmg.setType(pc.Type.MG)
        pcmg.setOptionsPrefix(options_prefix)
        pcmg.setMGLevels(2)
        pcmg.setMGCycleType(pc.MGCycleType.V)
        pcmg.setMGInterpolation(1, interp_petscmat)

where interp_petscmat is the interpolation matrix obtained via:

            fine_space = context.a.arguments()[0].function_space()
            interpolator = Interpolator(TestFunction(coarse_space), fine_space)
            interpolation_matrix = interpolator.callable()
            interpolation_matrix._force_evaluation()
            interp_petscmat = interpolation_matrix.handle

fine_space is the trace space, coarse_space is the P1 space.

Is PETSc's PCMG able to get the appropriate restriction/prolongation operators from this? Or should I also be providing a restriction matrix?
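
My understanding, to be confirmed, is that if no restriction matrix is set, PCMG applies the transpose of the interpolation for restriction, so supplying only interp_petscmat should be enough. If an explicit restriction turned out to be needed, I assume (untested) it would be set with something like:

        pcmg.setMGRestriction(1, restrict_petscmat)  # hypothetical restriction matrix (trace -> P1)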

eikehmueller commented 5 years ago

What are the blocks that are inverted in the block Jacobi smoother if we use 'bjacobi' together with firedrake.GTMGPC? I always thought that a block consists of the dofs on one facet, but maybe this is not true. The reason I suspect this is that if I set 'mg_levels': {'pc_type': 'bjacobi', 'sub_pc_type': 'lu', ...}, then the solver in the MWE above converges to machine precision in one iteration on one processor, but requires several iterations (and still converges) on two processors. So maybe if we use ILU instead (as in the original code above), it does an ILU of the full local system on each processor, resulting in a non-SPD preconditioner, which breaks CG (but actually works ok-ish with GMRES, as I checked). firedrake.HybridizationPC might use the correct blocking, though.

JDBetteridge commented 5 years ago

I think 'bjacobi' just uses one block per process (each block being that process's locally owned rows), unless otherwise specified.

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJACOBI.html
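
So on P processes you get P blocks, one per process. If we wanted to experiment with a different blocking, I believe the total number of blocks can be overridden from the options, e.g. (untested, the count is just illustrative):

          'mg_levels': {'ksp_type': 'richardson',
                        'pc_type': 'bjacobi',
                        'pc_bjacobi_blocks': 16,
                        'sub_pc_type': 'ilu',
                        'ksp_max_it': 3}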

colinjcotter commented 5 years ago

That's right.


eikehmueller commented 5 years ago

If I use

'mg_levels': {'ksp_type': 'richardson',
              'pc_type': 'jacobi',
              'ksp_convergence_test': 'skip',
              'ksp_max_it': 2}

The convergence rates between the sequential and parallel runs are then absolutely identical (although neither converges). This agrees with what I expect, since Jacobi with a fixed number of iterations gives the same results no matter how many processors I use. And given what Colin and Jack say above, I'm not very surprised that the results with block Jacobi depend on the number of processors and that it only works sequentially.

Still, the question is why the behaviour is different with firedrake.HybridizationPC, where it works both in parallel and sequentially. I'll look at that next.

eikehmueller commented 5 years ago

Using the Jacobi smoother as above with firedrake.HybridizationPC, both sequential and parallel code converge, with very similar (although not exactly identical) convergence rates.

eikehmueller commented 5 years ago

None of this really helps to shed light on the problem, though...

thomasgibson commented 5 years ago

OK. So I ran an experiment and found the source of the problem. Basically I set all solvers to be preonly. I took a single application of the multigrid method (using LU on the P1 problem) and checked the output of the scalar field (pre-smooth, coarse direct solve, post-smooth). I noticed that the output for SCPC is flipped in sign (HybridizationPC was fine). The infamous sign error.
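
For reference, the "everything preonly" options I used for this experiment were along these lines (a sketch, not the exact dict):

          'condensed_field': {'ksp_type': 'preonly',
                              'mat_type': 'matfree',
                              'pc_type': 'python',
                              'pc_python_type': 'firedrake.GTMGPC',
                              'gt': {'mg_levels': {'ksp_type': 'preonly',
                                                   'pc_type': 'bjacobi',
                                                   'sub_pc_type': 'ilu'},
                                     'mg_coarse': {'ksp_type': 'preonly',
                                                   'pc_type': 'lu',
                                                   'pc_factor_mat_solver_type': 'mumps'}}}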

Now, consider the hybridizable formulation of a saddle point mixed equation:

a(u, w) - b(p, w) + c(lambda, w) = f
b(q, u) + d(q, p)                = g
c(gamma, u)                      = 0

The last equation is the transmission condition, forcing the flux u to be single-valued on mesh facets. I realized that in HybridizationPC we scale the transmission equation by -1, which is fine because it doesn't change the problem. This is actually equivalent to what Cockburn et al. do when analyzing the condensed system. IOW, this was done to ensure we actually get a positive definite operator. To test this, I rewrote the problem in UFL as:

a(u, w) - b(p, w) + c(lambda, w) = f
b(q, u) + d(q, p)                = g
-c(gamma, u)                     = 0

Again, this is mathematically equivalent, but now SCPC produces an SPD operator and everything works in serial and parallel. I have added a test in https://github.com/firedrakeproject/firedrake/commit/7d2019b3bcbf959c5fb00cfaf7dce4ea9158baf1. I also tested this in @eikehmueller and Jack's HDG code using SCPC + GTMGPC, and it's working now. So I believe this has been resolved.
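
Concretely, in terms of the second MWE above, the only change needed is the sign of the transmission term in the UFL form (a sketch based on the description above, not copied from the linked commit):

a = (dot(u, w)*dx - div(w)*p*dx
     + lambdar('+')*jump(w, n=n)*dS
     + div(u)*q*dx
     - gammar('+')*jump(u, n=n)*dS)  # transmission term scaled by -1 so SCPC's condensed operator is SPD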