Closed thomasgibson closed 5 years ago
To demonstrate the problem, here is the KSP monitor output for the second example (Mixed Poisson using SCPC with GTMG) in serial:
Residual norms for firedrake_0_condensed_field_ solve.
0 KSP preconditioned resid norm 2.608527993563e-01 true resid norm 3.988191692570e-02 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 3.830328509318e-02 true resid norm 3.816913134420e-02 ||r(i)||/||b|| 9.570535793282e-01
2 KSP preconditioned resid norm 4.343369606752e-04 true resid norm 3.467268641914e-04 ||r(i)||/||b|| 8.693836478257e-03
3 KSP preconditioned resid norm 1.426126860648e-05 true resid norm 5.421700616717e-06 ||r(i)||/||b|| 1.359438320585e-04
4 KSP preconditioned resid norm 4.066643734775e-07 true resid norm 2.520100603594e-07 ||r(i)||/||b|| 6.318905403392e-06
5 KSP preconditioned resid norm 1.539995204885e-08 true resid norm 5.022509180942e-09 ||r(i)||/||b|| 1.259344978402e-07
6 KSP preconditioned resid norm 1.475626028393e-10 true resid norm 8.241467982731e-11 ||r(i)||/||b|| 2.066467366172e-09
Now here is the same example using just 4 MPI processes:
Residual norms for firedrake_0_condensed_field_ solve.
0 KSP preconditioned resid norm 2.550759937340e-01 true resid norm 3.988191692570e-02 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 1.053102815308e-01 true resid norm 9.579242651267e-02 ||r(i)||/||b|| 2.401901259940e+00
I am not sure how to reveal the cause of this.
Possible cause, the restriction/prolongation are not correct in parallel (not forcing halo exchanges perhaps?)
Notice how the initial preconditioned residual is already wrong.
Don't know if this reveals anything, but I just tried the following options for the condensed KSP:
'condensed_field': {'ksp_type': 'fcg',
                    'mat_type': 'matfree',
                    'ksp_rtol': 1e-8,
                    'ksp_monitor_true_residual': None,
                    'pc_type': 'python',
                    'pc_python_type': 'firedrake.GTMGPC',
                    'gt': {'mg_levels': {'ksp_type': 'gmres',
                                         'pc_type': 'bjacobi',
                                         'sub_pc_type': 'ilu',
                                         'ksp_max_it': 5},
                           'mg_coarse': {'ksp_type': 'preonly',
                                         'pc_type': 'lu',
                                         'pc_factor_mat_solver_type': 'mumps'}}}}
And this works in serial and in parallel. But maybe this is masking the problem since I'm using GMRES to clean up the mess when restricting from Trace to P1?
Possible cause, the restriction/prolongation are not correct in parallel (not forcing halo exchanges perhaps?)
Is there a way that I can check/fix this? I'm using the interpolation matrix provided by the `Interpolator` object (this is using the stuff in #1453).
To check the interpolation, what are the invariants you expect? Check those.
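Two invariants that are cheap to test on any interpolation matrix are that it reproduces constants (every row sums to one) and that it reproduces linear functions. The sketch below shows the idea on a toy 1D problem with plain NumPy. The mesh, the midpoint "fine" space, and the helper names are illustrative assumptions and not Firedrake API; the same checks could be run on 1 and on several ranks and compared.

```python
import numpy as np


def p1_to_midpoint_interpolation(n_cells):
    """Matrix mapping P1 nodal values on a uniform 1D mesh to cell-midpoint values.

    Hypothetical stand-in for a coarse-to-fine interpolation matrix.
    """
    P = np.zeros((n_cells, n_cells + 1))
    for cell in range(n_cells):
        # A P1 function at the cell midpoint is the average of its endpoint values.
        P[cell, cell] = 0.5
        P[cell, cell + 1] = 0.5
    return P


def check_invariants(P, coarse_points, fine_points, tol=1e-12):
    """Return (constants_ok, linears_ok) for the interpolation matrix P."""
    ones_coarse = np.ones(P.shape[1])
    ones_fine = np.ones(P.shape[0])
    constants_ok = np.allclose(P @ ones_coarse, ones_fine, atol=tol)
    # Interpolating f(x) = x must give x evaluated at the fine points.
    linears_ok = np.allclose(P @ coarse_points, fine_points, atol=tol)
    return constants_ok, linears_ok


n_cells = 8
nodes = np.linspace(0.0, 1.0, n_cells + 1)
midpoints = 0.5 * (nodes[:-1] + nodes[1:])
P = p1_to_midpoint_interpolation(n_cells)
constants_ok, linears_ok = check_invariants(P, nodes, midpoints)
print(constants_ok, linears_ok)
```

A row that fails the constant test in parallel but not in serial would point at missing off-process contributions.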
One question, you're using unscaled Richardson as a smoother, is that safe? Are you getting lucky in serial?
Okay, thanks!
One question, you're using unscaled Richardson as a smoother, is that safe? Are you getting lucky in serial?
Probably, now that I think about it. Though I don't know of an intuitive way to compute the scaling factor without just heuristically determining it.
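One non-heuristic way to pick the scaling: for an SPD operator, Richardson `x <- x + omega * (b - A x)` converges only if `0 < omega < 2 / lambda_max`, so an estimate of the largest eigenvalue is enough. The NumPy sketch below estimates `lambda_max` with power iteration on a toy 1D Laplacian; the matrix and function names are assumptions for illustration, not the operators from this issue. In PETSc the damping can be set with the `ksp_richardson_scale` option.

```python
import numpy as np


def largest_eigenvalue(A, n_iter=200, seed=0):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = A @ v
        v = w / np.linalg.norm(w)
    return float(v @ (A @ v))  # Rayleigh quotient of the final unit vector


def richardson(A, b, omega, n_iter):
    """Run n_iter Richardson iterations from a zero initial guess."""
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x = x + omega * (b - A @ x)
    return x


n = 20
# Toy SPD operator: 1D Laplacian stencil, eigenvalues in (0, 4).
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

lam_max = largest_eigenvalue(A)
res_unscaled = np.linalg.norm(b - A @ richardson(A, b, 1.0, 50))
res_scaled = np.linalg.norm(b - A @ richardson(A, b, 1.0 / lam_max, 50))
print(lam_max, res_unscaled, res_scaled)
```

Here the unscaled iteration (`omega = 1`) blows up because `lambda_max` exceeds 2, while the scaled one reduces the residual.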
Try running a parallel version of some of the interpolator tests.
From: Thomas H. Gibson notifications@github.com Sent: 29 August 2019 14:17:42 To: firedrakeproject/firedrake firedrake@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [firedrakeproject/firedrake] Parallel issues using a custom multigrid method with static condensation and hybridization (#1492)
I ran the tests using the new Interpolate functionality and they all pass in parallel. But I want to reiterate the strange thing I'm observing.
`firedrake.HybridizationPC` + `firedrake.GTMGPC` works in serial and parallel (see first example above). But when I manually write the hybridizable problem and use `firedrake.SCPC` with `firedrake.GTMGPC`, it has problems in parallel. SCPC has been tested in serial and parallel and has been functioning just fine for a long while. But something with the composition of the two python PCs is messing things up.
So question 1 is: is the solution to the manual hybridizable problem equivalent in practice to the solution from the hybridizable mixed problem?
Then, is the system HybridizationPC builds equal to that system?
Is the same set of options and same appctx reaching the solver for that system as your manu-hybridizable system?
cheers
--cjc
So question 1 is: is the solution to the manual hybridizable problem equivalent in practice to the solution from the hybridizable mixed problem?
Yes, they are mechanically doing the same thing.
Then, is the system HybridizationPC builds equal to that system?
Mathematically, yes. Though SCPC does a complete factorization. Some of the blocks in the hybridizable mixed method are always 0, but SCPC isn't implemented to assume that. It also doesn't assume any off diagonal block matrices are transposes of each other. IOW: SCPC always builds a condensed operator of the form: `S = D - C * A.inv * B`. In the hybridizable mixed method, `D = 0`, and `C = B.T`, so `HybridizationPC` performs those simplifications when it builds the reduced system.
Is the same set of options and same appctx reaching the solver for that system as your manu-hybridizable system?
Yes it does, because `GTMGPC` is designed to raise exceptions when it doesn't get the necessary information from the appctx. I also just checked that it does.
Here is a question for @wence-: In the implementation of GTMGPC, I do the following:
pcmg = PETSc.PC().create(comm=pc.comm)
pcmg.incrementTabLevel(1, parent=pc)
pcmg.setType(pc.Type.MG)
pcmg.setOptionsPrefix(options_prefix)
pcmg.setMGLevels(2)
pcmg.setMGCycleType(pc.MGCycleType.V)
pcmg.setMGInterpolation(1, interp_petscmat)
where `interp_petscmat` is the interpolation matrix obtained via:
fine_space = context.a.arguments()[0].function_space()
interpolator = Interpolator(TestFunction(coarse_space), fine_space)
interpolation_matrix = interpolator.callable()
interpolation_matrix._force_evaluation()
interp_petscmat = interpolation_matrix.handle
`fine_space` is the trace space, `coarse_space` is the P1 space.
Is PETSc's PCMG able to get the appropriate restriction/prolongation operators from this? Or should I also be providing a restriction matrix?
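For context on this question: the PETSc manual page for `PCMGSetRestriction` appears to state that, when no restriction is set, the transpose of the matrix passed to `PCMGSetInterpolation` is used. The NumPy sketch below shows what a two-level cycle with `R = P^T` looks like. Everything in it is an illustrative assumption: a toy 1D Laplacian, linear interpolation, damped Richardson smoothing, and a Galerkin coarse operator `P^T A P` (whereas `GTMGPC` takes its P1 operator from the appctx).

```python
import numpy as np


def two_grid_cycle(A_f, A_c, P, b, x, omega, n_smooth=2):
    """One two-level V-cycle using the transpose of P as the restriction."""
    for _ in range(n_smooth):                  # pre-smoothing
        x = x + omega * (b - A_f @ x)
    r_c = P.T @ (b - A_f @ x)                  # restrict the residual with P^T
    x = x + P @ np.linalg.solve(A_c, r_c)      # coarse direct solve, then prolong
    for _ in range(n_smooth):                  # post-smoothing
        x = x + omega * (b - A_f @ x)
    return x


n_c = 7
n_f = 2 * n_c + 1
A_f = 2.0 * np.eye(n_f) - np.eye(n_f, k=1) - np.eye(n_f, k=-1)

# Linear interpolation from coarse interior nodes to fine interior nodes.
P = np.zeros((n_f, n_c))
for j in range(n_c):
    i = 2 * j + 1          # fine index of coarse node j
    P[i, j] = 1.0
    P[i - 1, j] = 0.5
    P[i + 1, j] = 0.5

A_c = P.T @ A_f @ P        # Galerkin coarse operator (assumption for this sketch)
b = np.ones(n_f)
x = np.zeros(n_f)
residuals = [np.linalg.norm(b - A_f @ x)]
for _ in range(5):
    x = two_grid_cycle(A_f, A_c, P, b, x, omega=0.25)
    residuals.append(np.linalg.norm(b - A_f @ x))
print(residuals)
```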
What are the blocks that are inverted in the Block-Jacobi smoother, if we use `'bjacobi'` together with `firedrake.GTMGPC`? I always thought that a block consists of the dofs in one facet, but maybe this is not true?
The reason I think this is because if I set `'mg_levels': {'pc_type': 'bjacobi', 'sub_pc_type': 'lu', ...}`, then the solver in the MWE above converges to machine precision in one iteration on one processor, but requires several iterations (and still converges) on two processors.
So maybe if we use ILU instead (as in the original code above), it will do an ILU for the full system on each of the processors, resulting in a non-SPD preconditioner, which crashes with CG (but works actually ok-ish with GMRES, as I checked).
`firedrake.HybridizationPC` might use the correct blocking, though.
I think the blocks in `'bjacobi'` are just of size $1/P$ of the global system, where $P$ is the number of processes being run on, i.e. one block per process (unless otherwise specified).
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJACOBI.html
That's right.
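The dependence on the process count can be reproduced without MPI. The NumPy sketch below treats contiguous index ranges as stand-ins for the rows owned by each process (an assumption for illustration) and applies block Jacobi with an exact solve per block. With one block the preconditioner is a full direct solve; with two blocks the coupling across the partition boundary is dropped. Point Jacobi, by contrast, gives the same result for any partition.

```python
import numpy as np


def block_jacobi_apply(A, r, n_blocks):
    """Apply block Jacobi with an exact solve on each contiguous diagonal block."""
    z = np.zeros_like(r)
    for idx in np.array_split(np.arange(A.shape[0]), n_blocks):
        block = A[np.ix_(idx, idx)]            # diagonal block "owned" by one process
        z[idx] = np.linalg.solve(block, r[idx])
    return z


def point_jacobi_apply(A, r, n_blocks):
    """Apply point Jacobi partition by partition; only diagonal entries are used."""
    z = np.zeros_like(r)
    diag = np.diag(A)
    for idx in np.array_split(np.arange(A.shape[0]), n_blocks):
        z[idx] = r[idx] / diag[idx]
    return z


n = 16
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy SPD operator
b = np.ones(n)

res_one_block = np.linalg.norm(b - A @ block_jacobi_apply(A, b, 1))
res_two_blocks = np.linalg.norm(b - A @ block_jacobi_apply(A, b, 2))
jacobi_same = np.allclose(point_jacobi_apply(A, b, 1), point_jacobi_apply(A, b, 4))
print(res_one_block, res_two_blocks, jacobi_same)
```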
If I use
'mg_levels': {'ksp_type': 'richardson',
              'pc_type': 'jacobi',
              'ksp_convergence_test': 'skip',
              'ksp_max_it': 2}
then the convergence rates between the sequential and parallel runs are absolutely identical (although neither converges). This agrees with what I expect, since Jacobi with a fixed number of iterations gives the same results, no matter how many processors I use. But with what Colin and Jack say above, I'm not very surprised that the results with Block-Jacobi depend on the number of processors, and it only works sequentially.
Still, the question is why the behaviour is different with `firedrake.HybridizationPC`, where it works both in parallel and sequentially. I'll look at that next.
Using the Jacobi smoother as above with `firedrake.HybridizationPC`, both sequential and parallel code converge, with very similar (although not exactly identical) convergence rates.
None of this really helps to shed light on the problem, though...
OK. So I ran an experiment and found the source of the problem. Basically I set all solvers to be `preonly`. I took a single application of the multigrid method (using LU on the P1 problem) and checked the output of the scalar field (pre-smooth, coarse direct solve, post-smooth). I noticed that the output for SCPC is flipped in sign (`HybridizationPC` was fine). The infamous sign error.
Now, consider the hybridizable formulation of a saddle point mixed equation:
a(u, w) - b(p, w) + c(lambda, w) = f
b(q, u) + d(q, p) = g
c(gamma, u) = 0
The last equation is the transmission condition, forcing the flux `u` to be single-valued on mesh facets. I realized that in `HybridizationPC` we scale the transmission equation by -1, which is fine because this doesn't change the problem. This is actually equivalent to what Cockburn et al. have done when analyzing the condensed system. IOW, this was done to ensure we actually get a positive definite operator. To test this, I rewrote the problem in UFL as:
a(u, w) - b(p, w) + c(lambda, w) = f
b(q, u) + d(q, p) = g
-c(gamma, u) = 0
Again, mathematically equivalent, but now SCPC produces an SPD operator where everything works in serial and parallel. I have added a test in https://github.com/firedrakeproject/firedrake/commit/7d2019b3bcbf959c5fb00cfaf7dce4ea9158baf1. I also tested this in @eikehmueller and Jack's HDG code using `SCPC` + `GTMGPC`. It's working now. So I believe this has now been resolved.
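The sign mechanism can be seen on small dense matrices. With `D = 0` and `B = C.T`, the condensed operator `S = D - C * A.inv * B` is negative definite, and scaling the constraint row by -1 turns it into its positive definite negation. In the NumPy sketch below, `A` is a random SPD stand-in and `C` a random constraint block; both are assumptions for illustration only, since the eliminated block in the real hybridized system is the local `(u, p)` block rather than an SPD matrix.

```python
import numpy as np


def schur(A, B, C, D):
    """Condensed operator S = D - C A^{-1} B."""
    return D - C @ np.linalg.solve(A, B)


rng = np.random.default_rng(0)
n, m = 6, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # SPD stand-in for the eliminated block
C = rng.standard_normal((m, n))      # constraint block (full row rank for random data)
D = np.zeros((m, m))

S_plus = schur(A, C.T, C, D)         # constraint written as  c(gamma, u) = 0
S_minus = schur(A, C.T, -C, D)       # constraint written as -c(gamma, u) = 0

# Symmetrize before eigvalsh to remove rounding-level asymmetry.
eig_plus = np.linalg.eigvalsh(0.5 * (S_plus + S_plus.T))
eig_minus = np.linalg.eigvalsh(0.5 * (S_minus + S_minus.T))
print(eig_plus, eig_minus)
```

Both forms have the same solution, but only the second gives an operator that CG-type methods and SPD-assuming smoothers can rely on.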
Dear all,
I am truly mystified by this apparent bug I am observing using the TSFC branch `interpolation-operator` and the Firedrake branch `gtmg-rebased`. I will do my best to summarize the problem here.
I have been working with @eikehmueller and Jack Betteridge on a custom multigrid method using the procedure outlined by Gopalakrishnan and Tan (https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.636) and Cockburn et al. (http://web.pdx.edu/~gjay/pub/mgHDG.pdf).
The TLDR version: For hybridizable systems, you can condense problems to a reduced system for the unknowns on a trace space (functions living only on cell facets). This reduced system is spectrally equivalent to an operator appearing in an H1 discretization of a primal problem (Poisson operator for mixed Poisson, SPD Helmholtz for mixed Helmholtz, etc.)
The multigrid algorithm
After condensing the system (using `firedrake.HybridizationPC` or `firedrake.SCPC`, which uses Slate), we "restrict" the trace system to a P1 discretization of the primal operator.
Implementation
I wrote a Python preconditioner for this multigrid method on the branch `gtmg-rebased`, called `firedrake.GTMGPC`. The intention is that this can be used via the standard way of composing solver options. `GTMGPC` creates a `PCMG` with 2 levels. The "fine problem" is just the trace system and the "coarse problem" is the P1 discretization (provided through the application context). The meshes are the same for the fine and coarse problems; this makes the algorithm a non-nested function space method. You can see the details in `firedrake/preconditioners/gtmg.py`.
Here is a functional mixed Poisson example using this new PC:
This composes `firedrake.HybridizationPC` with `firedrake.GTMGPC` and works great in serial and parallel. But there is an issue.
The parallel problem
The following is another mixed Poisson example. This time, however, rather than using Firedrake's automagical hybridization PC, I manually write out the hybridizable system and opt to use `firedrake.SCPC`:
This problem works in serial but does not appear to converge in parallel (based on the output of `ksp_monitor_true_residual`). I'm just solving the P1 problem directly.
Here are some observations I made with local debugging:
- `firedrake.HybridizationPC` + `firedrake.GTMGPC` works in serial and parallel.
- `firedrake.SCPC` without using `firedrake.GTMGPC` (say just direct AMG on the trace problem or LU) works in serial and parallel (not new).
- `firedrake.SCPC` + `firedrake.GTMGPC` appears to work in serial but doesn't converge in parallel.
I have tried checking how communicators are passed, checking the restrict/prolongation operators, making sure it's actually composing via ksp view. I am really lost here and I'm wondering if anyone else can reproduce this and perhaps knows where the issue might be. If anyone has any input, I would greatly appreciate it.