@baagaard-usgs Create a toy test problem to compare selfp Schur preconditioning with the custom preconditioner in v2.2.2.
@knepley I see two differences between v2 with our custom fault PC and v3 with the selfp Schur preconditioner.

1. In v2 we use quadrature points at the vertices, which results in a diagonal preconditioning matrix that is the lumped version of C K^-1 C^T. We did this because I was getting funny results with conventional quadrature points. We get good results in v3 with conventional quadrature points, so I think the problem is related to shortcuts in the v2 fault implementation that don't show up with our mathematically correct implementation in v3.
2. Our custom fault preconditioner is C K^-1 C^T, not -C K^-1 C^T (which is what selfp gives). Is this a result of the fact that in this branch for v3 we have a RHS Jacobian but not a LHS Jacobian?

Bottom line: from my toy problem it looks like pmat_v2 = -lumped(pmat_v3).
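For reference, here is a sketch of the algebra behind that bottom line (my notation, not from the original thread), assuming the standard saddle-point form of the prescribed-slip system with displacement stiffness $K$ and fault constraint matrix $C$:

$$
J = \begin{pmatrix} K & C^T \\ C & 0 \end{pmatrix},
\qquad
S = -C K^{-1} C^T .
$$

PETSc's selfp replaces $K^{-1}$ with $\mathrm{diag}(K)^{-1}$ when assembling the Schur complement preconditioning matrix, so $S_p^{v3} = -C\,\mathrm{diag}(K)^{-1} C^T$, while the v2 custom preconditioner assembles the positive, lumped version of $C K^{-1} C^T$. Hence the observed relation $\mathrm{pmat}_{v2} \approx -\mathrm{lumped}(\mathrm{pmat}_{v3})$.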
@knepley Looking back at our 2013 JGR paper, I noticed that using the upper Schur factorization with the user Schur preconditioner gives comparable scaling to our custom fault preconditioner with the multiplicative field split type. This remains true for v3 (see plot below). Because the custom fault preconditioner can use preonly as the KSP type for the Lagrange multiplier block rather than gmres with the Schur factorization, it has a shorter runtime even though the number of iterations grows faster with problem size.

This suggests we don't need to worry about the difference in fault basis functions between v2 and v3. The diag Schur factorization type and the additive field split type don't result in good scaling in v2 (see our JGR paper). We should replicate the custom fault preconditioner in v3, because it is faster than the upper Schur factorization with the user preconditioner, with only slightly worse scaling with problem size.
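As a reminder of why the Schur approach needs an inner gmres while the custom preconditioner does not (a sketch in the notation above, not from the original thread): the upper Schur factorization preconditions with

$$
P_{\text{upper}} = \begin{pmatrix} K & C^T \\ 0 & S \end{pmatrix},
\qquad
S = -C K^{-1} C^T ,
$$

and $S$ is never assembled, so applying $P_{\text{upper}}^{-1}$ accurately requires an inner Krylov solve (gmres here) on the Lagrange multiplier block, preconditioned by the user matrix. The custom preconditioner instead supplies an assembled diagonal approximation for that block, so a single preconditioner application (preonly with Jacobi) suffices.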
Solver settings for v3 with upper Schur factorization and the user preconditioner; Lagrange multiplier KSP relative tolerance optimized to minimize runtime:
ksp_rtol = 1.0e-11
ksp_atol = 1.0e-12
ksp_gmres_restart = 20
snes_rtol = 1.0e-9
snes_atol = 1.0e-10
pc_type = fieldsplit
pc_use_amat = true
pc_fieldsplit_type = schur
pc_fieldsplit_schur_factorization_type = upper
pc_fieldsplit_schur_precondition = user
fieldsplit_displacement_pc_type = ml
fieldsplit_displacement_ksp_type = preonly
fieldsplit_lagrange_multiplier_fault_pc_type = jacobi
fieldsplit_lagrange_multiplier_fault_ksp_type = gmres
fieldsplit_lagrange_multiplier_fault_ksp_rtol = 1.0e-2
Solver settings for v2 with upper Schur factorization and the user preconditioner; Lagrange multiplier KSP relative tolerance optimized to minimize runtime:
ksp_rtol = 1.0e-11
ksp_atol = 1.0e-12
ksp_gmres_restart = 20
snes_rtol = 1.0e-9
snes_atol = 1.0e-10
[pylithapp.problem.formulation]
split_fields = True
matrix_type = aij
[pylithapp.petsc]
fs_pc_type = fieldsplit
fs_pc_use_amat = true
fs_pc_fieldsplit_type = schur
fs_pc_fieldsplit_schur_factorization_type = upper
fs_pc_fieldsplit_schur_precondition = user
fs_fieldsplit_displacement_pc_type = ml
fs_fieldsplit_displacement_ksp_type = preonly
fs_fieldsplit_lagrange_multiplier_pc_type = jacobi
fs_fieldsplit_lagrange_multiplier_ksp_type = gmres
fs_fieldsplit_lagrange_multiplier_ksp_rtol = 1.0e-4
Solver settings for v2 with the custom fault preconditioner and multiplicative field split:
[pylithapp.problem.formulation]
split_fields = True
matrix_type = aij
use_custom_constraint_pc = True
[pylithapp.petsc]
fs_pc_type = fieldsplit
fs_pc_use_amat = true
fs_pc_fieldsplit_type = multiplicative
fs_fieldsplit_displacement_pc_type = ml
fs_fieldsplit_displacement_ksp_type = preonly
fs_fieldsplit_lagrange_multiplier_pc_type = jacobi
fs_fieldsplit_lagrange_multiplier_ksp_type = preonly
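For context, a sketch (my summary, not from the thread) of why preonly works on the Lagrange multiplier block here: the multiplicative field split is a block Gauss–Seidel sweep that applies only the diagonal-block preconditioners, with no Schur complement solve. Assuming the custom constraint preconditioner places its lumped approximation $P_f \approx \mathrm{lumped}(C K^{-1} C^T)$ in the Lagrange multiplier diagonal block of the preconditioning matrix, one sweep is

$$
x_u \leftarrow \tilde{K}^{-1} r_u,
\qquad
x_\ell \leftarrow P_f^{-1}\left(r_\ell - C x_u\right),
$$

where $\tilde{K}^{-1}$ is one ML application and $P_f^{-1}$ is a Jacobi application, so neither block needs an inner Krylov iteration.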
Matt resolved this issue. The correct settings to replicate v2 with the custom fault preconditioner and multiplicative field split type are:
pc_type = fieldsplit
pc_use_amat = true
pc_fieldsplit_type = schur
pc_fieldsplit_schur_factorization_type = lower
pc_fieldsplit_schur_precondition = selfp
pc_fieldsplit_schur_scale = 1.0
fieldsplit_displacement_pc_type = ml
fieldsplit_displacement_ksp_type = preonly
fieldsplit_lagrange_multiplier_fault_ksp_type = preonly
fieldsplit_lagrange_multiplier_fault_pc_type = ml
The performance for v3 with these settings is close to v2 with the custom preconditioner and multiplicative field split type. The solve does take more iterations, which is associated with the more accurate discretization of the Lagrange multiplier (quadrature points are not at the vertices like they are in v2).
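An aside (standard PETSc monitoring options, not part of the original settings): ksp_view and residual monitors make it easy to verify that the factorization type and selfp preconditioner are actually active and to watch the convergence of the outer solve. Assuming the same option prefixes as above:

ksp_view = true
ksp_monitor_true_residual = true
ksp_converged_reason = true

ksp_view prints the composed solver tree, which should show the Schur factorization type and the matrix used to precondition the Schur complement.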
Original issue: Performance test of solver settings for fault with prescribed slip
PyLith branch: baagaard/update-examples-3dsubduction
Summary: On my laptop (optimized build), v2.2.2 takes 243 sec whereas v3.x takes 4420 sec.
Performance summary for v2.2.2: solver settings (see above).
Performance summary for v3.x (branch listed above): solver settings (see above).