Closed jhale closed 1 week ago
Failing line search in parallel run https://petsc.org/release/manualpages/SNES/SNES_DIVERGED_LINE_SEARCH
This works but seems unsatisfactory. A point from Jonas:
"...the pointer to the PETSc solution vector is linked directly to the pointer of the solution vector of Dolfinx, while in fact both instead should be independent, i.e. the vector at which Dolfinx assembles the forms and the solution vector PETSc works must not be the same to allow for these temporary solution vectors in PETSc."
Looking at this again, I think a simpler solution would be to pass a new independent vector to a call to snes.solve().
Could we move this implementation into the dolfinx namespace?
It is rather hard for users to see how to use PETSC snes as they have to browse the tests.
To elaborate a bit more on the PETSc side of things to help the discussion:
In my understanding, the SNES solver object usually only works with a solution vector $X$ at which it needs to evaluate the function $F(X)$ as well as the Jacobian $J(X)$.
To me it seems that with the previously proposed SNESProblem class setup, the PETSc vector $X$ is somehow directly linked to self.u
in the dolfinx namespace such that we can assemble $a(X)$ and $L(X)$.
This is not an issue unless we want to properly use the SNESLinesearch routines: They work with another temporary vector $W$ at which the line search routines need to evaluate $F(W)$, while they explicitly rely on $X$ as an unchanged reference point to (iteratively) test different step sizes $\lambda_i$, i.e.
$$W{i+1} = X + \lambda{i+1} \Delta X$$
See for instance here.
However, once the line search routines call the SNESProblem.F
method passing in $W$, since dolfinx wants to assemble $L(X)$, we need to explicitly overwrite $X \leftarrow W$, which then causes $X$ to change in the PETSc namespace, meaning that the line searches can't rely on a fixed reference point to test different step lengths $\lambda$.
This means that the line searches (for iteration $i>1$) get
$$W_{i+1} = Wi + \lambda{i+1} \Delta X$$
instead of the expected
$$W{i+1} = X + \lambda{i+1} \Delta X$$
which causes them to fail when they try to fit an ansatz e.g. to $||F||_2$ along the step length. When one looks at the output of the line search routines, this typically means that they shrink the step size $\lambda$ further and further, until the minimum step size is reaches and then exit. As long as the line search only needs a single evaluation of $F(W)$, everything appears to work since in that case it does not matter if the reference point is not consistent due to $X\leftarrow W$.
So perhaps another idea to fix the issue would be to somehow 'unlink' u
in the dolfinx namespace from X
in the PETSc namespace?
Meaning that the solution vector at which dolfinx assembles $a$ and $L$ are independent of the vectors in the PETSc namespace.
Then the SNESProblem.F
method could assemble the forms at whatever point PETSc passes in, without us interfering with the solution vectors of PETSc.
I think this was also the problem in this post on the discourse group. There, one can also see the behavior I mentioned earlier of the line search routine shrinking the step size further and further, until the minimum is reached and then exiting.
"So perhaps another idea to fix the issue would be to somehow 'unlink' u in the dolfinx namespace from X in the PETSc namespace?"
Agreed, I think that is the best solution and I think it's more consistent with the design of DOLFINx and SNES. We should also consider adjusting our own NewtonSolver if we make the same link there.
Thanks @jonas-heinzmann for noticing this.
I'd suggest following patch. I've checked and all other uses of SNES (e.g. test_petsc_nonlinear_assembler.py
) do allocate new solution vector (unrelated to any function in forms),
diff --git a/python/test/unit/nls/test_newton.py b/python/test/unit/nls/test_newton.py
index ba56dbcdf1..8ef044f515 100644
--- a/python/test/unit/nls/test_newton.py
+++ b/python/test/unit/nls/test_newton.py
@@ -224,13 +224,16 @@ class TestNLS:
snes.getKSP().setTolerances(rtol=1.0e-9)
snes.getKSP().getPC().setType("lu")
- snes.solve(None, u.x.petsc_vec)
+ x = u.x.petsc_vec.copy()
+ x.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)
+
+ snes.solve(None, x)
assert snes.getConvergedReason() > 0
assert snes.getIterationNumber() < 6
# Modify boundary condition and solve again
u_bc.x.array[:] = 0.6
- snes.solve(None, u.x.petsc_vec)
+ snes.solve(None, x)
assert snes.getConvergedReason() > 0
assert snes.getIterationNumber() < 6
# print(snes.getIterationNumber())
I think this is fixed - @jonas-heinzmann can you try this out?
In short, users should (almost) never pass the same vector memory to snes.solve
as used in the specifying the residual or Jacobian form.
@jorgensd We can bring NonlinearPDE_SNESProblem
out to the user API in another PR.
I also took a look at our NewtonSolver
. In our current NonlinearPDEProblem
class we assume that x
and u
share the same memory - infact without this the solver will not converge. I would propose for consistency with SNES (that I believe has the right design here) that we remove this implicit coupling in our examples.
This seems to do the trick, thanks a lot!
Thanks to @jonas-heinzmann for clearly identifying and explaining this issue and @michalhabera for help creating this fix.
When using a linesearch routine SNES needs to calculate (multiple times) the norm of the residual evaluated at a vector
w
equal to the current solution plus some proposed scaling factor times the Newton increment. To do this SNES calls the user-defined residual evaluation routine passingw
. Our current implementation of the residual evaluation routine overwrites the residual form stateu
withw
but does not replaceu
with the current solutionx
before returning. This leads to a failure of the linesearch.This patch fixes this behaviour by preserving the internal state of the residual.