termi-official opened 2 years ago
There is a tight coupling (in a software engineering sense) between the caches of the nonlinear solver and some of the time stepper.
Yeah, this is necessary for the quasi-Newton method Jacobian reuse strategies, which is why the nonlinear solvers have to be specialized to the ODE solvers instead of using NonlinearSolve.jl. It would be interesting to have a version of an nlsolver which just takes a NonlinearSolve.jl algorithm though, just for research and trying things. That would instantly give globalization strategies (via NLsolve.jl), and so it would be a test of how useful they would be to add.
An alternative approach to efficient Jacobian reuse, which still requires the solution of equation systems, can be a quasi-Newton strategy. This method class utilizes (updated) approximations of the Jacobian in each iterative solve (see e.g. [4]); however, it is often observed that the quadratic convergence of the classical Newton-Raphson is lost. Nonetheless these methods can be highly efficient, as e.g. studied in [5].
One thing to note about these quasi-Newton methods, Broyden and such, is that they lack the stability of methods which have the linear solve, so they generally aren't applicable to stiff systems which is where most of the machinery is geared. That said, there are implicit non-stiff methods, such as the symplectic FIRKs, for which this would be useful.
So-called nonlinear preconditioners take the idea from linear preconditioners and apply it to nonlinear solvers, especially Newton-type methods. Instead of solving `F(u) = 0`, a function `G \approx F^{-1}` is used such that `G(F(u)) = 0`, which has the same roots and is simpler to solve. Alternatively it is possible to solve `F(H(\tilde{u})) = 0` where `u = H(\tilde{u})`, such that the nonlinear problem is simpler to solve.
Indeed this could be useful to add to NLNewton. That said, I'm not sure of a use case where a user knows a nonlinear preconditioner where a linear preconditioner wouldn't do as much of the work, but it would be interesting to see a reference on that.
The nonlinear solver framework is scattered across the ecosystem and the documentation is incomplete (although the latter is more of a minor problem, since I can throw more time on this).
It should all just be in OrdinaryDiffEq.jl. There is a little bit of duplicate functionality in SciMLBase, with a little bit that isn't duplicate:
https://github.com/SciML/DiffEqBase.jl/tree/master/src/nlsolve
The reason for that was for the nlsolvers of StochasticDiffEq. However, nowadays StochasticDiffEq.jl just depends on OrdinaryDiffEq.jl and uses its full nonlinear solver machinery. Since really only the *DiffEq.jl libraries can use these nonlinear solvers (not like they will be hacked into LSODA.jl for example), they might as well follow this structure and completely move to OrdinaryDiffEq.jl. If anyone has a second of free time that would be the best starting point here. It should be a rather simple and non-breaking change to do it.
I am happy to discuss possible designs of the nonlinear interface and invest some time here. However, I am lacking experience with nonlinear iterative solvers and some details behind the ODE/DAE/... solvers incorporated in the ecosystem to tackle this by myself.
Pulling @devmotion and @YingboMa in, along with @kanav99 who worked on this some. Always good to train a new contributor. I normally like to understand it through the lens of TRBDF2. Maybe it's worth adding a description of this somewhere in http://devdocs.sciml.ai/latest/
Thanks for the valuable comments Chris! I have some follow-up questions.
There is a tight coupling (in a software engineering sense) between the caches of the nonlinear solver and some of the time stepper.
Yeah, this is necessary for the quasi-Newton method Jacobian reuse strategies, which is why the nonlinear solvers have to be specialized to the ODE solvers instead of using NonlinearSolve.jl. [...]
Correct me if I am missing something more fundamental, but is there a reason (other than the lack of manpower) why we could not wrap reuse strategies and specific transformations (e.g. W-transformations) into hooks or other abstraction mechanisms?
One thing to note about these quasi-Newton methods, Broyden and such, is that they lack the stability of methods which have the linear solve, so they generally aren't applicable to stiff systems which is where most of the machinery is geared.
I was not aware of the instability, but it makes at least some sense to me (I assume the secant condition is the source of the instability). I found your answer on StackExchange on this, but I am not sure what the exact cause is. Do you by any chance have some literature on this at hand?
That said, I'm not sure of a use case where a user knows a nonlinear preconditioner where a linear preconditioner wouldn't do as much of the work, but it would be interesting to see a reference on that.
For example, you can interpret nonlinear FETI-DP and nonlinear BDDC as nonlinear right-preconditioned Newton strategies. I am leaving [1] as a reference where the classical Newton-Krylov (BoomerAMG+GMRES) setting is "outperformed". I am aware that this is a rather specific example and not even in a time-dependent setting, but it is at least some example.
[1] Klawonn, Axel, et al. "Nonlinear FETI-DP and BDDC methods: a unified framework and parallel results." SIAM Journal on Scientific Computing 39.6 (2017): C417-C451. https://numerik.uni-koeln.de/sites/numerik/research/nonl_framework.pdf
Correct me if I am missing something more fundamental, but is there a reason (other than the lack of manpower) why we could not wrap reuse strategies and specific transformations (e.g. W-transformations) into hooks or other abstraction mechanisms?
We could, it's just really difficult because the reuse strategies are very deep in there, using things like Newton convergence rate estimates from within the algorithm. So it's hard to have a really clean interface on that.
I was not aware of the instability, but it makes at least some sense to me (I assume the secant condition is the source of the instability). I found your answer on StackExchange on this, but I am not sure what the exact cause is. Do you by any chance have some literature on this at hand?
I'm not sure of the best literature on this. It's easy to show between Newton and fixed point iteration; the other ones I know by word of mouth and empirically. You can probably prove something about how it's related to the condition number of the Jacobian, because the Broyden methods are essentially explicit ODE time steppers falling towards a minimum of sorts.
For example, you can interpret nonlinear FETI-DP and nonlinear BDDC as nonlinear right-preconditioned Newton strategies. I am leaving [1] as a reference where the classical Newton-Krylov (BoomerAMG+GMRES) setting is "outperformed". I am aware that this is a rather specific example and not even in a time-dependent setting, but it is at least some example.
That's a nice example. Yeah, this would all be a good GSoC project.
Hey @termi-official and @ChrisRackauckas has there been any progress or further discussion on this?
I could not really find time to circle back to this one, sorry, but it is still on my list. As far as I can judge from peeking at the internals from time to time, the nonlinear interface has been quite nicely refactored since I opened the issue! I tried to follow Chris's advice a few weeks ago and bridge NonlinearSolve.jl, but I got stuck and ran out of the time I could allocate. I also won't find time to give this one another try before August.
One reason to break this out is that other features like multiple shooting might also want to introspect the convergence rate of a solver.
Can you elaborate further @oscardssmith ? I am not familiar with this class of methods.
The idea is that if you are trying to find roots without a good guess, multiple shooting has you initialize a root-finder from a lot of different points to increase your chance of having one of them near a root. To make this more efficient, you likely want to quickly stop root-finding on the points that fail to converge.
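A rough sketch of the strategy described here (plain Julia; the 1D problem, the culling rule, and all tolerances are purely illustrative):

```julia
# Run Newton from many starting points; discard starts that make too little
# progress after a few steps, keep the ones that converge.
f(x)  = x^3 - 2x - 5
df(x) = 3x^2 - 2

function newton_from_many_starts(f, df, starts; iters = 20, tol = 1e-10, cull_tol = 1e-2)
    roots = Float64[]
    for x in starts
        converged = false
        for k in 1:iters
            x -= f(x) / df(x)
            # Cull starts that have not made enough progress after 5 steps
            k == 5 && abs(f(x)) > cull_tol && break
            if abs(f(x)) < tol
                converged = true
                break
            end
        end
        converged && push!(roots, x)
    end
    # Deduplicate starts that landed on the same root
    unique(r -> round(r; digits = 8), roots)
end

newton_from_many_starts(f, df, range(-3.0, 3.0; length = 20))
```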
That's not multiple shooting. That's a multistart method.
We should pick back up this idea though of committing to the NonlinearSolve.jl interface. That interface now has proper `init`/`solve`/`solve!` choices, uses LinearSolve, has integration with SparseDiffTools.jl, etc. It has everything the ODE solver nonlinear solvers have and more, other than the specialized reuse techniques. However, with the current interface, the reuse techniques can be done. Effectively it would just need:

- `cache = init(nonlinearprob, alg)` to construct the nonlinear solver
- `resample_jacobian(cache, u0)` to force when to update it

Then everything you're talking about in the OP would just need to go into NonlinearSolve.jl. That would be a much cleaner design and allow the same tooling to be reused in all sorts of other solvers. We're now close enough that we should really make the push to get it done sometime in the next year.
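As a sketch of what this might look like from the stepper's side: `resample_jacobian` is the hypothetical hook proposed above, and `reinit!` is assumed to accept a new guess and parameters as elsewhere in SciML; the rest is current NonlinearSolve.jl API.

```julia
using NonlinearSolve

# Implicit Euler loop driving one NonlinearSolve.jl cache across steps, so the
# stepper (not the nonlinear solver) decides when the Jacobian is refreshed.
function backward_euler(f, u0, tspan, dt)
    u, t = copy(u0), first(tspan)
    # Implicit Euler residual: 0 = uₙ₊₁ - uₙ - dt * f(uₙ₊₁)
    residual(unew, p) = unew .- p.uold .- p.dt .* f(unew)
    prob = NonlinearProblem(residual, u, (uold = copy(u), dt = dt))
    cache = init(prob, NewtonRaphson())
    while t < last(tspan) - dt / 2
        # resample_jacobian(cache, u)   # hypothetical hook: force a Jacobian update
        sol = solve!(cache)
        u, t = copy(sol.u), t + dt
        reinit!(cache, u; p = (uold = copy(u), dt = dt))  # assumed signature
    end
    return u
end

backward_euler(u -> -u, [1.0], (0.0, 1.0), 0.1)  # decays toward exp(-1)
```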
I agree that this issue is also related to the NonlinearSolve.jl interface, and I love the progress on its design. However, I have trouble seeing how to preserve specific structural information from the time-dependent/semidiscrete problem in the nonlinear solver such that everything can go into NonlinearSolve.jl. Let me elaborate on an example from electrophysiology, i.e. a monodomain problem, where I currently see some specific open issues related to OrdinaryDiffEq.jl (in addition to NonlinearSolve.jl, and maybe LinearSolve.jl). Monodomain problems can be written down as the following system:
$$
\begin{aligned}
\partial_t u &= \nabla \cdot D \nabla u + f(u, \boldsymbol{v}, t) \\
\partial_t \boldsymbol{v} &= \boldsymbol{g}(u, \boldsymbol{v}),
\end{aligned}
$$
where we want to solve for $u$ and $\boldsymbol{v}$ with a Newton-type method. The extra structure here is that $\boldsymbol{g}$ is only "locally decoupled". Using a finite element discretization in space, we end up with a problem like
$$
\begin{aligned}
M \, d_t \hat{u} &= K \hat{u} + f(\hat{u}, \hat{\boldsymbol{v}}, t) \\
d_t \hat{\boldsymbol{v}} &= \boldsymbol{g}(\hat{u}, \hat{\boldsymbol{v}}),
\end{aligned}
$$
Assuming $\boldsymbol{v}$ is scalar-valued and the finite element discretization is done in 1D with 2 linear elements and 2 quadrature points per element, we end up with 4 dofs for $\hat{v}$ and 3 dofs for $\hat{u}$:
o-x---x-o-x---x-o
where "o" indicates the dofs $\hat{u}_1, \hat{u}_2, \hat{u}_3$ and "x" indicates the dofs $\hat{v}_1, \hat{v}_2, \hat{v}_3, \hat{v}_4$.
Next, we can, for example, simply discretize in time with Backward Euler and solve the nonlinear problems with the standard Newton-Raphson. However, for this larger class of problems we know that this solve is not efficient, especially if we go to big 3D problems with nonlinear ansatz spaces.
A more efficient solve here could for example use the Schur complement of the full Jacobian. We can already see the 2x2 block structure in the sparsity pattern:
$$
J_{u,v}(\hat{u}, \hat{v}) :=
\begin{bmatrix}
K + \partial_u f & \partial_v f \\
\partial_u g & \partial_v g
\end{bmatrix},
$$
where $\partial_v g$ is by construction a (block-)diagonal matrix. Now a first logical step to optimize the solve is to use the Schur complement in the inner linear solve. I think this can be done already, but I have trouble seeing how to encode it in types (via LinearSolve.jl).
Now, for this specific class of problems it has been shown that we can still do better. Instead of using the Schur complement directly, it is possible to cleverly use the implicit function theorem: first solve for $\hat{v}$ with a fixed guess for $\hat{u}$, then, with $\hat{v}$ fixed, solve a helper system $C = - [\partial_v g]^{-1} \partial_u g$ (which is cheap because $\partial_v g$ is block-diagonal). The remainder works as for the Schur complement to get the correct Jacobian, preserving quadratic convergence of the nonlinear solve.
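To make the linear algebra concrete, here is a small dense sketch of this condensation step (random stand-in matrices; in the real problem $\partial_v g$ is block-diagonal, so forming $C$ reduces to many small independent solves):

```julia
using LinearAlgebra

# Stand-ins for the Jacobian blocks and residuals of one Newton step
n_u, n_v = 3, 4
Juu = rand(n_u, n_u) + 5I   # K + ∂u f
Juv = rand(n_u, n_v)        # ∂v f
Jvu = rand(n_v, n_u)        # ∂u g
Jvv = rand(n_v, n_v) + 5I   # ∂v g (block-diagonal in the real problem)
ru, rv = rand(n_u), rand(n_v)

# Helper system from the implicit function theorem: C = -[∂v g]⁻¹ ∂u g
C = -(Jvv \ Jvu)

# Condensed solve for the u-update (Schur complement of the Jvv block),
# then cheap recovery of the v-update
S  = Juu + Juv * C
du = S \ (-ru + Juv * (Jvv \ rv))
dv = C * du - Jvv \ rv

# Sanity check against the monolithic 2x2 block solve
@assert [du; dv] ≈ [Juu Juv; Jvu Jvv] \ -[ru; rv]
```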
With this method I am not even sure where to start, because on the one hand the extra structural information has to be passed down to the nonlinear solver, and on the other hand we have the local/nested nonlinear solver. Higher order time discretizations follow a similar strategy. I can provide papers on this if wanted. I am also willing to implement prototypes, but I am now stuck on where to start and which kinds of designs are even acceptable for the SciML ecosystem. The obvious one is to "just define another problem" like `LocalGlobalODEProblem`/`LocalGlobalDAEProblem`/..., dispatch on suitable existing solvers like the SDIRK family, and add a specific nonlinear solver for this problem like `MultilevelNewtonSolver`, but I am not confident yet that this is really the way to go.
cc @frankschae from the slack discussion
What I meant is that using the NonlinearSolve.jl interface would make it have a very well-defined and documented way to be easily extended. We just need to get it the right hooks to fully replace the internal nonlinear solve, and then we can just use that. Once we are there, all of the things you want to do would just be done via adding new (maybe specialized) solvers to NonlinearProblem.
Using information about the Schur complement
Multilevel-Newton / Nested Solvers
These are not uncommon things, and indeed adding special solvers to NonlinearSolve.jl for these cases should be on the docket. It would be good to open an issue on the repo that describes these solvers in detail along with references. I could see these as good GSoC projects.
What I meant is that using the NonlinearSolve.jl interface would make it have a very well-defined and documented way to be easily extended. We just need to get it the right hooks to fully replace the internal nonlinear solve, and then we can just use that. Once we are there, all of the things you want to do would just be done via adding new (maybe specialized) solvers to NonlinearProblem.
This is exactly my concern. Let us take the problem from my most recent answer. This would be an `ODEProblem`. I am still struggling to see how to pass specific structural problem information down to the inner nonlinear solver. E.g. if we want to use a (non-)linear solver where the Schur complement is directly used, then, to my understanding, the information about the block structure is already lost when we arrive at the solve stage. Should this information then implicitly be encoded in `jac_prototype` for `ODEFunction`, or what is the idea here?
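To make the question concrete, the only encoding I can currently see would be something like the sketch below (sizes from the 1D toy example above; `rhs!` is a placeholder), where the (block-)diagonality of $\partial_v g$ survives only as a sparsity pattern rather than as dispatchable structure:

```julia
using SparseArrays, SciMLBase

# Block sparsity of the monodomain Jacobian from the toy example:
# 3 dofs for û (tridiagonal diffusion coupling), 4 dofs for v̂
n_u, n_v = 3, 4
Juu = spdiagm(-1 => ones(n_u - 1), 0 => ones(n_u), 1 => ones(n_u - 1))  # K + ∂u f
Juv = sprand(n_u, n_v, 0.5)                                             # ∂v f
Jvu = sprand(n_v, n_u, 0.5)                                             # ∂u g
Jvv = spdiagm(0 => ones(n_v))                                           # ∂v g
jac_prototype = [Juu Juv; Jvu Jvv]  # the block semantics are lost here

rhs!(du, u, p, t) = (du .= 0)       # placeholder for the semidiscrete RHS
f = ODEFunction(rhs!; jac_prototype)
```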
Using information about the Schur complement
Multilevel-Newton / Nested Solvers
These are not uncommon things, and indeed adding special solvers to NonlinearSolve.jl for these cases should be on the docket. It would be good to open an issue on the repo that describes these solvers in detail along with references.
Will grab this once I have an idea of what changes need to go where.
I could see these as good GSoC projects.
Sure. We can go looking for students once the baseline is done and I have gathered some experience with the SciML internals. I have already started, but I need more time to explore the codebase and its design. From my side I would like to polish our Navier-Stokes solver (https://ferrite-fem.github.io/Ferrite.jl/dev/tutorials/ns_vs_diffeq/) as a first study and add another example with Newmark (i.e. resolve https://github.com/SciML/OrdinaryDiffEq.jl/issues/411) to Ferrite. Once I have done this we can take a look at the nonlinear preconditioning and/or multilevel Newton for the next GSoC round. Does that sound reasonable?
Some nonlinear problems can be very efficiently solved with non-standard methods, possibly exploiting the specific structure of the problem. To allow users to efficiently utilize such methods in tandem with existing time solvers, it might be helpful to expand the nonlinear solver interface (in the long term). A nice first approach to a nonlinear solver interface is given in [6]. Let me give 4 examples of possible applications of such an interface to boost the time step performance in specific applications, to study possible requirements on such an interface (in addition to the existing ones, e.g. W-transformation, Jacobian reuse, ...). This list is not meant to be exhaustive, as I do not have a full overview of the state of the art on nonlinear solvers:
Multilevel-Newton
Nonlinear problems arising from space-time discretizations of PDEs, PDAEs and similar, with solution variables that are not directly dependent on spatial derivatives, can be efficiently solved with a multilevel strategy. These solution variables are often referred to as internal variables, occurring regularly in electronics and mechanics. Now, denoting the internal variables with `q` and the remaining variables with `u`, we can write the problem in mass matrix form as

$$
\begin{aligned}
M u' &= f(u, q, t) \\
N q' &= g(u, q, t),
\end{aligned}
$$

where `'` denotes the time derivative, `M` is a positive semi-definite matrix, and `N` is a positive semi-definite diagonal matrix. Now we can algorithmically decouple the solve for `q` and `u` by an alternating Newton scheme, where we solve for `q` on the "local" level and `u` on the "global" level. This way we obtain a large number of small local problems and one large problem on the global level; however, if dim(`q`) >> dim(`u`) we can greatly improve the efficiency of the scheme. A detailed derivation with FEM in conjunction with Runge-Kutta schemes can be found, for example, in [1].
Quasi-Newton methods
An alternative approach to efficient Jacobian reuse, which still requires the solution of equation systems, can be a quasi-Newton strategy. This method class utilizes (updated) approximations of the Jacobian in each iterative solve (see e.g. [4]); however, it is often observed that the quadratic convergence of the classical Newton-Raphson is lost. Nonetheless these methods can be highly efficient, as e.g. studied in [5].
Globalization strategies
Newton-Raphson does not perform well if no good initial guess for the nonlinear solve is available. In these scenarios so-called globalization strategies can help to some extent, usually by limiting the length of the Newton step appropriately. Some overview is given in the methods section of [2] and the references therein, as well as in the Eisenstat-Walker paper [3].
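For illustration, a minimal damped Newton with a backtracking line search, one of the simplest globalization strategies of this kind (plain Julia; the acceptance test and tolerances are illustrative):

```julia
using LinearAlgebra

# Backtracking line search around a Newton step: accept the full step only if
# it sufficiently reduces ‖F‖, otherwise halve the step length.
function damped_newton(F, J, u; tol = 1e-10, maxiter = 100)
    for _ in 1:maxiter
        r = F(u)
        norm(r) < tol && return u
        du = -(J(u) \ r)
        α = 1.0
        # Backtrack until the residual norm decreases (simple Armijo-like test)
        while norm(F(u + α * du)) ≥ (1 - 1e-4 * α) * norm(r) && α > 1e-8
            α /= 2
        end
        u += α * du
    end
    return u
end

# Example where the undamped Newton step overshoots badly from u₀ = 3
F(u) = [atan(u[1])]
J(u) = reshape([1 / (1 + u[1]^2)], 1, 1)
damped_newton(F, J, [3.0])
```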
Nonlinear preconditioning
So-called nonlinear preconditioners take the idea from linear preconditioners and apply it to nonlinear solvers, especially Newton-type methods. Instead of solving `F(u) = 0`, a function `G \approx F^{-1}` is used such that `G(F(u)) = 0`, which has the same roots and is simpler to solve. Alternatively it is possible to solve `F(H(\tilde{u})) = 0` where `u = H(\tilde{u})`, such that the nonlinear problem is simpler to solve.
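A toy illustration of the right-preconditioned variant (plain Julia; `F`, `H`, and the substitution are made up for illustration):

```julia
# Right nonlinear preconditioning: instead of F(u) = 0, solve F(H(ũ)) = 0 in a
# variable ũ in which the problem is better behaved, then map back u = H(ũ).
# Toy example: for F(u) = exp(u) - 2, Newton from a large initial guess crawls
# (each step shrinks u by roughly 1), while u = H(ũ) = log(ũ) makes the
# composed residual exactly linear in ũ.
F(u) = exp(u) - 2
H(ũ) = log(ũ)
G(ũ) = F(H(ũ))   # = ũ - 2

# Generic scalar Newton with a finite-difference derivative
function newton(f, x; tol = 1e-12, maxiter = 50, h = 1e-7)
    for _ in 1:maxiter
        r = f(x)
        abs(r) < tol && return x
        x -= r / ((f(x + h) - r) / h)
    end
    return x
end

ũ = newton(G, 1.0)   # converges in one step: the composed problem is linear
u = H(ũ)             # log(2), the root of the original F
```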
References

[1] Hartmann, Stefan. "A remark on the application of the Newton-Raphson method in non-linear finite element analysis." Computational Mechanics 36.2 (2005): 100-116.
[2] Pawlowski, Roger P., et al. "Globalization techniques for Newton-Krylov methods and applications to the fully coupled solution of the Navier-Stokes equations." SIAM Review 48.4 (2006): 700-721.
[3] Eisenstat, Stanley C., and Homer F. Walker. "Globally convergent inexact Newton methods." SIAM Journal on Optimization 4.2 (1994): 393-422.
[4] Matthies, Hermann, and Gilbert Strang. "The solution of nonlinear finite element equations." International Journal for Numerical Methods in Engineering 14.11 (1979): 1613-1626.
[5] Liu, Tiantian, Sofien Bouaziz, and Ladislav Kavan. "Quasi-Newton methods for real-time simulation of hyperelastic materials." ACM Transactions on Graphics (TOG) 36.3 (2017): 1-16.
[6] Brune, Peter R., et al. "Composing scalable nonlinear algebraic solvers." SIAM Review 57.4 (2015): 535-565.
Problems
When trying to implement a Multilevel-Newton (with Ferrite.jl) I encountered two major issues:

1. There is a tight coupling (in a software engineering sense) between the caches of the nonlinear solver and some of the time stepper.
2. The nonlinear solver framework is scattered across the ecosystem and the documentation is incomplete (although the latter is more of a minor problem, since I can throw more time on this).
I am happy to discuss possible designs of the nonlinear interface and invest some time here. However, I am lacking experience with nonlinear iterative solvers and some details behind the ODE/DAE/... solvers incorporated in the ecosystem to tackle this by myself.