SciML / PDERoadmap

A repository for the discussion of PDE tooling for scientific machine learning (SciML) and physics-informed machine learning
https://tutorials.sciml.ai/

Iterative solvers and affine operators: why `*`? #14

Closed ChrisRackauckas closed 6 years ago

ChrisRackauckas commented 6 years ago

@jlperla asked whether using `*` was appropriate. I think this argument shows that it's not just appropriate, but genuinely helpful. Let's start with iterative solvers for linear systems. Let A be a matrix and b be a vector. We want to find x such that Ax = b. Direct solvers do this like `A\b`. Iterative solvers do it in a fancier way: they construct a sequence x_n -> x. The simplest is the conjugate gradient method ( https://en.wikipedia.org/wiki/Conjugate_gradient_method ) (which requires a symmetric positive definite matrix). The most widely applicable is GMRES ( https://en.wikipedia.org/wiki/Generalized_minimal_residual_method ).

These have a few advantages over direct solvers. For one, iterative solvers take a tolerance, so you only solve to the accuracy you need: if you only care about error to 1e-6, you can stop there. Also, these methods only require the forward application of the operator, i.e. `A*x` is used but nothing else. Therefore they can be used even if the matrix A is never constructed, unlike direct methods (Gaussian elimination). And by using preconditioners and other tricks, these methods can be accelerated for specific problem domains. So for PDEs they are generally much, much faster than direct solvers. IterativeSolvers.jl is a good Julia package for this.
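To make the matrix-free point concrete, here's a small sketch using SciPy's `LinearOperator` and `gmres` as a stand-in for IterativeSolvers.jl (same idea, different language): the operator is a 1D Laplacian stencil applied by a function, and the solver only ever calls that forward application.

```python
# Sketch (illustrative, not from the thread): solve Ax = b with GMRES where
# A is never materialized -- only its action on a vector is provided.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 50

def laplacian_matvec(x):
    # Apply the tridiagonal [-1, 2, -1] stencil without storing the matrix.
    y = 2.0 * x
    y[:-1] -= x[1:]
    y[1:] -= x[:-1]
    return y

A = LinearOperator((n, n), matvec=laplacian_matvec)
b = np.ones(n)

# atol is the iterative tolerance: the solver stops once the residual is small.
x, info = gmres(A, b, restart=n, atol=1e-10)
```

Here `info == 0` signals convergence, and the only thing GMRES ever asked of `A` was `A @ v`.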

This is all known. What about with affine operators? Well, the proof for how iterative solvers usually work is by defining r_n = b - A*x_n and showing that r_n -> 0. Now take the affine operator Q*x = Q_L*x + Q_b. If you want to solve Q*x = b, then the residual form is r = b - Q*x = b - Q_L*x - Q_b. Given that the properties required for convergence of the iterative method hold on Q_L, you get that r -> 0, which then implies that Q_L*x -> b - Q_b, i.e. Q*x -> b. This means that if you put an affine operator into an iterative solver, the algorithm will "just work".

IterativeSolvers.jl is defined to use the operator's `A_mul_B!` method, so as long as `A_mul_B!` (`*`) is defined for our affine operator, the methods in IterativeSolvers.jl will work as linear solvers. This holds even for our lazily composed operators, since only forward application is required. Thus even though BCs can make the operators affine, these methods work just fine, and Julia's generic interfaces mean we don't even have to implement them.

(Note: to make `\` work, we'd have to define `Q\b = Q_L \ (b - Q_b)`, and then Q_L has to be an actual matrix (dense/sparse/banded).)

jlperla commented 6 years ago

Thank you. I think I now get why this is so powerful.

It seems especially great for solving models with a nonlinear Hamiltonian component (e.g. the endogenous drift choice between iterations), since we can change the degree of precision in the inner iteration if it helps the convergence speed of the outer iteration. Whether or not that actually works out, I now see the power (and why ensuring pervasive laziness is so essential).

jlperla commented 6 years ago

Fernando: I assigned this back to you as we had discussed beefing up the footnote a little (though now that Chris has educated us, we may not need much). Close this issue when you are happy with what you added