SciML / PDERoadmap

A repository for the discussion of PDE tooling for scientific machine learning (SciML) and physics-informed machine learning
https://tutorials.sciml.ai/

Implicit Time Stepping #22

Open matthieugomez opened 6 years ago

matthieugomez commented 6 years ago

Hi,

This project looks very ambitious, and potentially very useful for economics researchers. Thanks a lot for working on this.

I just want to share something I have learned while solving HJB equations: HJB iterations are much more robust when using fully implicit/nonlinear iterations rather than the semi-implicit/linear iterations recommended in Achdou, Han, Lasry, Lions, Moll.

To be more precise, let us say that a HJB time step iteration takes the following form:

(V_{t+1}-V_t)/Δ = a(x) + μ(x)V_{x, t+1} + σ(x)^2/2 V_{xx, t+1}

where, critically, μ(x) and σ(x) depend on V_x (due to first-order conditions or general equilibrium in financial models). The goal is to compute V_{t+1} given V_t. Achdou, Han, Lasry, Lions, Moll recommend using "semi-implicit time stepping", i.e. computing μ(x), σ(x) using V_{x, t}, and then obtaining V_{t+1} as the solution of a linear problem.

In my experience, it is much more robust to do "fully implicit time stepping", i.e. to compute μ(x), σ(x) using V_{x, t+1} (rather than V_{x, t}), and then obtain V_{t+1} as the solution of a nonlinear problem. On the bad side, this makes the time stepping a bit slower (each time step requires solving a nonlinear problem rather than a linear one). On the good side, this algorithm works for every PDE I have tried (in contrast to the semi-implicit time stepping above).
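
To make the distinction concrete, here is a minimal Python sketch on a scalar toy problem (a stand-in I made up, not an HJB; the real version works on a spatial grid and the Newton solve is over the whole grid):

```python
# Toy illustration of semi-implicit vs. fully implicit time stepping on the
# scalar ODE y' = -y^3, where the cubed term plays the role of a coefficient
# that depends on the unknown (like mu(x) depending on V_x).

def semi_implicit_step(y0, dt):
    # Lag the nonlinear coefficient: (y1 - y0)/dt = -(y0**2) * y1,
    # so each step is a *linear* solve.
    return y0 / (1 + dt * y0**2)

def fully_implicit_step(y0, dt, tol=1e-12, maxit=50):
    # Evaluate the nonlinearity at the new time: (y1 - y0)/dt = -y1**3,
    # so each step is a *nonlinear* solve (here, scalar Newton iteration).
    y = y0  # the previous value is the natural initial guess
    for _ in range(maxit):
        f = y - y0 + dt * y**3
        fp = 1 + 3 * dt * y**2
        y_new = y - f / fp
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y

y0, dt = 2.0, 1.0
print(semi_implicit_step(y0, dt))    # one linear solve per step
print(fully_implicit_step(y0, dt))   # one Newton solve per step
```

The two steps agree as dt → 0; the difference is only in where the nonlinear coefficient is evaluated.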

I have implemented this solution in the package EconPDEs, where I show that it can solve state-of-the-art financial asset pricing models. I explain the algorithm in more detail here.

In my experience, this fully implicit time stepping is key to solving HJBs robustly. It would be great if this package could support it (or have a syntax general enough to allow me to rewrite my package with the tools developed here). I don't know if this is possible, but I wanted to raise it now, while the design of this package is still a work in progress.

jlperla commented 6 years ago

@ChrisRackauckas: These comments are extremely important when thinking about how composed update_coefficients! design would work (which I think works well, given what we had discussed), but also about how the discretization of the space then hooks into the time-stepping.

Hopefully the time-stepping is relatively orthogonal to the space discretization (which we have concentrated on up to this point).

ChrisRackauckas commented 6 years ago

Yes, this comes for free with the DifferentialEquations.jl integration which has high order implicit methods for time stepping.

There are better semi-implicit methods to consider as well, such as Rosenbrock. Here's one list of implicit time stepping methods:

http://docs.juliadiffeq.org/latest/solvers/ode_solve.html#Methods-for-Stiff-Equations-1

and another good set are these:

http://docs.juliadiffeq.org/latest/solvers/ode_solve.html#Sundials.jl-1 http://docs.juliadiffeq.org/latest/solvers/ode_solve.html#ODEInterface.jl-1 http://docs.juliadiffeq.org/latest/solvers/ode_solve.html#LSODA.jl-1

Additionally, exponential integrators can work great in this case which is a GSoC project.

Setting up the operators in a way that they can be used with all time stepping methods was the first goal of the project and indeed it works via the update_coefficients! mechanism.

jlperla commented 6 years ago

@matthieugomez One of the big goals here as well is that the space discretization is lazy and matrix-free, which better enables iterative methods. Economists started out using only dense matrices, then moved to sparse matrices (and I think you are the main reason for diffusing that knowledge!), and with composition of operators we may be able to avoid them altogether.

@ChrisRackauckas The thing I wasn't sure about with the current DAE solvers is how they hook into AD for the system of equations. Is it useful/possible to have automatic differentiation for those? I didn't see anything in the Sundials interface when I looked at it before.

ChrisRackauckas commented 6 years ago

Sundials does not use AD unless you define your own Jacobians. We can do that in Sundials.jl though, or a user can do it manually outside of Sundials.jl and pass it in. Some parts of the composed operators will not work well with ForwardDiff because of caching vectors, but a Capstan-based approach will do fine here. Anyway, this question is far off topic. The answer is that it will work in theory; in practice it's not a good way to do it anyway (since from a linear operator it's easy to define the Jacobian directly), and it will need the Cassette changes, but it doesn't really matter since finite differencing does well in these cases. What's more important than any of that are setups for iterative solvers and sparse factorizations (KLU, SuperLUMT, etc.), which exist and are also being improved independently of the operators.

matthieugomez commented 6 years ago

I am not sure I understand everything, so I am sorry if I misunderstand you, but my point is that the HJB is not a linear operator in my example, since μ(x) depends nonlinearly on V_{t+1}. For now, I compute the whole Jacobian using autodiff, but I wish there were a way to only compute certain derivatives, as you discussed here.

jlperla commented 6 years ago

Yes, for those HJBE with a continuous control of drift, things are ultimately nonlinear in the discretization of space.

As far as I can tell, many of the algorithms economists have used (e.g. the algorithm in http://www.princeton.edu/~moll/HACTproject/HACT_Numerical_Appendix.pdf) solve this sort of thing by an iterative procedure: μ(x) is first "guessed", then the ODE/PDE is solved using the linear operator (which it is, conditional on a μ(x)), and then the Hamiltonian is used to update the μ(x) guess, and so on. There may be a more efficient way to accelerate that fixed-point iteration (e.g. so that it becomes equivalent to solving the full system with Newton's method), but the approach always solves linear ODE/PDEs in any case.
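
That "guess the control, solve the linear problem, update the control" loop has the same shape as policy iteration. A minimal Python analogue on a made-up discrete 2-state, 2-action problem (not the HACT code, just the shape of the iteration):

```python
import numpy as np

# Policy-iteration analogue of: guess the control, solve the problem that is
# *linear* conditional on the control, then update the control greedily.
beta = 0.95
r = np.array([[1.0, 0.0],    # r[s, a]: reward in state s under action a
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1],   # P[a, s, s']: transition probabilities
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.5, 0.5]]])

policy = np.zeros(2, dtype=int)              # initial guess for the control
for _ in range(50):
    # 1. Conditional on the control, solve the linear system
    #    (I - beta * P_pi) v = r_pi.
    P_pi = P[policy, np.arange(2)]
    r_pi = r[np.arange(2), policy]
    v = np.linalg.solve(np.eye(2) - beta * P_pi, r_pi)
    # 2. Update the control (the analogue of maximizing the Hamiltonian).
    Q = r.T + beta * P @ v                   # Q[a, s]
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                                # fixed point reached
    policy = new_policy
```

Each pass only ever solves a linear system; the nonlinearity lives entirely in the control-update step.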

Now, if I understand https://github.com/matthieugomez/EconPDEs.jl/blob/master/src/details.pdf correctly, I believe you substitute in the Hamiltonian part to form a completely nonlinear differential operator, and then directly solve it as a fully nonlinear system of equations? I had tried that in Matlab in the fall, with more limited success, but that was Matlab....

Which one is faster is an empirical question, but my concerns are usually more about robustness/convergence than speed. The big question I have on combining everything into a single nonlinear ODE/PDE comes down to two things:

Again, very happy to discuss all of this with you, and please critique away at the whole enterprise. The basic approach here was to set up the operators for linear solutions, being extremely careful with boundary conditions, with the hope that iterative methods using the linear setup would be written first.

matthieugomez commented 6 years ago

Yes, your description of the fully implicit (EconPDEs) method vs. the semi-implicit (HACT) method is exactly right.

I fully agree with your convergence-vs-speed tradeoff. The fully implicit method gives more robust convergence at the cost of lower speed compared to the semi-implicit method, which is why I favor it.

About Barles–Souganidis: the monotonicity assumption is actually not always satisfied in the semi-implicit schemes proposed in HACT. The semi-implicit scheme is not necessarily monotone with respect to the already-computed v_t (the control c_t depends on v_t). I think that is why I could not get this scheme to converge robustly in my experience. In contrast, the monotonicity condition is always satisfied in the fully implicit scheme due to the envelope theorem (i.e. the derivatives of c_t with respect to v_{t+1} cancel out). However, we don't have a theorem that says that the fully implicit time step can always be computed (i.e. that the nonlinear system can be solved), so it is not that helpful.
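
The envelope-theorem cancellation can be written out as follows (a sketch; here H denotes the maximized Hamiltonian and c^*(v_{t+1}) the control solving the first-order condition — these symbols are my notation, not from HACT):

```latex
\frac{d}{d v_{t+1}} H\bigl(c^*(v_{t+1}),\, v_{t+1}\bigr)
  = \underbrace{\left.\frac{\partial H}{\partial c}\right|_{c = c^*}}_{=\,0 \text{ by the FOC}}
    \cdot \frac{\partial c^*}{\partial v_{t+1}}
  + \frac{\partial H}{\partial v_{t+1}}
  = \frac{\partial H}{\partial v_{t+1}}
```

So, to first order, the Jacobian of the implicit step has the same structure as if the control were held fixed, which is what preserves monotonicity.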

At the end of the day, my experience is that the fully implicit method works better than the semi-implicit one, but I don't have a theorem to show it, so I am not sure how much weight you should place on it. My only arguments are that it works better in my experience (for the specific PDEs I have worked with), and that the method is widely used in fluid dynamics (under the name of the pseudo-transient method).

In any case, I completely agree that this approach only works when you can invert the FOC, and so it may not always be the best solution.

jlperla commented 6 years ago

To translate for @ChrisRackauckas since I think he first thought you were talking about different implicit time-stepping algorithms: this is a question of eventually supporting the discretization of nonlinear differential operators on space directly. I don't know whether that would work well with the lazy Q approach for boundaries (assuming that we still only support affine boundary conditions) but it seems something that could be considered down the road for an operators library.

@matthieugomez Do you know if there is any way to write the HJBE for controlled drift as a quasi-linear differential operator that could be plugged into a DAE-style system? I think Chris already has support for quasi-linear setups, which might help. Clearly it is not quasi-linear if you directly substitute for the Hamiltonian in most cases, but if we augment the state, maybe it is?

BTW: My guess is that the "throw everything nonlinear into a system and let the solver sort it out" approach becomes better and better once Julia's sparse AD is eventually implemented in Capstan.jl and once Julia is able to integrate commercial solvers. I solved one of my papers using the nonlinear-system-of-equations approach with Knitro in Matlab, and there was no hope of solving it with the built-in Matlab solvers or IPOPT.

ChrisRackauckas commented 6 years ago

> About Barles–Souganidis: the monotonicity assumption is actually not always satisfied in the semi-implicit schemes proposed in HACT. The semi-implicit scheme is not necessarily monotone with respect to the already-computed v_t (the control c_t depends on v_t). I think that is why I could not get this scheme to converge robustly in my experience. In contrast, the monotonicity condition is always satisfied in the fully implicit scheme due to the envelope theorem (i.e. the derivatives of c_t with respect to v_{t+1} cancel out). However, we don't have a theorem that says that the fully implicit time step can always be computed (i.e. that the nonlinear system can be solved), so it is not that helpful.

What you're looking for here is the SSP (strong stability preserving) property of a time-stepping scheme. A fully implicit solve is unconditionally SSP but is not practical for large PDEs. Implicit Euler, though, is unconditionally SSP as well, so it guarantees monotonicity. You can prove that no other scheme is unconditionally SSP, but you can develop SSP-optimal schemes, which relax the CFL restriction as much as possible within a given stepper family:

http://docs.juliadiffeq.org/latest/solvers/ode_solve.html#Explicit-Strong-Stability-Preserving-Runge-Kutta-Methods-for-Hyperbolic-PDEs-(Conservation-Laws)-1

http://docs.juliadiffeq.org/latest/solvers/ode_solve.html#Implicit-Strong-Stability-Preserving-Runge-Kutta-Methods-for-Hyperbolic-PDEs-(Conservation-Laws)-1

For these reasons, high-order explicit SSP-optimal methods are normally what's employed on hyperbolic PDE discretizations: it can be cheaper to take somewhat smaller steps with an explicit method than to use a semi-implicit method, for example if the semi-implicit method can step with 5× the CFL dt but at far more cost per step. Whether this is the case for the PDEs from economics problems is something that would be interesting to investigate, and is not at all trivial or knowable in advance.
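
A quick Python illustration of the monotonicity point on the standard stiff linear test problem (a sketch I constructed, not one of the PDEs above): with a step well past the explicit stability limit, explicit Euler oscillates and blows up while implicit Euler stays monotone.

```python
# Stiff test problem y' = -50 y, y(0) = 1, with dt = 0.1 (well past the
# explicit Euler stability limit of 2/50 = 0.04).
lam, dt, n = 50.0, 0.1, 10

y_exp = [1.0]
y_imp = [1.0]
for _ in range(n):
    y_exp.append(y_exp[-1] * (1 - lam * dt))   # explicit Euler: factor -4 per step
    y_imp.append(y_imp[-1] / (1 + lam * dt))   # implicit Euler: factor 1/6 per step

print(y_exp[:4])   # alternating signs, growing magnitude
print(y_imp[:4])   # positive, monotonically decreasing
```

The implicit iterates never overshoot zero, which is the discrete analogue of the monotonicity being discussed.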

Fully implicit is impossible for large PDEs because the size of the Jacobian in the nonlinear system grows out of control. It's the best for smaller PDEs given Newton's convergence rate if you have a good initial condition, but a good guess can be hard to come by. But in general for a small enough problem it's best to make it a single nonlinear solve. The question is where the cutoff point is which is dependent on the amount of nonlinearity and cost of the calculations. Highly nonlinear biological problems hit the impractical part quite quickly so semi-linear methods are the only thing possible. Fluid dynamics PDEs are more straightforward and can be discretized into small systems via pseudospectral discretizations which makes fully implicit practical. Where econ applications fall is what I don't know and would be interesting to fully benchmark.

> Is it a sufficiently convex problem? Does the nonlinearity introduce numerical round-off errors, etc., that become a big problem when they accumulate over longer time horizons? For example, with a Markov-chain interpretation, if the probabilities add up to 1 - 1e-7 instead of 1 - 1e-16 for a given row, then that can accumulate over time steps to become a real problem. @ChrisRackauckas has told me that when he did these sorts of PDE things before, applying the boundary conditions directly into the operator (as the current setup is trying to do) solves a lot of these issues. Most of that is beyond my pay grade.

You have to define the operators so that the summation actually ends up as exactly zero. For example, instead of the stencil [1, -2, 1], you define the outer coefficients [1, 1] and then define the middle coefficient as the negative of the sum of the others. This makes every row of the operator sum to a true zero. For integers it doesn't matter, but with floats the sum can be off in the 16th digit or so, which can cause instability over long times. As long as you define one point as the negative of the sum of the others, the sum condition is exact and this is fixed.
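
A small Python sketch of that construction, using second-difference weights on a nonuniform grid as the example (the spacings h1, h2 are made-up values just to show the float issue):

```python
# Second-difference weights on a nonuniform grid with spacings h1, h2.
h1, h2 = 0.1, 0.3
w_left  = 2.0 / (h1 * (h1 + h2))
w_right = 2.0 / (h2 * (h1 + h2))

# Naive middle coefficient from the analytic formula: the row sums to zero in
# exact arithmetic, but in floats it may be off by ~1e-15.
w_mid_naive = -2.0 / (h1 * h2)

# Robust middle coefficient: defined as the negative of the sum of the others,
# so (w_left + w_right) + w_mid is exactly zero when the middle is added last.
w_mid = -(w_left + w_right)

print(w_left + w_right + w_mid_naive)  # possibly a tiny nonzero float
print(w_left + w_right + w_mid)        # exactly 0.0
```

The same trick applies row by row when assembling the full operator, so the discrete conservation/Markov property holds to the bit.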

Ref: https://scicomp.stackexchange.com/questions/11249/numerical-derivative-and-finite-difference-coefficients-any-update-of-the-fornb/26735

matthieugomez commented 6 years ago

Thanks @ChrisRackauckas. That is very interesting. Thanks for the links, I will look more into them.

The implicit method works well in the economics problems I have worked on because, if you take a small enough time step, the solution V_{t+1} is very close to the current solution V_t (i.e. the initial guess), and so, as you mention, Newton's convergence rate kicks in. In cases where the Newton method does not converge, I decrease the time step until it does. I find that this allows me to solve problems more robustly than the linear method, but I'm open to the idea that this was due to some (subtle or not) mistake in my linear code.
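
A rough Python sketch of that strategy (implicit Euler steps solved by Newton, halving dt on a Newton failure and growing it after a success — an illustration on a toy ODE I chose, not the EconPDEs code):

```python
# Pseudo-transient sketch: march y' = y - y^3 to its stable steady state y = 1
# from y(0) = 0.5, using the previous value as the Newton initial guess.

def newton_step(y0, dt, tol=1e-12, maxit=20):
    # Solve the implicit Euler equation y - y0 - dt*(y - y**3) = 0 for y.
    y = y0
    for _ in range(maxit):
        f = y - y0 - dt * (y - y**3)
        fp = 1.0 - dt * (1.0 - 3.0 * y**2)
        y_new = y - f / fp
        if abs(y_new - y) < tol:
            return y_new, True
        y = y_new
    return y, False                   # Newton did not converge

def pseudo_transient(y, dt=1.0, tol=1e-10, max_steps=500):
    for _ in range(max_steps):
        y_new, ok = newton_step(y, dt)
        if not ok:
            dt /= 2                   # Newton failed: retry with a smaller step
            continue
        if abs(y_new - y) < tol:
            return y_new              # time derivative vanished: steady state
        y, dt = y_new, dt * 1.5       # success: grow dt back toward pure Newton
    return y

print(pseudo_transient(0.5))          # approaches the steady state y = 1
```

As dt grows, each step approaches a pure Newton iteration on the steady-state equation, which is the pseudo-transient continuation idea mentioned above.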