SciML / DifferentialEquations.jl

Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and more in Julia.
https://docs.sciml.ai/DiffEqDocs/stable/

add Richardson-extrapolation wrapper for any fixed-stepsize solver? #670

Open stevengj opened 3 years ago

stevengj commented 3 years ago

I wonder if it would be useful to add a solver that wraps Richardson extrapolation around another solver. You already have a limited form of this (Romberg integration for extrapolating Euler methods), but it seems like it would be easy to do this for any fixed-timestep method.

For example:

solve(prob, RichardsonExtrapolate(SSPRK33()), dt=1.73, reltol=1e-6)

could essentially just do:

Richardson.extrapolate(1.73, rtol=1e-6, contract=0.5) do dt
    solve(prob, SSPRK33(), dt=dt)
end

so that it would extrapolate the solution (at a set of requested time points) to dt → 0⁺ (starting at the timestep specified within the fixed-timestep solver), repeatedly halving the timestep.

Is this reasonable? Where would such a solver go?

stevengj commented 3 years ago

Here is a more concrete (but still very basic) implementation:

import Richardson
using DifferentialEquations

function solveextrap(prob, alg, args...; kws...)
    DiffEqBase.isadaptive(alg) && error("non-adaptive algorithms only")
    opt = Dict(kws)
    @show dt0 = get(opt, :dt, 0.1)       # starting timestep
    otheropts = delete!(copy(opt), :dt)  # forward the remaining options to solve
    # time points at which to extrapolate the solution:
    tstops = get(opt, :tstops, range(prob.tspan[1], stop=prob.tspan[2], step=dt0))
    # repeatedly halve dt (contract=0.5) and extrapolate sol.(tstops) to dt → 0⁺:
    val, err = Richardson.extrapolate(dt0, rtol=get(opt, :reltol, 1e-3), atol=get(opt, :abstol, 0.0), contract=0.5) do dt
        @show Int(dt0 / dt) # will always be an exact integer, a power of 2
        sol = solve(prob, alg, args...; dt=dt, otheropts...)
        sol.(tstops)
    end
    return val
end

If we apply this to the pendulum tutorial problem, we can compute the error of extrapolating a fixed-stepsize algorithm (e.g. SSPRK33) compared to the default adaptive solver with a low tolerance, evaluated at 10 points:

using LinearAlgebra: norm

v0 = solve(prob, dt=0.2, reltol=1e-12, abstol=0).(1:10)
ve = solveextrap(prob, SSPRK33(), reltol=1e-6, abstol=0, dt=0.2, tstops=1:10)
@show norm(ve - v0) / norm(v0)

which outputs:

dt0 = get(opt, :dt, 0.1) = 0.2
Int(dt0 / dt) = 1
Int(dt0 / dt) = 2
Int(dt0 / dt) = 4
Int(dt0 / dt) = 8
Int(dt0 / dt) = 16
Int(dt0 / dt) = 32
Int(dt0 / dt) = 64
Int(dt0 / dt) = 128
norm(ve - v0) / norm(v0) = 3.272237573041655e-10

i.e. it reduced dt=0.2 down to 0.2/128 and got 9–10 digits. In comparison, if we run SSPRK33 directly with that stepsize we get a much worse error:

v = solve(prob, SSPRK33(), dt=0.2/128).(1:10)
@show norm(v - v0) / norm(v0)

outputs

norm(v - v0) / norm(v0) = 1.65666892736038e-7

(Curiously, the non-extrapolated version is getting a better (lower) error if I extrapolate with a larger tolerance, e.g. reltol=1e-3. I'm not quite sure what's going on, but I suspect that I'm doing something not quite right.)

ChrisRackauckas commented 3 years ago

Ahh, this is really cool! A few thoughts on this. Sorry this got long, so let me split it up.

Extrapolation in ODEs: OrdinaryDiffEq.jl perform_step

The first thought would probably be to implement this in OrdinaryDiffEq.jl as a method RichardsonGeneral(alg) that defines a perform_step! which uses the steps of the internal algorithm. It would thus be an extension of our current extrapolation methods. Appropriate work estimates for the internal method would be all that's needed to make it into an adaptive method, and it would be quite a unique way to do it.
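To make the idea concrete, here is a minimal, self-contained sketch of step-level Richardson extrapolation written outside OrdinaryDiffEq.jl's internals (the richardson_step and onestep names are illustrative, not an existing API): take one step of size dt and two steps of size dt/2 with a one-step method of order p, then cancel the leading local error term.

using LinearAlgebra: norm

# Illustrative sketch (not OrdinaryDiffEq internals): `onestep` is any
# fixed-step one-step method u_next = onestep(f, u, t, dt) of order p.
function richardson_step(onestep, f, u, t, dt, p)
    u1 = onestep(f, u, t, dt)           # one step of size dt
    uh = onestep(f, u, t, dt/2)         # two steps of size dt/2...
    u2 = onestep(f, uh, t + dt/2, dt/2) # ...covering the same interval
    uext = u2 + (u2 - u1) / (2^p - 1)   # cancel the O(dt^(p+1)) local error term
    return uext, norm(u2 - u1)          # extrapolated state and an error estimate
end

# e.g. with forward Euler (p = 1) this yields a second-order step:
euler(f, u, t, dt) = u + dt * f(u, t)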

That might be something that would overcome what we saw for example in https://github.com/SciML/OrdinaryDiffEq.jl/pull/1200#issuecomment-653796975, where the problem with extrapolation-based methods is that, while they are technically arbitrary order methods, they are, for a given number of stages, equivalent to some higher order Runge-Kutta method. And the RK method that they are equivalent to is not very efficient (in terms of leading truncation error or number of f evaluations to obtain the order), so in the end they do not turn out to be completely competitive (and this is with exploiting the fact that extrapolation allows for a good amount of multithreading). Starting off the extrapolation from efficient higher order methods might be what's required to make the method actually competitive.

To get there, does this implementation of extrapolation expose task-based parallelism, e.g. a way to do techniques like https://arxiv.org/abs/1305.6165? I think the key for extrapolation methods is exactly this potential, so effectively using multiple cores (or, say, multiple local GPUs in neural ODEs) would be something interesting.
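The solves at successively halved timesteps are independent of one another, so one hedged sketch of exposing them as parallel tasks looks like the following (the parallel_solves name and solve_at_dt closure are hypothetical, not Richardson.jl API):

using Base.Threads: @spawn

# Hypothetical sketch: the fixed-dt solves at dt0, dt0/2, dt0/4, ... are
# independent, so spawn each as a task and fetch them all before the
# (cheap, sequential) extrapolation pass.
function parallel_solves(solve_at_dt, dt0, nlevels)
    tasks = map(0:nlevels-1) do k
        @spawn solve_at_dt(dt0 / 2^k)
    end
    return fetch.(tasks)   # solutions ordered from coarsest to finest dt
end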

However, given all of that, I am not too bullish on extrapolation in ODEs: in any case where you extrapolate, you could in theory derive a more efficient higher-stage Runge-Kutta method, and I think one can always come up with a more efficient method by developing new higher-order RKs with special properties (using RootedTrees.jl and/or DiffEqDevTools.jl to automate the process). Thus ODEs might be a good starting point, but not the right target in terms of the most useful outcome.

Side note on the current implementations

The current extrapolation perform_step!s are found in https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/perform_step/extrapolation_perform_step.jl. It's a little messy because it manually works around a lot of inefficiencies in the threading (using Threads.@threads with 1 thread is slower than without; the different extrapolation sequences have different optimal splittings for even work if you're not just letting the task scheduler handle all of the scheduling, which was done because the scheduler overhead is still quite high). Some macros or other ways to simplify the code are probably in order, but it's at least all spelled out and well-tested. @utkarsh530 was working on this over the summer, and with https://github.com/SciML/OrdinaryDiffEq.jl/pull/1212#issuecomment-661293424 the multithreaded implicit extrapolation methods might turn out to be publishable as fairly efficient ways to get high-accuracy solutions of stiff equations in the case where the LU factorization is too small to make effective use of multithreading. So using this on a higher-order implicit method, like a 5th-order RadauIIA method, could be an interesting follow-up.

Extrapolation on more general differential equations

The other way to implement this would be to directly target the Integrator Interface (https://diffeq.sciml.ai/stable/basics/integrator/). You would init an integrator using a given method with adaptive=false and then do extrapolation by directly modifying integ.u and integ.t to push it around (a rough sketch is given at the end of this comment). This would need caution (or just fail) on methods that have state which changes the step calculation, like multistep methods, but in general this would give an implementation of extrapolation that works on all one-step methods. This would include not just OrdinaryDiffEq.jl, but also StochasticDiffEq.jl and StochasticDelayDiffEq.jl (if the code makes sure to set the noise process time as well). There's some literature on this topic:

https://ttu-ir.tdl.org/bitstream/handle/2346/59979/31295007045023.pdf;sequence=1
https://arxiv.org/abs/1812.02225

These two are on SDEs, but another interesting use case would be continuous-time Markov chains, i.e. extrapolated tau-leaping:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4085235/

The cool thing about this implementation is that it would immediately generate some very novel algorithms, like extrapolation methods to solve stochastic delay differential equations and random ordinary differential equations, which could then be convergence tested to numerically conjecture that the algorithms work (and a proof would probably need to use a rough paths approach like the one taken in https://www.springer.com/gp/book/9789811062643). Those algorithms would probably be very useful because they would be (a) potentially higher (weak) order in domains where higher order schemes are less optimized or harder to derive, and (b) if the implementation is multithreaded, it would probably be the first parallelized solver for many of these domains. If that parallel version is also a multi-GPU and differentiable implementation (i.e. by assigning a GPU per task and then managing the memory in the linear combination at the end), it would be both a very nice demonstration of composable software architectures and probably very useful to many people (especially the tau-leaping variant).
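As a rough, heavily hedged sketch of the integrator-interface approach above: run two fixed-step integrators in lockstep, one at dt and one at dt/2, and after each macro-step combine the states and push the extrapolated value back into both with set_u!. The extrap_step! name and structure are illustrative only; p is the order of the wrapped method.

using OrdinaryDiffEq

# Illustrative sketch using only documented integrator-interface calls
# (init, step!, set_u!); not an existing solver.
function extrap_step!(int1, int2, dt, p)
    step!(int1, dt, true)  # one step of size dt, stopping exactly at t + dt
    step!(int2, dt, true)  # two steps of size dt/2 covering the same interval
    uext = int2.u + (int2.u - int1.u) / (2^p - 1)  # cancel leading error term
    set_u!(int1, uext)     # push both integrators to the extrapolated state
    set_u!(int2, uext)
    return uext
end

# setup (illustrative), e.g. with SSPRK33 (p = 3):
# int1 = init(prob, SSPRK33(); dt = dt,   adaptive = false)
# int2 = init(prob, SSPRK33(); dt = dt/2, adaptive = false)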

stevengj commented 3 years ago

However, given all of that, I am not too bullish on extrapolation in ODEs: in any case where you extrapolate, you could in theory derive a more efficient higher-stage Runge-Kutta method, and I think one can always come up with a more efficient method by developing new higher-order RKs with special properties (using RootedTrees.jl and/or DiffEqDevTools.jl to automate the process).

While I agree with this in principle, you have a bunch of fixed-timestep methods with special properties, e.g. DGLDDRK84_F or SSPRK104, that don't have adaptive equivalents yet. Using Richardson extrapolation is potentially a quick way to use these in an adaptive way, even if in the long run it would be nicer to derive a more specialized higher-order adaptive scheme.

That is, how would you set the timestep for DGLDDRK84_F now? Presumably you're going to halve the timestep until the answer stops changing to your desired tolerance. If you're going to do that anyway, why not extrapolate?

ChrisRackauckas commented 3 years ago

That is, how would you set the timestep for DGLDDRK84_F now? Presumably you're going to halve the timestep until the answer stops changing to your desired tolerance. If you're going to do that anyway, why not extrapolate?

For that method I think a lot of people use the step limiters or DiscreteCallbacks to set the dt to the SSP coefficient * CFL (since it's usually a PDE that this is used on). I think the difficulty is that, if you use extrapolation, do you keep the SSP property, and what's the SSP coefficient in that case? It seems like it should definitely hold that it's still SSP, but I don't have intuition on whether the SSP coefficient should change. @ranocha or David Ketcheson have probably looked into that before.

While I agree with this in principle, you have a bunch of fixed-timestep methods with special properties, e.g. DGLDDRK84_F or SSPRK104, that don't have adaptive equivalents yet. Using Richardson extrapolation is potentially a quick way to use these in an adaptive way, even if in the long run it would be nicer to derive a more specialized higher-order adaptive scheme.

Yeah, and I think the interesting thing here is that it can be adaptive to reach a tolerance while still using the stability-bound step size.

ranocha commented 3 years ago

Thanks for pinging me. As far as I know, you will lose the SSP property when the extrapolation step involves negative coefficients, which is usually (even necessarily?) the case.

ranocha commented 3 years ago

That doesn't mean that applying extrapolation wouldn't be interesting, though. But you cannot increase the order while keeping the SSP property: Explicit SSP RK methods are at most fourth order accurate.

stevengj commented 3 years ago

I think the difficulty is that, if you use extrapolation, do you keep the SSP property and what's the SSP coefficient in that case?

My suggestion was to use extrapolation on the final solution, not on each individual step — that is, you would still use the SSP method with a fixed timestep dt for the whole integration time. So, the integration at any given dt is still stable.

stevengj commented 3 years ago

(But if you're using this for PDEs, you'd probably want to increase the spatial resolution at the same time you are increasing the time resolution, in which case you'd want to wrap any extrapolation around the outside of the whole thing, not just in the ODE part.)
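In that spirit, a hedged sketch of wrapping the extrapolation around the whole space-time discretization, in the style of the solveextrap code above (the solve_pde helper is hypothetical: it would rebuild the grid with dx proportional to h, tie dt to h through the CFL condition, and return the solution sampled at fixed probe points):

import Richardson

h0 = 0.1  # starting resolution parameter (illustrative)
# Extrapolate the full discretization to h → 0⁺, refining dx and dt together:
val, err = Richardson.extrapolate(h0, rtol=1e-6, contract=0.5) do h
    solve_pde(h)  # hypothetical: dx ∝ h, dt set from h via the CFL condition
end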

ChrisRackauckas commented 3 years ago

I see: I was thinking of steps, and that has different use cases. You're thinking of the whole solve, which would be used to get global error estimates and global tolerances. It would be a modern and more flexible implementation of GERK:

https://linux.ime.usp.br/~tmacedo/Numerical%20Analysis/ode-global-error-p172-shampine.pdf

There are a ton of other methods for this as well. I think the Dormand-Prince method is the clearest:

https://www.sciencedirect.com/science/article/pii/0898122189901818

But then there's a whole literature:

https://www.sciencedirect.com/science/article/pii/0898122186900325
https://epubs.siam.org/doi/abs/10.1137/S1064827503420969?journalCode=sjoce3
https://www.sciencedirect.com/science/article/pii/0898122189901818
http://www.thebookshelf.auckland.ac.nz/docs/NZJMaths/nzjmaths029/nzjmaths029-02-009.pdf
https://ieeexplore.ieee.org/document/8161197
https://www.semanticscholar.org/paper/On-the-Global-Error-of-Discretization-Methods-for-Niesen/f080d4c2629886a245153630dec23b7d587a70cd?p2df
https://link.springer.com/article/10.1007/BF01389440
http://www.sci.utah.edu/publications/Ber1988b/Berzins_JSC1988.pdf
https://link.springer.com/article/10.1023/A:1021190918665
https://link.springer.com/article/10.1007/s10915-004-4629-3
https://arxiv.org/pdf/1503.05166.pdf
https://academic.oup.com/imajna/article-abstract/5/4/481/715486?redirectedFrom=fulltext
http://www.ehu.eus/ccwmuura/research/gee2003.pdf

I did think about putting this into OrdinaryDiffEq.jl (https://github.com/SciML/GlobalDiffEq.jl/issues/22), but the way these methods work requires doing multiple solves, solving an auxiliary process, or using entirely different RK methods, so it never seemed to fit quite well. It might be worth it to have a whole repo dedicated to global error estimation in ODEs, GlobalDiffEq.jl, and it would be a good undergrad/GSoC project since the list of papers right there is fairly straightforward to just pick up and run with.

The downside to having a separate repo there would be that we'd probably want the whole integrator setup for event handling even for these special RKs, so the right thing to do might be to implement the special RKs into OrdinaryDiffEq.jl in a way that requires ArrayPartition inputs for partitioned equations, and have GlobalDiffEq.jl be the guiding repo that adjusts the user's inputs, defines the extended ODE, calls the special RK, and spits out the solution with the global error estimate. Then GlobalDiffEq.jl could have other methods, like RichardsonExtrapolate(alg), which similarly takes an ODE and spits out a solution with a global error estimate, all where abstol and reltol are interpreted as global tolerances.
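Under that design, usage might look something like the following sketch (hypothetical interface: the RichardsonExtrapolate(alg) name is the one proposed above, and the keyword semantics are the global-tolerance interpretation just described):

using GlobalDiffEq, OrdinaryDiffEq

# Hypothetical interface sketch: abstol/reltol are interpreted as *global*
# tolerances, and dt is the starting stepsize for the fixed-step inner method.
sol = solve(prob, RichardsonExtrapolate(SSPRK33());
            dt = 0.2, reltol = 1e-6, abstol = 0.0)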

I'll try to get a shell of a repo together over the weekend and we can get RichardsonExtrapolate(alg) done and documented with a whole system together to make this be a ready project.

ChrisRackauckas commented 3 years ago

I created GlobalDiffEq.jl with your starter code in there and enough to make it conform to the common interface, and opened a bunch of issues to essentially ready it for a GSoC project (or maybe even a class project...). I don't know if I'll get to much more right now, but it's at least useful and has a good roadmap.