ranocha opened this issue 6 years ago
Thanks! Since we have good infrastructure for Runge-Kutta methods, I think it's best to leave this as an introductory issue (this is a good "first algorithm" for potential GSoC students!)
There's this one as well: https://arxiv.org/abs/1806.08693
I've added it to the list.
And a new one: https://arxiv.org/abs/1809.04807
Thanks. I've added that as well to the list.
I have tried to implement the first method from the list above in #581. I'd like to know if it is alright.
Thanks, @dgan181. You implemented the method `SSPRK53_2N1` of Higueras and Roldan as `SSPRK53_2N`, didn't you? It may be good to also add the other scheme `SSPRK53_2N2` (eq. (60) of Higueras and Roldan).
It would be good to add the new citation for `SSPRK53_2N` (or `SSPRK53_2N1` and `SSPRK53_2N2` if both are added) to https://github.com/JuliaDiffEq/juliadiffeq.github.io/blob/master/citing.md.
@ranocha I tried implementing the second method in #584. Once that is resolved, I will add the new citations.
I'm implementing RK46-NL from Berland, Bogey, Bailly (2006)
Merging the discussion of #415. @dgan181 is doing Conde, Fekete, Shadid (2018)
I am implementing Hadjimichael, Ketcheson, Lóczi, and Németh (2016)
Hello everyone, I'm a newbie trying to implement Mead, Renaut (1999)
I guess I will switch to Stanescu, Habashi (1998) now, as @saurabhkgp21 did RK46-NL
I am working on Parsani, Ketcheson, Deconinck (2013)
@ranocha These papers Kubatko, Yeager, Ketcheson (2014) and Kennedy, Carpenter, Lewis (2000) have 21 and >16 schemes respectively. We might not want to implement all of them, but I'm not able to decide which ones to implement and which ones to skip.
For Kennedy, Carpenter, Lewis (2000), the method RK4(3)5[2R+]C (using their nomenclature) should be a good starting point. Since the low-storage schemes described there can be implemented similarly, it might be worth the effort to write some kind of "generic" version of the methods `perform_step!` etc. such that only the low-storage Runge-Kutta coefficients have to be supplied for a new scheme.
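To make that concrete, here is a minimal sketch of a coefficient-driven low-storage step in the style of Williamson's 2N schemes. The function name and signature are made up for illustration; this is not the actual OrdinaryDiffEq.jl `perform_step!` interface.

```julia
# Sketch of a generic low-storage RK step (Williamson 2N style). The vectors
# A, B, c are the only method-specific data. A true 2N implementation would
# fuse the RHS evaluation into the register update; a scratch buffer `du`
# is kept here for clarity.
function lowstorage_rk_step!(u, k, du, f!, t, dt, A, B, c)
    for i in eachindex(A, B, c)
        f!(du, u, t + c[i] * dt)  # evaluate the RHS in place
        Ai, Bi = A[i], B[i]
        @. k = Ai * k + dt * du   # accumulation register (A[1] == 0)
        @. u += Bi * k            # solution register
    end
    return u
end
```

With `A = [0.0]`, `B = [1.0]`, `c = [0.0]` a single step reduces to forward Euler; an actual scheme just supplies its own (longer) coefficient vectors.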
I don't have really good advice concerning Kubatko, Yeager, Ketcheson (2014). They present several schemes for different applications (the order of the DG method the schemes are optimised for). These schemes seem to be less popular in the DG community than others because of their higher storage requirements.
> For Kennedy, Carpenter, Lewis (2000), the method RK4(3)5[2R+]C (using their nomenclature) should be a good starting point. Since the low-storage schemes described there can be implemented similarly, it might be worth the effort to write some kind of "generic" version of the methods `perform_step!` etc. such that only the low-storage Runge-Kutta coefficients have to be supplied for a new scheme.
@ChrisRackauckas @ranocha Yes. This can be implemented quite easily: the coefficients can be stored as `SVector`s in the `ConstantCache` instead of being separate variables (see the sketch below). The `integrator` argument will remain exactly the same, and the `Cache` argument can be changed to the union of all methods using that `perform_step!`. But won't this be a drastic change in how the other methods are implemented?
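A minimal sketch of that cache layout, assuming hypothetical type and constructor names (this is not the actual OrdinaryDiffEq.jl cache hierarchy):

```julia
using StaticArrays

# Hypothetical shared cache type: the low-storage coefficients live in the
# cache as SVectors instead of being separate variables per method.
struct LowStorageRKConstantCache{N, T}
    A::SVector{N, T}
    B::SVector{N, T}
    c::SVector{N, T}
end

# Each new scheme then only supplies its coefficient data; a one-stage
# placeholder (forward Euler) is shown here as a trivial example.
euler_cache(::Type{T}) where {T} = LowStorageRKConstantCache(
    SVector{1, T}(0), SVector{1, T}(1), SVector{1, T}(0))
```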
> I don't have really good advice concerning Kubatko, Yeager, Ketcheson (2014). They present several schemes for different applications (the order of the DG method the schemes are optimised for). These schemes seem to be less popular in the DG community than others because of their higher storage requirements.
So which ones should we implement?
If we follow the creators of Julia, we should implement all methods because "we are greedy". I think Chris mentioned some time ago that he likes the idea of having many published time integration schemes collected in the DiffEq ecosystem. Thus, all schemes might be implemented in order to be able to test and compare them easily.
Whether or not the "change in how the other methods are implemented" is acceptable might be debatable. Most RK methods have specialised implementations because there are optimisation possibilities specific for each scheme. However, the kinds of low-storage schemes I know leave much less (or no) further optimisation possibilities, at least as far as I know. Thus, it might be worth having a general implementation for them. If we see further optimisation possibilities for a specific scheme, we can still write specialised methods.
@ChrisRackauckas should answer these questions and decide what to do, I think.
> Whether or not the "change in how the other methods are implemented" is acceptable might be debatable. Most RK methods have specialised implementations because there are optimisation possibilities specific for each scheme. However, the kinds of low-storage schemes I know leave much less (or no) further optimisation possibilities, at least as far as I know. Thus, it might be worth having a general implementation for them. If we see further optimisation possibilities for a specific scheme, we can still write specialised methods.
The problem is a complicated mixture of history and GPU interactions. Essentially, when I first created the RK methods, Julia didn't have StaticArrays.jl. I tested what happened if you switched the Vectors of coefficients (defined in https://github.com/JuliaDiffEq/DiffEqDevTools.jl/blob/master/src/ode_tableaus.jl ) over to constant coefficients, and saw a pretty good speedup, so I implemented them like that. All of the first RK methods had completely different structures, so that's fine. And you don't want to condense the structure too much because you want to preserve single-kernel broadcasting. What I mean by that is:
```julia
@. tmp = dt*a12*k2 + dt*a13*k3
```
is much faster than
```julia
@. tmp = dt*a12*k2
@. tmp += dt*a13*k3
```
when GPUs or MPI or things like that are involved, since the former is a single kernel and the latter is two, so fusing reduces the parallelism overhead. So the standard RK methods, with their random zeros and different error estimators, are difficult to put together in a way that preserves these features, so they stayed written out.
But not all methods have. If you look at the Rodas4 perform step ( https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/perform_step/rosenbrock_perform_step.jl#L800-L946 ) you can see that multiple methods use that same perform step definition (https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/caches/rosenbrock_caches.jl#L305-L558), since there are no optimizations that can be done by separating them. These cache constructions can probably be condensed even more too, since the only difference is the coefficients they use.
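The dispatch pattern behind that is simple. A sketch with made-up cache and function names (the real types are in the Rodas4 caches linked above):

```julia
# Two hypothetical cache types whose methods differ only in the
# coefficients stored inside them.
struct SchemeACache{T}
    coeffs::Vector{T}
end
struct SchemeBCache{T}
    coeffs::Vector{T}
end

# One Union alias lets a single stepping routine serve every cache in it,
# the way the Rodas4-style caches share one perform_step! definition.
const CondensedRKCaches = Union{SchemeACache, SchemeBCache}

function shared_step!(u, dt, cache::CondensedRKCaches)
    # ...stepping code that only reads `cache.coeffs`...
    return u
end
```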
So yes, the low-storage methods of the same type should be condensed. In fact, the low-storage methods kind of have a stepping archetype to them that can be exploited to put a lot of them together. I wasn't worrying about that for people just starting since I'd rather people learn how to write a method and then move onto their project rather than get frustrated with writing the best most generic low-storage setup ever, but someone needs to do this. Using static array coefficients and keeping the kernels full sized should have absolutely no impact on the runtime of the low-storage methods, so there's definitely some cleanup in order.
But you can see that what you do need is enough structure contained in the method's implementation to keep the benefits. If you just go "oh, they are RK methods", that's too far because now you don't have a way to specialize the cache sizes. So @ranocha is right: "we are greedy". In fact, here we accept that OrdinaryDiffEq.jl is a library, so while aliasing a few arrays might make analysis more difficult ( https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/caches/low_order_rk_caches.jl#L446-L459 ), people are using our library expecting it to be performant, so we need to make these optimizations. The reason why we specialize is that many times it's simply easier to get a performant method if you write each one out.
The way to think of library development in Julia is a two-step process:

1. Get a working implementation.
2. Optimize and generalize it.

Trying to do (2) without doing (1) just slows you down. Many times it's good to just accept (1) and let everyone update it along the way later! In fact, doing this as two separate PRs is usually a great way to work.
> I wasn't worrying about that for people just starting since I'd rather people learn how to write a method and then move onto their project rather than get frustrated with writing the best most generic low-storage setup ever, but someone needs to do this. Using static array coefficients and keeping the kernels full sized should have absolutely no impact on the runtime of the low-storage methods, so there's definitely some cleanup in order.
I support that approach completely. While implementing one or two low-storage schemes is a good way to learn Julia, the DiffEq ecosystem, and more about Runge-Kutta methods, implementing lots of them can be really annoying. Thus, I've started some more "general" implementation in #648.
Mead, Renaut (1999) is being worked on in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/1052
Stanescu, Habashi (1998) is being worked on in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/1051
@ranocha @ChrisRackauckas I tried to implement SSPRK53_H from the first paper, with 3N registers, in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/1055
Thank you!
@ChrisRackauckas can I work on the Stanescu, Habashi (1998) two-step method?
Hi, I would like to try implementing one of the methods. Which of the remaining ones is the easiest, so that I can follow an existing pattern and learn the basics that way?
Ketcheson's is a good one to try. Follow the devdocs: https://devdocs.juliadiffeq.org/latest/contributing/adding_algorithms/
Hello, I am implementing KYK SSPRK52.
Here are some additional articles describing/investigating/developing Runge-Kutta methods, especially for hyperbolic PDEs:
These methods do not necessarily have high priority but may be implemented. There are also many other methods.