SciML / OrdinaryDiffEq.jl

High performance ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML)
https://diffeq.sciml.ai/latest/

More Runge-Kutta Methods for (Hyperbolic) PDEs #493

Open ranocha opened 6 years ago

ranocha commented 6 years ago

Here are some additional articles describing/investigating/developing Runge-Kutta methods, especially for hyperbolic PDEs:

These methods are not necessarily high priority, but they may be implemented. There are also many other methods.

ChrisRackauckas commented 6 years ago

Thanks! Since we have good infrastructure for Runge-Kutta methods, I think it's best to leave this as an introductory issue (this is a good "first algorithm" for potential GSoC students!)

ChrisRackauckas commented 6 years ago

There's this one as well: https://arxiv.org/abs/1806.08693

ranocha commented 6 years ago

> There's this one as well: https://arxiv.org/abs/1806.08693

I've added it to the list.

ChrisRackauckas commented 5 years ago

And a new one: https://arxiv.org/abs/1809.04807

ranocha commented 5 years ago

> And a new one: https://arxiv.org/abs/1809.04807

Thanks. I've added that one to the list as well.

dgan181 commented 5 years ago

I've tried to implement the first method from the list above in #581. I'd like to know whether it looks alright.

ranocha commented 5 years ago

Thanks, @dgan181. You implemented the method SSPRK53_2N1 of Higueras and Roldán as SSPRK53_2N, didn't you? It may be good to also add the other scheme, SSPRK53_2N2 (eq. (60) of Higueras and Roldán).

It would be good to add the new citation for SSPRK53_2N (or SSPRK53_2N1 and SSPRK53_2N2 if both are added) to https://github.com/JuliaDiffEq/juliadiffeq.github.io/blob/master/citing.md.

dgan181 commented 5 years ago

@ranocha I tried implementing the second method in #584. Once that is resolved, I will add the new citations.

arnav-t commented 5 years ago

I'm implementing RK46-NL from Berland, Bogey, Bailly (2006).

ChrisRackauckas commented 5 years ago

Merging the discussion of #415 here. @dgan181 is doing Conde, Fekete, Shadid (2018).

dgan181 commented 5 years ago

I am implementing Hadjimichael, Ketcheson, Lóczi, and Németh (2016).

keshavseksaria commented 5 years ago

Hello everyone, I'm a newbie trying to implement Mead, Renaut (1999).

arnav-t commented 5 years ago

I guess I will switch to Stanescu, Habashi (1998) now, since @saurabhkgp21 did RK46-NL.

stroebel commented 5 years ago

I am working on Parsani, Ketcheson, Deconinck (2013).

deeepeshthakur commented 5 years ago

@ranocha The papers Kubatko, Yeager, Ketcheson (2014) and Kennedy, Carpenter, Lewis (2000) have 21 and more than 16 schemes, respectively. We might not want to implement all of them, but I'm not able to decide which ones to implement and which ones to skip.

ranocha commented 5 years ago

For Kennedy, Carpenter, Lewis (2000), the method RK4(3)5[2R+]C (using their nomenclature) should be a good starting point. Since the low-storage schemes described there can be implemented similarly, it might be worth the effort to write some kind of "generic" version of perform_step! etc., such that only the low-storage Runge-Kutta coefficients have to be supplied for a new scheme.
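To make that concrete, here is a minimal sketch of such a generic step, written in the simpler 2N (Williamson) form; the names `LowStorageRK2NCoeffs` and `lowstorage_rk2n_step!` are hypothetical, not existing OrdinaryDiffEq.jl internals, and the 2R+ schemes of Kennedy, Carpenter, Lewis would need a similar but distinct archetype:

```julia
using StaticArrays

# Illustrative coefficient container (not an actual OrdinaryDiffEq.jl type):
# these SVectors are the only data that differ between 2N schemes.
struct LowStorageRK2NCoeffs{S, T}
    A::SVector{S, T}  # weights multiplying the accumulated register
    B::SVector{S, T}  # weights for the solution update
    c::SVector{S, T}  # stage times
end

# One generic 2N (Williamson-form) step for mutable arrays u, du, tmp:
# besides u, only the registers du and tmp are needed, independent of the
# number of stages.
function lowstorage_rk2n_step!(f!, u, du, tmp, p, t, dt, coeffs)
    fill!(tmp, zero(eltype(u)))
    for i in eachindex(coeffs.A)
        f!(du, u, p, t + coeffs.c[i] * dt)  # in-place RHS evaluation
        Ai, Bi = coeffs.A[i], coeffs.B[i]   # hoist scalars out of the broadcasts
        @. tmp = Ai * tmp + dt * du         # one fused broadcast per stage
        @. u += Bi * tmp
    end
    return u
end
```

Adding a new 2N scheme would then reduce to supplying its coefficient `SVector`s.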

ranocha commented 5 years ago

I don't have really good advice concerning Kubatko, Yeager, Ketcheson (2014). They present several schemes for different applications (i.e., the order of the DG method the schemes are optimised for). It seems that these schemes are not as popular in the DG community as others, perhaps because of their higher storage requirements.

deeepeshthakur commented 5 years ago

> For Kennedy, Carpenter, Lewis (2000), the method RK4(3)5[2R+]C (using their nomenclature) should be a good starting point. Since the low-storage schemes described there can be implemented similarly, it might be worth the effort to write some kind of "generic" version of perform_step! etc., such that only the low-storage Runge-Kutta coefficients have to be supplied for a new scheme.

@ChrisRackauckas @ranocha Yes, this can be implemented quite easily: the coefficients can be stored as an SVector in the ConstantCache instead of as separate variables. The integrator argument will remain exactly the same, and the cache argument can be changed to the union of all methods using that perform_step!. But won't this be a drastic change in how the other methods are implemented?
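A sketch of that idea (all type and function names here are hypothetical, not the existing OrdinaryDiffEq.jl API): every scheme of one family shares a single cache type carrying the coefficients as `SVector`s, so the step function dispatches on that one type rather than on a `Union` of many scheme-specific caches:

```julia
using StaticArrays

struct SchemeX end  # hypothetical algorithm types standing in for two
struct SchemeY end  # schemes of the same low-storage family

# Shared constant cache: the SVector fields are the only per-scheme data.
struct SharedLowStorage2NCache{S, T}
    A::SVector{S, T}
    B::SVector{S, T}
    c::SVector{S, T}
end

# Adding a scheme then only means adding its coefficients. These example
# values reproduce the explicit midpoint method in 2N form...
make_cache(::SchemeX, T) = SharedLowStorage2NCache(
    SVector{2, T}(0, -1//2), SVector{2, T}(1//2, 1), SVector{2, T}(0, 1//2))
# ...and these reproduce forward Euler.
make_cache(::SchemeY, T) = SharedLowStorage2NCache(
    SVector{1, T}(0), SVector{1, T}(1), SVector{1, T}(0))

# A single out-of-place step covers the whole family by dispatching on the
# shared cache type.
function family_step(f, u, p, t, dt, cache::SharedLowStorage2NCache)
    tmp = zero(u)
    for i in eachindex(cache.A)
        du = f(u, p, t + cache.c[i] * dt)
        Ai, Bi = cache.A[i], cache.B[i]
        tmp = @. Ai * tmp + dt * du
        u = @. u + Bi * tmp
    end
    return u
end
```

As a sanity check, `family_step((u, p, t) -> -u, 1.0, nothing, 0.0, 0.1, make_cache(SchemeX(), Float64))` returns 0.905, which matches one explicit midpoint step for u' = -u.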

deeepeshthakur commented 5 years ago

> I don't have really good advice concerning Kubatko, Yeager, Ketcheson (2014). They present several schemes for different applications (i.e., the order of the DG method the schemes are optimised for). It seems that these schemes are not as popular in the DG community as others, perhaps because of their higher storage requirements.

So which ones should we implement?

ranocha commented 5 years ago

If we follow the creators of Julia, we should implement all methods, because "we are greedy". I think Chris mentioned some time ago that he likes the idea of having many published time integration schemes collected in the DiffEq ecosystem. Thus, all schemes might be implemented in order to be able to test and compare them easily.

Whether or not the "change in how the other methods are implemented" is acceptable might be debatable. Most RK methods have specialised implementations because there are optimisation possibilities specific to each scheme. However, the kinds of low-storage schemes I know of leave much less (or no) room for further optimisation. Thus, it might be worth having a general implementation for them. If we see further optimisation possibilities for a specific scheme, we can still write specialised methods.

@ChrisRackauckas should answer these questions and decide what to do, I think.

ChrisRackauckas commented 5 years ago

> Whether or not the "change in how the other methods are implemented" is acceptable might be debatable. Most RK methods have specialised implementations because there are optimisation possibilities specific to each scheme. However, the kinds of low-storage schemes I know of leave much less (or no) room for further optimisation. Thus, it might be worth having a general implementation for them. If we see further optimisation possibilities for a specific scheme, we can still write specialised methods.

The problem is a complicated mixture of history and GPU interactions. Essentially, when I first created the RK methods, Julia didn't have StaticArrays.jl. I tested what happened if you replaced the Vectors of coefficients (defined in https://github.com/JuliaDiffEq/DiffEqDevTools.jl/blob/master/src/ode_tableaus.jl) with constant coefficients and saw a pretty good speedup, so I implemented them like that. All of the first RK methods had completely different structures, so that was fine. And you don't want to condense the structure too much, because you want to preserve single-kernel broadcasting. What I mean by that is:

@. tmp = dt*a12*k2 + dt*a13*k3

is much faster than

@. tmp = dt*a12*k2 
@. tmp += dt*a13*k3

when GPUs or MPI or things like that are involved, since the former is a single kernel and the latter is two, so fusing reduces the parallelism overhead. The standard RK methods, with their scattered zero coefficients and different error estimators, are difficult to put together in a way that preserves these features, so they stayed written out.
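As a small illustration of the kernel-count argument (assuming CUDA.jl and a GPU; the array sizes and coefficient values here are made up):

```julia
using CUDA

k2, k3 = CUDA.rand(2^20), CUDA.rand(2^20)
tmp = similar(k2)
dt, a12, a13 = 0.1f0, 0.2f0, 0.3f0

@. tmp = dt*a12*k2 + dt*a13*k3  # fused: one kernel launch

@. tmp = dt*a12*k2              # split: two kernel launches, and tmp is
@. tmp += dt*a13*k3             # written, then read back and written again
```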

But not all methods are like that. If you look at the Rodas4 perform_step! (https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/perform_step/rosenbrock_perform_step.jl#L800-L946), you can see that multiple methods use that same perform_step! definition (https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/caches/rosenbrock_caches.jl#L305-L558), since there are no optimizations to be gained by separating them. These cache constructions can probably be condensed even more, since the only difference is the coefficients they use.

So yes, the low-storage methods of the same type should be condensed. In fact, the low-storage methods have a kind of stepping archetype to them that can be exploited to put a lot of them together. I wasn't worrying about that for people just starting, since I'd rather people learn how to write a method and then move on to their project than get frustrated writing the best, most generic low-storage setup ever, but someone needs to do this. Using static-array coefficients and keeping the kernels full-sized should have absolutely no impact on the runtime of the low-storage methods, so there's definitely some cleanup in order.

But you can see that what you do need is enough structure in the method's implementation to keep these benefits. If you just say "oh, they are RK methods", that's too far, because then you don't have a way to specialize the cache sizes. So @ranocha is right: "we are greedy". In fact, here we accept that OrdinaryDiffEq.jl is a library, so while aliasing a few arrays might make analysis more difficult (https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/caches/low_order_rk_caches.jl#L446-L459), people use our library expecting it to be performant, so we need to make these optimizations. The reason we specialize is that many times it's simply easier to get a performant method if you write each one out.
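A hypothetical sketch of the aliasing trick mentioned above (the field and constructor names are made up, not the linked cache code): two logically distinct cache slots share one array because their lifetimes within a step do not overlap:

```julia
# Hypothetical cache: k and fsalfirst are separate fields that reference the
# same array, saving one allocation at the cost of code that is harder to
# analyse (mutating one field "mutates" the other).
struct AliasedRKCache{uType, rateType}
    u::uType
    uprev::uType
    k::rateType
    fsalfirst::rateType
end

function alias_cache(u, rate_prototype)
    k = zero(rate_prototype)
    fsalfirst = k  # alias instead of a second zero(rate_prototype)
    AliasedRKCache(u, zero(u), k, fsalfirst)
end
```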

The way to think of library development in Julia is a two step process:

  1. Get it right (working, tested)
  2. Get it good (benchmarked, simplified, performant)

Trying to do (2) without doing (1) just slows you down. Many times it's good to just accept (1) and let everyone update it along the way later! In fact, doing this as two separate PRs is usually a great way to work.

ranocha commented 5 years ago

> I wasn't worrying about that for people just starting, since I'd rather people learn how to write a method and then move on to their project than get frustrated writing the best, most generic low-storage setup ever, but someone needs to do this. Using static-array coefficients and keeping the kernels full-sized should have absolutely no impact on the runtime of the low-storage methods, so there's definitely some cleanup in order.

I completely support that approach. While implementing one or two low-storage schemes is a good way to learn Julia, the DiffEq ecosystem, and more about Runge-Kutta methods, implementing lots of them can become really tedious. Thus, I've started a more "general" implementation in #648.

ChrisRackauckas commented 4 years ago

Mead, Renaut (1999) is being worked on in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/1052
Stanescu, Habashi (1998) is being worked on in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/1051

Biswajitghosh98 commented 4 years ago

@ranocha @ChrisRackauckas I tried to implement SSPRK53_H from the first paper, with 3N registers, in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/1055

Thank you!

kamalojasv181 commented 4 years ago

Implemented in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/pull/1051

kamalojasv181 commented 4 years ago

@ChrisRackauckas can I work on the Stanescu, Habashi (1998) two-step method?

mtsokol commented 4 years ago

Hi, I would like to try implementing one of the methods. Which of the remaining ones is the easiest, so that I can follow an existing pattern and learn the basics that way?

ChrisRackauckas commented 4 years ago

Ketcheson's is a good one to try. Follow the devdocs: https://devdocs.juliadiffeq.org/latest/contributing/adding_algorithms/

ErikQQY commented 3 years ago

Hello, I am implementing KYK SSPRK52.