SciML / OrdinaryDiffEq.jl

High performance ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML)
https://diffeq.sciml.ai/latest/

Refactoring implementations of solver types to use tableau-based forms #233

Open YingboMa opened 6 years ago

YingboMa commented 6 years ago

Currently, the code in src/integrators is very repetitive, and I think it can be cleaned up with some metaprogramming techniques. One concern @ChrisRackauckas has about metaprogramming in the integrators is that it cannot automatically handle zeros in the tableaux, because the tableaux are not visible at parse time the way they are when written out in a struct. I think we can avoid this problem by storing the tableaux as plain matrices, so the compiler can constant-fold the spliced-in entries. This would also give us a route to fast tableau-based method implementations. Here is an example.

# Tableau coefficients; these are the Bogacki–Shampine 3(2) values
# (a41 = 2/9, a42 = 1/3, a43 = 4/9; btilde are the embedded error weights).
T = T2 = Float64
a21 = T(0.5)
a32 = T(0.75)
a41 = T(0.2222222222222222)
a42 = T(0.3333333333333333)
a43 = T(0.4444444444444444)
c1 = T2(0.5)
c2 = T2(0.75)
btilde1 = T(0.06944444444444445)
btilde2 = T(-0.08333333333333333)
btilde3 = T(-0.1111111111111111)
btilde4 = T(0.125)

A = [
 0    0    0    0
 a21  0    0    0
 0    a32  0    0
 a41  a42  a43  0
]
C = [ c1, c2, 0 ]  # trailing 0 is unused; the final stage is evaluated at t + dt
B = [ btilde1, btilde2, btilde3, btilde4 ]
# Push `expr` onto the argument list of the Expr bound to `ex` in the caller.
macro p(ex::Symbol, expr::Expr)
    esc(Expr(:call, :push!, :($ex.args), expr))
end

# Wrap `ex` in a `@tight_loop_macros for i = uidx ... end` loop. Note that
# `Expr(:macrocall, ...)` needs a LineNumberNode instance, and the loop body
# has to actually contain `ex`.
tight_loop(ex) = Expr(:block,
                      Expr(:macrocall, Symbol("@tight_loop_macros"), LineNumberNode(0),
                           Expr(:for, :(i = uidx),
                                Expr(:block, ex))))

# Build `uprev[i] + dt*(A[n+1,1]*k1[i] + … + A[n+1,n]*kn[i])`, splicing the
# tableau entries in as literals so the compiler can constant-fold the zeros.
applyA(n) = Expr(:call, :+, :(uprev[i]),
                 Expr(:call, :*, :dt,
                      Expr(:call, :+, [:($(A[n+1, j]) * $(Symbol(:k, j))[i]) for j in 1:n]...)))

#function perform_step!(integrator, tab, cache, repeat_step, Val{true})
    #A, B, C = tab
    body = Expr(:macrocall, Symbol("@inbounds"), LineNumberNode,
                Expr(:macrocall, Symbol("@muladd"), LineNumberNode,
                     Expr(:block)))
    ex   = body.args[end].args[end]
    @p ex Expr(:macrocall, Symbol("@unpack"), LineNumberNode, :((t, dt, uprev, u, f) = integrator))
    @p ex :( k1 = integrator.fsalfirst )
    @p ex :( uidx = eachindex(uprev) )
    @p ex :( tmp = similar(uprev) )
    @p ex :( a = dt*$(A[2,1]) )
    @p ex :( $(tight_loop(:(tmp[i] = uprev[i]+a*k1[i]))) )
    len = size(A, 1)
    for i in 2:len-1
        @p ex :( $(Symbol(:k,i)) = f(t+$(C[i-1])*dt,tmp) )
        # NOTE: The final stage writes into a fresh utmp since u is aliased with uprev.
        i == len-1 && @p ex :(utmp = similar(u))
        applya = applyA(i)
        tmp = i == len-1 ? :(utmp) : :(tmp)
        @p ex :( $(tight_loop(:($tmp[i] = $applya))) )
        #tmp = :( @. uprev+$(Symbol(a, i))*$(Symbol(k, i)) )
        #@p ex :( Symbol(k, i) = f(t+$(C[i-1])*dt, $tmp) )
    end
    @p ex :( u = utmp )
    @p ex :( $(Symbol(:k,len)) = f(t+dt,u) )
    @p ex :( integrator.fsallast = $(Symbol(:k,len)) )
    body
#end

The above code generates the expression

:(@inbounds @muladd(begin
              @unpack (t, dt, uprev, u, f) = integrator
              k1 = integrator.fsalfirst
              uidx = eachindex(uprev)
              tmp = similar(uprev)
              a = dt * 0.5
              begin
                  @tight_loop_macros for i = uidx
                          tmp[i] = uprev[i] + a * k1[i]
                      end
              end
              k2 = f(t + 0.5dt, tmp)
              begin
                  @tight_loop_macros for i = uidx
                          tmp[i] = uprev[i] + dt * (0.0 * k1[i] + 0.75 * k2[i])
                      end
              end
              k3 = f(t + 0.75dt, tmp)
              utmp = similar(u)
              begin
                  @tight_loop_macros for i = uidx
                          utmp[i] = uprev[i] + dt * (0.2222222222222222 * k1[i] + 0.3333333333333333 * k2[i] + 0.4444444444444444 * k3[i])
                      end
              end
              u = utmp
              k4 = f(t + dt, u)
              integrator.fsallast = k4
          end))
ChrisRackauckas commented 6 years ago

Instead of matrices we should just use static arrays a la https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/114
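A sketch of what that could look like (the struct and constant names here are hypothetical, purely to illustrate the idea, not actual OrdinaryDiffEq types):

using StaticArrays

# Hypothetical tableau container; the names are illustrative only.
struct ERKTableau{T, T2}
    A::SMatrix{4, 4, T, 16}   # stage-coefficient matrix, size fixed at compile time
    C::SVector{2, T2}         # abscissae of the internal stages
    Btilde::SVector{4, T}     # embedded error-estimator weights
end

# The BS3 coefficients from the example above, as compile-time constants:
const BS3_TAB = ERKTableau{Float64, Float64}(
    @SMatrix([0    0    0    0
              1/2  0    0    0
              0    3/4  0    0
              2/9  1/3  4/9  0]),
    @SVector([1/2, 3/4]),
    @SVector([5/72, -1/12, -1/9, 1/8]))

Since both the sizes and (via constant propagation) the entries are then known to the compiler, stage loops written against such a tableau can unroll and fold away the zeros, which addresses the parse-time-visibility concern above.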

But the issue here is that while the simple case works, I am not sure it can actually stay this simple in the full case. For example, we want to use broadcasting to keep compatibility with GPUs when possible (http://www.stochasticlifestyle.com/solving-systems-stochastic-pdes-using-gpus-julia/), but past a certain line size broadcasting isn't possible (https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/106). We will also want those lines to fuse rather than fall back to a loop, because fusion requires fewer GPU kernel calls (or whatever other form of parallelism the array type provides).
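
To illustrate the trade-off, here is the same stage update written both ways (a sketch only; the function names are made up, and `tmp`, `uprev`, `k1`–`k3` are assumed to be conformable arrays with `dt`, `a41`–`a43` scalars):

# One fused broadcast: a single traversal, and a single kernel launch on GPUs.
stage_broadcast!(tmp, uprev, dt, a41, a42, a43, k1, k2, k3) =
    @. tmp = uprev + dt * (a41 * k1 + a42 * k2 + a43 * k3)

# The indexed-loop form relies on fast scalar indexing, so it is CPU-only.
function stage_loop!(tmp, uprev, dt, a41, a42, a43, k1, k2, k3)
    @inbounds for i in eachindex(uprev)
        tmp[i] = uprev[i] + dt * (a41 * k1[i] + a42 * k2[i] + a43 * k3[i])
    end
    return tmp
end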

Another issue: how do you deal with the fact that some of the methods have to use non-standard error estimators?

The sets of codes that do fall under the same category are actually quite small. I think the largest category this can hit (without multiple macros, of course) is the explicit Runge-Kutta methods. The nicest one to hit would be the SDIRK methods, but each of those has a different stage predictor (and KenCarp additionally gets additive parts), so while there is repetitive code, most of them have unique parts; a shared implementation would have to leave the predictor pluggable, as sketched below.
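
Roughly this shape (purely a hypothetical sketch of the structure; `predictor` and `solve_stage` are stand-ins, not existing functions):

# Shared SDIRK stage loop where only `predictor` (the initial guess for the
# nonlinear solve) varies per method; `solve_stage` stands in for the
# implicit stage solve.
function sdirk_stages(f, u0, t, dt, A, c, predictor, solve_stage)
    s = size(A, 1)
    ks = Vector{typeof(u0)}(undef, s)
    for i in 1:s
        rhs = u0
        for j in 1:i-1
            rhs = rhs .+ (dt * A[i, j]) .* ks[j]   # already-computed stages
        end
        guess = predictor(i, u0, ks, dt)           # the method-specific part
        # Solve k_i = f(t + c[i]*dt, rhs + dt*A[i,i]*k_i) for k_i:
        ks[i] = solve_stage(f, t + c[i] * dt, rhs, dt * A[i, i], guess)
    end
    return ks
end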

If you can find a good and easy-to-maintain way to implement this I will accept it, but those are the constraints, and it's not an easy problem to satisfy them except maybe in the explicit RK and symplectic RK cases.

ChrisRackauckas commented 6 months ago

There was a big push during a Hackathon to investigate this a bit more, see:

It was determined that metaprogramming isn't needed; smart implementations of RK methods can do it fine. In particular, the hardest algorithms to get right with this are the standard explicit Runge-Kutta algorithms, because they admit certain optimizations when unrolled.
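
For reference, the generic tableau-driven shape looks something like the following sketch (hypothetical name and signature, not the implementation that came out of the hackathon); the inner loop over `j` is exactly where an unrolled, method-specific implementation can skip structural zeros and fuse operations:

# Generic tableau-driven explicit RK step. `A` is the stage-coefficient
# matrix, `c` the abscissae (with c[1] == 0), `b` the final weights.
function generic_erk_step(f, u0, t, dt, A, b, c)
    s = length(b)                        # number of stages
    ks = Vector{typeof(u0)}(undef, s)
    for i in 1:s
        tmp = u0
        for j in 1:i-1
            # A generic loop pays for every entry; an unrolled
            # implementation folds away the zeros of the tableau.
            tmp = tmp .+ (dt * A[i, j]) .* ks[j]
        end
        ks[i] = f(t + c[i] * dt, tmp)
    end
    u = u0
    for i in 1:s
        u = u .+ (dt * b[i]) .* ks[i]
    end
    return u
end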

This kind of goes hand-in-hand with https://github.com/SciML/OrdinaryDiffEq.jl/issues/2177. Maybe we can keep the more optimized versions of the explicit RK methods around but as a subpackage or something.

However, the implicit methods at least should not see any benefit from any form of unrolling, since their cost is dominated by the implicit solves. Given that, we should look at refactoring Rosenbrock, SDIRK, etc. first.

ParamThakkar123 commented 3 months ago

@ChrisRackauckas, @YingboMa May I work on this issue?

ParamThakkar123 commented 3 months ago

Or is it done?

ChrisRackauckas commented 3 months ago

It has not been done. If you want to claim it, go for it. Send the email for the SciML Small Grants claim and let's get this started. I think Rosenbrock is the first set to do: unify Rosenbrock4Cache and Rosenbrock5Cache first (the Rodas4, Rodas5, Rodas4P, Rodas5P, Rodas5Pr, etc. set should be the most similar), then grow that to all of the Rosenbrocks. Rosenbrock23 will be the odd one out, I think; it's fine for a first merge to leave that one separate.
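
A sketch of the general direction (hypothetical; the real caches carry many more fields, e.g. Jacobian, W, and linear-solver state):

# A unified Rosenbrock cache parameterized by its tableau, in place of a
# separate cache struct per method. Names here are illustrative only.
mutable struct UnifiedRosenbrockCache{uType, tabType}
    u::uType
    uprev::uType
    tmp::uType
    ks::Vector{uType}   # one k per stage; length comes from the tableau
    tab::tabType        # coefficients shared by Rodas4, Rodas5, Rodas4P, …
end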

ParamThakkar123 commented 3 months ago

Sounds good. I'll get started on this.

oscardssmith commented 3 months ago

I have a PR coming soon that will make Rosenbrock32 less special as well (but it will take a little bit)

ParamThakkar123 commented 1 month ago

@ChrisRackauckas My contribution period under the Small Grants program is over, but I want to keep working on this issue and extend the refactor to the other solvers once Rosenbrock is done. May I have an extension for this, please?