SciML / OrdinaryDiffEq.jl

High performance ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML)
https://diffeq.sciml.ai/latest/

Minimizing and Testing Memory Usage in Low-Memory RK Methods #640

Closed ChrisRackauckas closed 5 years ago

ChrisRackauckas commented 5 years ago

@ranocha first tasked us with fixing memory usage in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/178 . We now have all of the pieces in place to do it correctly (see https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/178#issuecomment-460039918 ). What remains to be done are some calck optimizations on the low-memory methods, and testing to make sure it all works out.

How to do it is already described, so let's talk about testing. Memory tracking makes sense in the asymptotic regime where u0 is large. When u0 is large, say 100,000, then, since the size of the integrator is constant, the size of the uType and rateType vectors will completely dominate. In that case, we can measure the integrator's size in order to see how many registers we are actually using. It's quite accurate:

```julia
using OrdinaryDiffEq
u0 = rand(1000, 1000)
prob = ODEProblem((du, u, p, t) -> du .= u, u0, (0.0, 1.0))
integ = init(prob, Tsit5(), save_start = false, save_end = false,
             save_everystep = false) # this sets dense = calck = false
Base.summarysize(integ) / Base.summarysize(u0) # 13.00020737396313
```

and understands aliasing:

```julia
Base.summarysize([u0 for i in 1:7]) / Base.summarysize(u0) # 1.0000119999400003
```

So with this, our goal should be to get these methods down to exactly the amount they require. Some things to note are:

ranocha commented 5 years ago

Most low-storage methods make some assumption about the possible form of assignments using only two arrays; cf. Ketcheson, David I. "Runge–Kutta methods with minimum storage implementations." Journal of Computational Physics 229.5 (2010): 1763-1773.

As far as I know, the current setup of OrdinaryDiffEq.jl does not allow any of these assumptions, since function evaluations are of the form u2 = f(u1) (possibly in-place, of course). Thus, we need one additional array compared to the memory requirements advertised in research articles.

For the user, it might be relatively easy to write the RHS f such that the 2N assumption is fulfilled. The other ones are more complicated in general.
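The 2N constraint amounts to the RHS supporting an accumulating evaluation, roughly du .+= f(u), instead of an overwriting one. A minimal sketch of such a user-written RHS (the linear decay f(u) = -p*u and the name f_accumulate! are illustrative assumptions, not package API):

```julia
# A user-written RHS in the accumulating form needed for the 2N assumption:
# instead of overwriting du, f(u) is added into du in place, so a
# low-storage method can keep its running linear combination in the
# same register.
function f_accumulate!(du, u, p, t)
    @inbounds for i in eachindex(u)
        du[i] += -p * u[i]  # du .+= f(u, p, t), no temporary array
    end
    return nothing
end
```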

ChrisRackauckas commented 5 years ago

We might want to support alternative AbstractODEProblems for these then.

ranocha commented 5 years ago

If we really want to minimise the memory usage this seems to be the way to go. This can also speed up the solution of the new specialised problems, e.g. if a linear ODE is to be solved using mul! for sparse arrays (cf. https://github.com/JuliaLang/julia/pull/26117) or something similar if https://github.com/JuliaLang/julia/pull/29634 is merged. An API might be similar to the five argument syntax of mul!.
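The five-argument mul! mentioned here (the syntax from the linked Julia PRs, which later landed in the language) computes C = A*B*α + C*β without allocating a temporary. A minimal sketch of the call pattern a specialised linear-ODE problem type could exploit:

```julia
using LinearAlgebra, SparseArrays

A = sparse([2.0 0.0; 1.0 3.0])
u = [1.0, 1.0]
du = [10.0, 10.0]

# Five-argument mul!: du = A*u*α + du*β, computed in place without
# allocating a temporary array.
mul!(du, A, u, 0.5, 1.0)
du  # [11.0, 12.0]
```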

ChrisRackauckas commented 5 years ago

We may just want to add these as possibilities to ODEFunction. The whole point is to allow other forms of the function for optimizations there. The documentation will need some work for it though, and then the methods will need to handle the existence/non-existence of such an optimized form.

ranocha commented 5 years ago

I'm working on this and will soon push a PR.

Another possibility to reduce the memory would be to set uprev === u for some methods. In the current setup, this does not seem to be possible, since uprev is set in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/ccd2287a2845a093638875762b74d4fce0bf14eb/src/solve.jl#L213 @ChrisRackauckas: Could it be safe if the algorithm/cache decided whether uprev is a new array or not?

ChrisRackauckas commented 5 years ago

@ChrisRackauckas: Could it be safe if the algorithm/cache decided whether uprev is a new array or not?

uprev===u breaks interpolation, so that is only safe when calck==false. Think of calck as meaning that we know the user only wants the solving to work and no interpolation.

http://docs.juliadiffeq.org/latest/basics/common_solver_opts.html#Miscellaneous-1

It's set by:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/solve.jl#L27-L29

So note that if you have a saveat point without a tstop there, you will be using interpolations. But for PDEs of that size, you probably can't save any intermediate values anyway? If you can, you can drop the dt for a step.

ranocha commented 5 years ago

Yes, I would not save intermediate values for big PDEs (or would control that explicitly). There are some algorithms that use uprev and some that do not need it. We could reduce the size if the algorithm could tell us that. Maybe something like alias_uprev(alg), which defaults to false and can be set to true if this is safe? Or move the initialisation of uprev to the cache?

ChrisRackauckas commented 5 years ago

Or move the initialisation of uprev to the cache?

We really should do that. It's just waiting to be done, look at the proximity:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/solve.jl#L213-L220

Well actually, why not just universally do uprev = calck ? recursivecopy(u) : u?

ranocha commented 5 years ago

We could do that. But there are some methods that use uprev explicitly, e.g. https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/ccd2287a2845a093638875762b74d4fce0bf14eb/src/perform_step/low_storage_rk_perform_step.jl#L118 We would have to adapt them, but that's okay for me.

ChrisRackauckas commented 5 years ago

Yeah thinking about it more, most RK methods need uprev in the b step at the end, so that won't work. uprev is also needed when adaptive since it's required for step rejections. It's just the low-storage methods. So yeah, it would make sense moving it into the cache for the algorithms to take control of how that's done.

ranocha commented 5 years ago

I would leave that task open for another PR and just submit some first improvements and checks soon.

ChrisRackauckas commented 5 years ago

Yup that sounds good.

ranocha commented 5 years ago

If we move the initialisation of uprev to the cache, we should do the same with uprev2, shouldn't we? In that case, we can either forward the keyword argument allow_extrapolation from https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/7dd04a406c9097f14542317684d26d2ddf0e193f/src/solve.jl#L58 to the cache or let the cache decide what to do. Which possibility do you prefer, @ChrisRackauckas?

ranocha commented 5 years ago

Thinking more about it, I do not really like the idea of moving uprev and uprev2 to the caches. Since there are lots of methods which do not need at least one of them, the structure of the caches would not reflect their needs.

ranocha commented 5 years ago

Short summary (since I don't have time to work on that part right now): After all the changes linked above, the remaining change is to add new forms of ODEFunctions:

We may just want to add these as possibilities to ODEFunction. The whole point is to allow other forms of the function for optimizations there. The documentation will need some work for it though, and then the methods will need to handle the existence/non-existence of such an optimized form.

kanav99 commented 5 years ago

I'd like to work on this issue if it's not done yet. According to my understanding, adding an option rewrite_u to ODEProblem would do the job. For example, if the function is f(u, p, t) = A*u where A is an upper triangular matrix, then u is rewritable. If rewrite_u is set, we could alias du with u to carry out function evaluations of the form u1 = f(u1). Similarly, all three forms could be incorporated using this option.

ranocha commented 5 years ago

There is some current discussion on the matrix multiplication API. Hopefully, some decision on the API of inplace multiplication of the form we need for 2N methods will be made soon, cf. https://github.com/JuliaLang/julia/issues/23919#issuecomment-426906393.

ChrisRackauckas commented 5 years ago

According to my understanding, adding an option rewrite_u to ODEProblem would do the job. For example, if the function is f(u, p, t) = A*u where A is an upper triangular matrix, then u is rewritable. If rewrite_u is set, we could alias du with u to carry out function evaluations of the form u1 = f(u1). Similarly, all three forms could be incorporated using this option.

I think this needs to be in-method. If a specific method can make good reuse of u, it should, and then it can likely delete a cache. Of course, not all can, and it can only be done when units aren't used.

I don't think the in-place matrix multiplication API is necessary to create the right ODEFunction infrastructure, but for a user to actually define the function without allocations it'll be needed.

ranocha commented 5 years ago

but for a user to actually define the function without allocations it'll be needed.

That's what I meant. We might want to use an API similar to the final decision for the matrix multiplication API. If that's mul!(C, A, B, α=1, β=0) for C = α*A*B + β*C, we can use f!(du, u, p, t, α, β) for du = α*f(u, p, t) + β*du, needed for 2N methods.
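A minimal sketch of this proposed accumulating signature, using a hypothetical linear RHS f(u, p, t) = -p*u (the name linear_f! and the RHS are assumptions for illustration):

```julia
# Hypothetical accumulating ODEFunction signature modeled on five-argument
# mul!: compute du .= α .* f(u, p, t) .+ β .* du in one pass, so a 2N
# method can fold its register update into the RHS evaluation.
function linear_f!(du, u, p, t, α, β)
    @inbounds for i in eachindex(u)
        du[i] = α * (-p * u[i]) + β * du[i]
    end
    return nothing
end
```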

kanav99 commented 5 years ago

Yes, it would be dependent on both the problem and the algorithm. There may be a function where s = f(s) is not possible using a single register. So it should be specified by the user that this transformation is possible for this function. If it is possible, then the method (in fact, it may be possible to use this extra register in any method) may exploit this fact. I will make a PR which implements this to show an example.
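As a sketch of a case where s = f(s) does work with a single register: for upper triangular A, f(u) = A*u can be computed top-down with the output aliasing the input, because row i only reads u[j] with j ≥ i, and those entries are not yet overwritten. The helper tri_mul! below is hypothetical, not part of the package:

```julia
# In-place upper triangular matrix-vector product: filling the result
# top-down is safe even when du === u, since row i never reads an entry
# that an earlier row has already overwritten.
function tri_mul!(du, A, u)
    n = length(u)
    @inbounds for i in 1:n
        acc = zero(eltype(u))
        for j in i:n
            acc += A[i, j] * u[j]
        end
        du[i] = acc  # safe even when du === u
    end
    return du
end
```

Called as tri_mul!(u, A, u), this performs u = A*u using a single register.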

ChrisRackauckas commented 5 years ago

Oh I see where you're going with this. Yeah, this would be a neat solution without requiring additional function interfaces. Great idea!