Closed: ChrisRackauckas closed this issue 5 years ago
Most low-storage methods make some assumption on the possible form of assignments using only two arrays, cf. Ketcheson, David I. "Runge–Kutta methods with minimum storage implementations." Journal of Computational Physics 229.5 (2010): 1763-1773:

- **2N methods (Williamson):** Assignments of the form `u2 = a*u2 + dt*f(u1)` (or `u2 = a*u2 + b*dt*f(u1)`) can be made using only two arrays.
- **2R methods (van der Houwen):** Assignments of the form `u1 = f(u1)` can be made using only one array.
- **2S methods (Ketcheson):** Assignments of the form `u1 = a*u1 + b*u2 + dt*f(u1)` can be made using only two arrays.

As far as I know, the current setup of OrdinaryDiffEq.jl does not allow any of these assumptions, since function evaluations are of the form `u2 = f(u1)` (possibly in-place, of course). Thus, we need one additional array compared to the memory requirements advertised in research articles.
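To illustrate the register pattern these methods rely on, here is a minimal sketch of a Williamson-style 2N step for the test problem `u' = -u`. The coefficients `a`, `b` are made up for illustration and are not a published low-storage tableau:

```julia
# Minimal sketch of the Williamson 2N register pattern for u' = f(u),
# with f(u) = -u as a test problem. Only two arrays (u and the
# accumulator k) are live across the whole step. The coefficients a, b
# below are made up for illustration, not a published tableau.
function step_2N!(u, k, a, b, dt)
    for i in eachindex(a)
        ai, bi = a[i], b[i]
        @. k = ai * k - dt * u   # the 2N assignment k = a_i*k + dt*f(u)
        @. u = u + bi * k
    end
    return u
end

u = ones(4)
k = zero(u)
step_2N!(u, k, [0.0, -0.5], [0.5, 1.0], 0.1)
```

The point is that the fused assignment `k = a_i*k + dt*f(u)` never materialises `f(u)` in a separate array, which a plain `u2 = f(u1)` evaluation would require.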
For the user, it might be relatively easy to write the RHS `f` such that the 2N assumption is fulfilled. The other ones are more complicated in general. We might want to support alternative `AbstractODEProblem`s for these then.
If we really want to minimise the memory usage, this seems to be the way to go. This can also speed up the solution of the new specialised problems, e.g. if a linear ODE is to be solved using `mul!` for sparse arrays (cf. https://github.com/JuliaLang/julia/pull/26117) or something similar if https://github.com/JuliaLang/julia/pull/29634 is merged. An API might be similar to the five-argument syntax of `mul!`.
We may just want to add these as possibilities to ODEFunction. The whole point is to allow other forms of the function for optimizations there. The documentation will need some work for it though, and then the methods will need to handle the existence/non-existence of such optimized form.
I'm working on this and will soon push a PR.
Another possibility to reduce the memory would be to set `uprev === u` for some methods. In the current setup, this does not seem to be possible, since `uprev` is set in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/ccd2287a2845a093638875762b74d4fce0bf14eb/src/solve.jl#L213

@ChrisRackauckas: Could it be safe if the algorithm/cache decided whether `uprev` is a new array or not?
> Could it be safe if the algorithm/cache decided whether `uprev` is a new array or not?
`uprev === u` breaks interpolation, so that is only safe when `calck == false`. Think of `calck` as meaning that we know the user only wants the solving to work and no interpolation.
http://docs.juliadiffeq.org/latest/basics/common_solver_opts.html#Miscellaneous-1
It's set by:
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/solve.jl#L27-L29
So note that if you have a `saveat` without a `tstop` there, you will be using interpolations. But for PDEs of that size, you probably can't save any intermediate values anyway? If you can, you can drop the `dt` for a step.
Yes, I would not save intermediate values for big PDEs (or would control that explicitly). There are some algorithms that use `uprev` and some that do not need it. We could reduce the size if the algorithm could tell us that. Maybe something like `alias_uprev(alg)`, which defaults to `false` and can be set to `true` if this is safe? Or move the initialisation of `uprev` to the `cache`?
> Or move the initialisation of `uprev` to the `cache`?
We really should do that. It's just waiting to be done, look at the proximity:
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/solve.jl#L213-L220
Well actually, why not just universally do `calck ? uprev = recursivecopy(u) : uprev = u`?
We could do that. But there are some methods that use `uprev` explicitly, e.g. https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/ccd2287a2845a093638875762b74d4fce0bf14eb/src/perform_step/low_storage_rk_perform_step.jl#L118
We would have to adapt them, but that's okay for me.
Yeah, thinking about it more, most RK methods need `uprev` in the `b` step at the end, so that won't work. `uprev` is also needed when adaptive, since it's required for step rejections. It's just the low-storage methods. So yeah, it would make sense moving it into the cache for the algorithms to take control of how that's done.
I would leave that task open for another PR and just submit some first improvements and checks soon.
Yup that sounds good.
If we move the initialisation of `uprev` to the `cache`, we should do the same with `uprev2`, shouldn't we? In that case, we can either forward the keyword argument `allow_extrapolation` from
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/7dd04a406c9097f14542317684d26d2ddf0e193f/src/solve.jl#L58
to the cache or let the cache decide what to do. Which possibility do you prefer, @ChrisRackauckas?
Thinking more about it, I do not really like the idea of moving `uprev` and `uprev2` to the caches. Since there are lots of methods which do not need at least one of them, the structure of the caches would not reflect their needs.
Short summary (since I don't have time to work on that part right now): after all the changes linked above, the remaining change is to add new forms of `ODEFunction`s:
> We may just want to add these as possibilities to `ODEFunction`. The whole point is to allow other forms of the function for optimizations there. The documentation will need some work for it though, and then the methods will need to handle the existence/non-existence of such optimized form.
I'd like to work on this issue if it's not done yet.
According to my understanding, adding an option `rewrite_u` to `ODEProblem` would do the job. For example, if the function is `f(u, p, t) = A*u` where `A` is an upper triangular matrix, then `u` is rewritable. If `rewrite_u` is set, we could alias `du` with `u` to carry out function evaluations of the form `u1 = f(u1)`. Similarly, all three forms could be incorporated using this option.
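To see why `u` is rewritable in that example, note that row `i` of `A*u` with upper triangular `A` only reads `u[j]` with `j >= i`, so computing the rows top-down can overwrite `u` in place. A small sketch (`f_inplace!` is an illustrative name, not an existing function):

```julia
# Sketch of why u is "rewritable" for f(u, p, t) = A*u with upper
# triangular A: row i of A*u only reads u[j] with j >= i, so computing
# the rows top-down can overwrite u in place (the u1 = f(u1) form).
using LinearAlgebra

function f_inplace!(u, A)
    n = length(u)
    for i in 1:n
        acc = zero(eltype(u))
        for j in i:n
            acc += A[i, j] * u[j]   # entries u[i:n] are not yet overwritten
        end
        u[i] = acc
    end
    return u
end

A = UpperTriangular([1.0 2.0; 0.0 3.0])
u = [1.0, 1.0]
f_inplace!(u, A)   # u is now A * [1.0, 1.0] == [3.0, 3.0]
```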
There is some current discussion on the matrix multiplication API. Hopefully, some decision on the API of in-place multiplication of the form we need for 2N methods will be made soon, cf.
https://github.com/JuliaLang/julia/issues/23919#issuecomment-426906393.
> According to my understanding, making an option `rewrite_u` in `ODEProblem` would do our work. For eg. if the function is `f(u, p, t) = A*u` where `A` is an upper triangular matrix. In this case `u` is rewritable. If `rewrite_u` is set, we could alias `du` with `u` to carry out the function evaluations of form `u1 = f(u1)`. Similarly all the three forms could be incorporated using this option.
I think this needs to be in-method. If a specific method can make good reuse of `u`, it should, and then it can likely delete a cache. Of course, not all can, and it can only be done when units aren't used.
I don't think the in-place matrix multiplication API is necessary to create the right ODEFunction infrastructure, but for a user to actually define the function without allocations it'll be needed.
> but for a user to actually define the function without allocations it'll be needed.
That's what I meant. We might want to use an API similar to the final decision for the matrix multiplication API. If that's `mul!(C, A, B, α=1, β=0)` for `C = α*A*B + β*C`, we can use `f!(du, u, p, t, α, β)` for `du = α*f(u, p, t) + β*du`, as needed for 2N methods.
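Such a signature might look as follows for a linear ODE; the six-argument `f!` is a hypothetical interface mirroring the five-argument `mul!`, not an existing one:

```julia
# Hypothetical sketch of a five-argument-style RHS mirroring
# mul!(C, A, B, α, β): f!(du, u, p, t, α, β) computes
# du = α*f(u, p, t) + β*du without temporaries. The signature is an
# assumption, not an existing OrdinaryDiffEq.jl interface.
using LinearAlgebra

function f!(du, u, p, t, α = true, β = false)
    mul!(du, p.A, u, α, β)   # du = α*A*u + β*du for the linear ODE u' = A*u
    return nothing
end

# A 2N-style accumulator update k = a*k + dt*f(u) in one fused call,
# followed by the register update u = u + b*k:
p = (A = [-1.0 0.0; 0.0 -2.0],)
u = [1.0, 1.0]
k = zeros(2)
a, b, dt = -0.5, 0.25, 0.01
f!(k, u, p, 0.0, dt, a)   # k = dt*f(u) + a*k
@. u = u + b * k
```

With this form, the 2N assignment `u2 = a*u2 + dt*f(u1)` needs no scratch array for the function evaluation.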
Yes, it would be dependent on both the problem and the algorithm. There may be a function where `s = f(s)` is not possible using a single register, so it should be specified by the user that this transformation is possible for this function.
If this is possible, then the method (in fact, it may be possible to use this extra register in any method) may exploit this fact. I will make a PR which implements this to show the example.
Oh I see where you're going with this. Yeah, this would be a neat solution without requiring additional function interfaces. Great idea!
@ranocha tasked us at first with fixing memory usage in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/178 . We now have all of the pieces in place to do it correctly (see https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/178#issuecomment-460039918 ). What remains to be done are some `calck` optimizations on the low-memory methods, and testing to make sure it works out.

How to do it is already described, so let's talk about testing. Memory tracking makes sense in the asymptotic regime where `u0` is large. When `u0` is large, say 100,000, then, since the size of the integrator is constant, the size of the `uType` and `rateType` vectors will completely dominate. In that case, we can measure the integrator's size in order to see how many registers we are actually using. It's quite accurate and understands aliasing.
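The measurement idea can be sketched as follows; `FakeIntegrator` is a made-up stand-in for the real integrator struct:

```julia
# Sketch: Base.summarysize counts an aliased array only once, so the
# integrator's total size reveals how many distinct registers are
# actually allocated. FakeIntegrator is a made-up stand-in for the
# real integrator struct.
struct FakeIntegrator{T}
    u::T
    uprev::T
    k::T
end

u = zeros(100_000)                              # 800 kB of Float64s
aliased  = FakeIntegrator(u, u, u)              # one register, aliased thrice
separate = FakeIntegrator(u, copy(u), copy(u))  # three distinct registers

Base.summarysize(aliased)    # ≈ 0.8 MB
Base.summarysize(separate)   # ≈ 2.4 MB
```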
So with this, our goal should be to get these methods down to exactly the amount they require. Some things to note are:

- In `solve.jl`, we copy `u0`. We should add a new argument to the common interface for `copy_start = true`. True for safety, since if you change `u0` all sorts of wonky things will happen to people's code. But true performance purists may want us to re-use that array, so let's do it.
- When `calck` is false, we don't need the `k` values. The algorithms can work out just by aliasing those to another array instead. Note that these are crucial to common use though, since `saveat` is a widely used thing, so we should try to see if we can do this in a way that eliminates allocations even when `calck=true` (which is the `saveat` case).