Actually, this doesn't solve the problem that #60 addresses, since you've effectively converted all the tables to `Float64` anyway in the constructors of `TableauRKExplicit` (when saving the arrays). It's well on its way, though, and the approach is better than what I had in #60, so I'd gladly yield that for this! :)
Ah, yes, you're right. Another step is needed. But my idea was that the parameter `T` of the tableau would determine what type is used in the solver, as it doesn't make any sense to have the tableau in a different type than the numerics. So, we'd need to write it:
```julia
bt_rk45_coef = ( [ 0           0           0           0           0       0
                   1//4        0           0           0           0       0
                   3//32       9//32       0           0           0       0
                   1932//2197 -7200//2197  7296//2197  0           0       0
                   439//216   -8           3680//513  -845//4104   0       0
                  -8//27       2          -3544//2565  1859//4104 -11//40  0 ],
                 [25//216  0  1408//2565   2197//4104   -1//5   0
                  16//135  0  6656//12825  28561//56430 -9//50  2//55],
                 [0, 1//4, 3//8, 12//13, 1, 1//2])

bt_rk45     = TableauRKExplicit(:fehlberg, (4,5), Float64, bt_rk45_coef...)
bt_rk45_f32 = TableauRKExplicit(:fehlberg, (4,5), Float32, bt_rk45_coef...)
```
The problem is that the type of the solution is determined by the type of the time step (most likely a float), the type of the tableau coefficients, and the type of `F` (the RHS of the ODE). All three are somehow multiplied to give the new solution. So I am not sure it makes much sense to store the coefficients as `Rational`...
Nice work BTW!
My idea was that 99% of users will work with `Float64`. For the other 1%, they can instantiate something like `bt_rk45_f32`, where `T==Float32`. As my solver is written, the initial conditions and time vector are then, if necessary, converted to that `T` (actually, this should be tested). It would be up to the user to make sure that there is no conversion, or no loss in the conversion. So, this is not fully automatic but can be extended if needed. Note that inside `bt_rk45`, which is used by default, the coefficients are stored as `Float64`.

And thanks! But I just had a look at performance with IVPTestSuite and it's atrocious, so there's still lots to be done...
The reason to store the coefficients as rational is that they are the least invasive when promoting with other number types. Then, we can do something conceptually equivalent to
```julia
TTime    = eltype(tspan)
TRHS     = typeof(F(0, #= whatever args we need =#))
TTableau = typeof(one(TTime) * one(TRHS) * one(Rational{Int}))
```

and then (possibly) convert the coefficients.
This way, `TTableau` will automatically be `Float32` for a `tspan`/`F` combo which returns `Float32`, remain `Rational{Int}` for such inputs, etc. This will mean one conversion per call to the solver, but we'll avoid all kinds of promotion slowdown in the inner loops.
Interpolations.jl does very similar things to handle typing of coefficients gracefully.
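For concreteness, here is a minimal, self-contained sketch of that idea (the names `a_rational` and `tableau_eltype` are made up for illustration, not the PR's API):

```julia
# Store the coefficients exactly as Rational{Int}; convert once per solve
# to whatever scalar type falls out of promoting time, state and RHS types.
a_rational = [0//1 0//1; 1//2 0//1]     # illustrative stage coefficients

function tableau_eltype(F, y0, tspan)
    TTime = eltype(tspan)
    TRHS  = eltype(F(tspan[1], y0))
    # products of one(...) yield the promoted scalar type without real work
    typeof(one(TTime) * one(TRHS) * one(Rational{Int}))
end

tspan = Float32[0, 1]
y0    = Float32[1, 2]
F(t, y) = -y
T = tableau_eltype(F, y0, tspan)       # Float32
a = convert(Matrix{T}, a_rational)     # one conversion, no promotion in loops
```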
Sorry for nitpicking, but I am in the 1% not using `Float64`. Is this new solver generic in the sense that it would be possible to do computations using `BigFloat` without losing precision on the way (via e.g. multiplications by explicit `Float64` constants)? So far it kind of looks like `Float64` is hard-wired into the coefficient tables. I could definitely live with initial conversions at the expense of slower solver startup.
@mauro3 The reason this won't work is that it's impossible for us to prepare in this way for the case where someone has a self-created number type which doesn't promote cleanly with `Float64` - consider for example a type `HumongousFloat` that has 1024 bits of precision instead of `BigFloat`'s 256. We can't create versions of these tables for such types beforehand, but we can convert to these types on invocation. Storing the coefficients as rationals ensures that it's possible to convert to any conceivable number type without loss of precision.
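A quick illustration of the lossless-conversion point, in plain Julia (nothing here is PR-specific):

```julia
setprecision(BigFloat, 256)
c = 1932//2197                    # a Fehlberg coefficient, stored exactly
direct  = BigFloat(c)             # Rational -> BigFloat at full target precision
via_f64 = BigFloat(Float64(c))    # a detour through Float64 rounds first
direct == via_f64                 # false: the Float64 detour lost digits
```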
@pwl I don't think the current version of this PR works like that - and it might turn out to be out of scope for this PR to make it work like that. However, I do think it's valuable enough that we should solve this at some point. We'll see where this leads and take it from there, I guess.
However important generics and sensible promotion rules are, I think the main merit of this PR is to provide a white-room Runge-Kutta implementation, allowing us to greatly simplify the license of this package. That's no small thing, indeed! :)
I just wanted to ensure that the 1% is well represented. :) And I already have my own working fork so there is no hurry in implementing this feature. Don't get me wrong, I see this PR as a remarkable piece of work.
After fixing a bug in the step control, this now works. Here are the performance tests from IVPTestSuite.jl:
The new solvers (`*_v2`) are about a factor of 2 faster; memory footprint is also down by a factor of 2-5 (not shown). However, the slope of the old `78` solver is slightly shallower.

All new solvers are a factor of 2-5 faster.
And for the fixed step solver:
The solver is significantly better than the old ones. This is a bit surprising, as `ode4` and `ode4_v2` should be the same algorithm?!
@tlycken, yes I agree, that was the idea behind the little code example above; `bt_rk45_coef` holds the rational coefficients. But maybe we should discuss this in another PR.
Very nice :-)
> The solver is significantly better than the old ones. This is a bit surprising as the `ode4` and `ode4_v2` should be the same algorithm?!
Well, in the old code many copies of the solution are made; maybe this explains the difference? Partly, this was necessary to make the code work with user-defined types. At the moment, `y` doesn't have to be `<:AbstractArray`, for example. Is this possible with your code as well?
@acroy: we should add tests for non-`AbstractArray` types if we want to support them. But what is the use case?
@acroy: and no, at the moment it doesn't quite work for those general cases. It also stores the `ys` in a `Matrix` internally. I can update that.
Related: I find the current interface returning a `Vector` of something quite annoying. I can see that it's needed for the general case, but I always have to `hcat(y...)`. Maybe return a `Matrix` for the normal case and only the `Vector` of `Vector`s otherwise?
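(For illustration, the conversion complained about above, in plain Julia:)

```julia
# Vector-of-Vectors output, one state vector per time step:
y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = hcat(y...)    # 2×3 Matrix{Float64}; column j is the state at step j
```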
@acroy: before updating the PR to allow for more general datatypes, a few questions:
What is the "interface" a `typeof(y)` has to support? Looking through the code, it looks like just arithmetic. However, considering the speed and memory improvements provided by looping and indexing, it seems a shame to let those improvements slip. So, could I use indexing? Presumably it would be OK to use arrays for intermediate storage?
I agree that the typical use case will involve scalars or dense vectors, but there might be good reasons to have a user-defined type which doesn't support the full array interface. For example, in JuliaQuantum/QuBase.jl we have `QuArray`s, which are not `<:AbstractArray`. Also, AFAIK @Jutho's tensors are not abstract arrays. Nevertheless, we definitely want to use ODE.jl to solve some differential equations.
> Related: I find the current interface returning a `Vector` of something quite annoying.
Yeah, we discussed this when we put together the API specification (probably in #20). I think one point was that one wants to use `yout[end]` as new input or to evaluate `F`. We could handle the array-input case separately, but this would also break the API consistency.
@mauro3: I think it is just `norm`, `+`, `-`, and multiplication with/division by scalars. At some point (#37) I came up with an example gist to demonstrate the basics. But I agree that looping and indexing is really helpful, so maybe we should revise the requirements?
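As a concrete, hypothetical sketch of that minimal interface in modern Julia (the `Pendulum` type is made up; it is not the type from the gist):

```julia
import Base: +, -, *, /
import LinearAlgebra: norm

struct Pendulum             # a "pseudo-scalar" state, not an AbstractArray
    θ::Float64
    ω::Float64
end

+(a::Pendulum, b::Pendulum) = Pendulum(a.θ + b.θ, a.ω + b.ω)
-(a::Pendulum, b::Pendulum) = Pendulum(a.θ - b.θ, a.ω - b.ω)
*(s::Real, a::Pendulum)     = Pendulum(s*a.θ, s*a.ω)
*(a::Pendulum, s::Real)     = s*a
/(a::Pendulum, s::Real)     = Pendulum(a.θ/s, a.ω/s)
norm(a::Pendulum)           = hypot(a.θ, a.ω)

# enough for an explicit RK update: y + dt*k works without any indexing
y, k, dt = Pendulum(1.0, 0.0), Pendulum(0.0, -1.0), 0.1
y + dt*k
```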
I haven't done any profiling, but if memory management is the bottleneck here, we could very probably use the iterator versions (#49) to let the user take care of it in cases where they want to:

- keep the API `tout, yout = ode45(F, y0, tspan)` returning a `Vector{typeof(y0)}`
- implement it on top of the iterator version, collecting the results into a `Vector`, so we don't duplicate code

If we choose that path (and there is a clean-enough way to do it for the user, so we can show it off in examples), we can move that issue out of scope for this PR :)
I'm trying to adapt the solvers to the preferred API and internals, but I find it quite painful and I'm wondering whether this is not a case of over-engineering. There are a few types involved:
```julia
Tt = typeof(tspan)
Ty = typeof(y0)
Tf = typeof(F(tspan[1], y0))
Et = eltype(tspan)
Ey = eltype(y0)
Ef = eltype(Tf)
```
`E*`: There should be promotion of the eltypes (#26): `Tt` should probably be some type of floating-point number, for sure a `Real` (maybe `FixedPoint`; `Rational` could maybe work but, as @stevengj said, it doesn't make sense); `Ty` should be some "continuous" `Number`, `Tf` also.

`T*`: The `T*` are either equal to the eltypes, if scalar, or some kind of container type. In the current solvers `Tt` is discarded, the outputs are kept in a `Vector{Ty}` (this is where I experience my pain), and it is assumed that `Tf==Ty` (and thus `Ef==Ey`).
Anyway, I wonder whether there is any need to support containers other than `AbstractArray`s, and in particular whether internal, temporary storage arrays can just be `Array`s or not. If not, what is the use case?
The reason I am wondering about these things is that I would like to see an API developed, similar to Sundials, where all storage can be preallocated and functions do in-place updates. I don't see how this is compatible with the above internals. I don't think an iterator method would work, at least not if it is not in-place.
Sorry for the rant. I'll ponder this a bit longer...
The assumption that `Tf==Ty` is sensible; we always need as many equations as there are variables, so it should always be possible to map the return value `Tf` back into `Ty`. As for `Tt`, I think it should be an (abstract) vector of `Real` type (interpreted as either `[start,stop]` or a sequence of predetermined output times). Can you see any other uses for `Tt`?
On the other hand, the in-place updates are a must-have feature if we want good performance in the iterator version, and I don't see how this could be realized with custom types for `Ty` and `Tf`. Maybe by defining indexing methods on `Ty` and preallocating instances of `Ty` as "working arrays"? Would that be feasible? Can we simply go with `Array`s for this PR and think of this as a separate feature (as @tlycken suggests), or does it have to be a part of this PR for some technical reasons?
And just a comment: in implicit methods `Ty` can only be an (abstract) vector, because we have to solve an implicit equation for the next step via Newton iterations, which require a Jacobian for the whole system to be a 2d array. In principle it might be possible to prepare a specialized array type for each call, but I guess it won't be very efficient.
My opinion is that types other than vectors would be very impractical to work with inside ODE.jl, for reasons already mentioned, and I think we should refrain from implementing the API this way. Anyway, with iterators in place one can easily produce wrappers to generate user-defined types from the array returned at each step, via `imap` for example; the same goes for the wrapper around `F` (I am doing this in my code and it works well enough). If somebody wants to define a specialized type, he can also create these wrappers without much effort.
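A rough sketch of that wrapper idea; all names here (`MyQuVec`, `to_vec`, `from_vec`) are invented for illustration, and a plain generator stands in for `imap`:

```julia
struct MyQuVec                  # hypothetical non-AbstractArray state type
    coeffs::Vector{Float64}
end

to_vec(q::MyQuVec)  = q.coeffs      # custom type -> plain Vector
from_vec(v::Vector) = MyQuVec(v)    # plain Vector -> custom type

F_user(t, q::MyQuVec) = MyQuVec(-q.coeffs)    # user's RHS on the custom type
F_vec(t, v) = to_vec(F_user(t, from_vec(v)))  # what the solver actually calls

# and the solver's output stream can be wrapped back lazily:
ys = [[1.0, 2.0], [0.9, 1.8]]          # stand-in for an iterator of states
wrapped = (from_vec(y) for y in ys)    # lazily yields MyQuVec values
collect(wrapped)
```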
I haven't followed this discussion, but the tensors in TensorToolbox.jl (which is far from usable and is currently receiving little attention) are indeed not `AbstractArray` subtypes. However, they are easily transformed into a `Vector` and back, which I think is the only way they can be used together with e.g. iterative solvers, eigensolvers, ODE integrators, etc. I think of these tensors as including both the coefficients and information about the basis vectors that span the vector space (probably very close to how QuBase.jl is working). The coefficients themselves can easily be returned as a `Vector`.
@Jutho I think the normal use case involving TensorToolbox.jl would be to have a tensor field and time-step that forward. So one tensor would correspond to a "Number" and the field would be contained in an Array-like container (i.e. the tensor would be `Ey`, in the above parlance). Is that correct, or is there a tensor-field type as well?
@acroy Whereas, presumably, a QuArray would itself be the container (`Ty`), right?
@pwl: couldn't a Jacobian be more abstract than a matrix?
If, as @pwl suggests, an Array-version should be merged, that would be commit 8ac21e25254fbd28537ffd4fd0a160da28bd6704. Let me know if I should revert to that. Otherwise, I'm not sure that I want to try and update the PR, at least not until a more satisfying solution to the above points has been found.
@mauro3: That is right, `QuArray` is basically the container in that case. Of course we can also convert this to a vector and back if necessary.
Maybe I am the only one who wants to keep the support for more general types via duck typing... I can see that this might be painful, since things like `+=` make copies and we don't have canonical in-place operations like `add!` yet. However, it seems that circumventing this limitation/bottleneck is the only reason for restricting the input to arrays. (I wish I could give a killer example, but I don't have one. I just think there is no fundamental reason why the user has to represent his/her system of equations by an array; that might in the end be the most efficient way, but that is a different problem.)

I hope this didn't sound too harsh; it certainly isn't meant to be. If everyone is in favor of switching to arrays, I am fine with that.
I am also strongly in favor of keeping the typing as open as possible via duck typing (although I don't necessarily think it has to be done in this PR).
My reasoning is this:
When it comes to "problems" like `+=` making copies, I think this is really something we should leave to base Julia to optimize. Given the tremendous impact a performance boost for such operations would have across the board, it's very unlikely that the core team won't have a solution in place well before Julia 1.0 is released.
If it turns out that there are users who really do need the extra performance gain right now, we could possibly write specialized methods for `Array` inputs that have these optimizations, but I think it's premature to have only those and disallow a lot of other potential use cases.
My proposal is the following (which is implemented in the last commit b8458df):
```julia
# For non-array y0, treat y0 as a scalar
function oderk_fixed(fn, y0, tspan, btab::TableauRKExplicit)
    fn_ = (t, y) -> fn(t, y[1])
    t, y = oderk_fixed(fn_, [y0], tspan, btab)
    return t, vcat(y...)
end

# For array-like y0 use this.  Note that y0<:AbstractArray could be relaxed
# to any indexing container with a Holy trait.
function oderk_fixed{N,S}(fn, y0::AbstractVector, tspan,
                          btab_::TableauRKExplicit{N,S})
    Ar, Et, Ey, Ef, btab = make_consistent_types(fn, y0, tspan, btab_)
    dof = length(y0)
    tsteps = length(tspan)
    ys = Ar(Ey, dof, tsteps)
    ys[:,1] = y0
    tspan = convert(Vector{Et}, tspan)
    # work arrays:
    ks = Ar(Ef, dof, S)
    ytmp = Ar(Ey, dof)
    for i = 1:length(tspan)-1
        dt = tspan[i+1] - tspan[i]
        ys[:,i+1] = ys[:,i] # this also allocates a bit...
        for s = 1:S
            for d = 1:dof
                ytmp[d] = ys[d,i] # ytmp[:] = ys[:,i] allocates!
            end
            calc_next_k!(ks, ytmp, ytmp, s, fn, tspan[i], dt, dof, btab)
            for d = 1:dof
                ys[d,i+1] += dt * btab.b[s] * ks[d,s]
            end
        end
    end
    return tspan, ys
end
```
I think this should work for arbitrary `y0` which cannot be indexed (need to test this), and works fast for indexable `y0`. Note that `Ar` is a suitable container type which `make_consistent_types` figures out.
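For readers: a rough guess at what a `make_consistent_types`-style helper computes, based on the `E*`/`T*` discussion above (this is not the PR's actual implementation, and it returns only a container constructor plus one common eltype):

```julia
function make_consistent_types_sketch(fn, y0, tspan)
    Et = eltype(tspan)
    Ey = eltype(y0)
    Ef = eltype(fn(tspan[1], y0))
    E  = promote_type(Et, Ey, Ef)   # common scalar type for state and time
    return Array, E                 # container constructor + element type
end

# e.g. Float32 state with Float64 time promotes to Float64 throughout:
make_consistent_types_sketch((t, y) -> -y, Float32[1, 2], [0.0, 1.0])
```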
I updated the code slightly (still a bit hacky at the moment) and also @acroy's custom-type example: https://gist.github.com/mauro3/d605bff8e11994a372d9 Now this works with both the adaptive stepper and the fixed-step one. So, my approach seems feasible.
(Note that currently my `ode54_v3` solver is about 3x slower than `ode45`, but I'm sure it can be made faster.)
My use case is: I want to use ODE time steppers in the near future (next two months) to time-step discretized PDEs with up to 10^5 degrees of freedom. (Currently I do this with Matlab and it works up to 2e4 DOFs.) While Julia may eventually make many of the bottlenecks go away, it will take some time to get there, certainly too long for me to wait. Also, for systems that size you want to work without allocations.
But I also see the need/use for "scalar" values, like in @acroy's gist. I think the approach I outlined above could serve both needs, albeit with a somewhat higher code complexity. Performance-wise it would probably favor non-scalar types but I think that is ok, as those are typically the large systems.
OK, I see.
How about actually having two versions of the solver, with the choice between them being made on the types? Then one can explicitly dispatch on `AbstractVector` (or even `Vector`, if it makes life easier) and use all the indexing and other optimizations in your current version, while the other can be completely duck-typed and only require that the input types support vector-field operations (`+`, scalar `*`, and `norm`), returning a `Vector{typeof(y0)}` instead of something more performant, like the previous version. If/when the performance bottlenecks of vectorized array code are fixed in base Julia so that they become equally performant, the vector-specific method can be removed.
And by the way, I think a huge thank-you is in order for putting time and effort into this rewrite. Reading back in this thread, it sounds like you're getting a lot of resistance that you don't deserve, just because a few of us happen to be picky - I hope you don't take it too hard :)
No worries and thanks!
I think the current version of the PR does what you describe, except that it avoids code duplication by wrapping any non-array thing `y0` as `[y0]` and calling the array solver, which then solves it as a single-component vector system. The operations performed on a single component are always just the scalar math operations, so it just works. Does that make sense?
Ok, so I tidied everything up (but it's not quite ready). The last commit of the PR is now how I think the interface/internals should be. It works for array `y0`s and scalar `y0`s, as in this gist: https://gist.github.com/mauro3/d605bff8e11994a372d9
I'll write up later what interface the scalars and arrays need to conform to. I realize this is the 10,000th commit, but maybe someone still has some more time to look at this...
Your solvers return `Vector{typeof(y)}` and `Matrix{eltype(y)}` for "scalar" and array input, right? This breaks the current behavior, which isn't a big problem, but we'd have to change the other solvers (e.g. `ode23s`) as well. I would suggest to either stick to the old behavior and change to Vector/Matrix in a separate PR for all solvers, or to tag the current master as 0.2 and merge your PR afterwards. There are probably only a few users who will care, but we should try to keep the package consistent in some form.
Yes, I'm aware of this and still mulling over it. But for sure it should be merged in an orderly fashion. I could prepare another PR to first do the API change.
What I would like to have is the following `input -> output` mapping:

1. scalar `y0` -> `Vector{typeof(y0)}`
2. container `y0` (e.g. a `QuArray`) -> `Vector{QuArray}`
3. array `y0` (e.g. `[1., 2.]'`) -> `Matrix{Float64}`
So, the API of 1 & 2 is the same and consistent with the current API; 3 is different and a slightly breaking change. However, the corresponding internals should be: implement 1 by wrapping `y0` as `[y0]` and calling (2). But I'm not sure yet how to implement it such that 2 & 3 can share the algorithmic part of the code.
Apart from the technical issues, would this be a good API & internals?
Sounds very reasonable to me.
I see a few possible problematic points here - that doesn't mean I think this is a bad idea, but I want us to think it through, both conceptually and actually in code, before we merge changes like these.
I would much rather have the same API as before, with the optimizations in a specialization for the types for which they are applicable. I'd rather see some code duplication, since they won't actually be algorithmically equivalent anyway, than trying to fit everything into the optimized code with both shoehorn and vaseline...
Maybe we could put the performance-critical in-place updates in separate functions and keep the "algorithmic part" general? What we need is basically something like `add!(A,B,beta)`, which does `A + beta*B -> A`, and `copy!`. For the former we can provide a specialized version for vectors (I think there was/is some discussion about having a function like that in Base, but I can't find it).
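A minimal sketch of that two-level `add!` (the generic fallback and names are illustrative, not actual ODE.jl code):

```julia
# generic fallback: works for any type supporting + and scalar *,
# but allocates and returns a new object, so the caller must rebind
add!(A, B, beta) = A + beta*B

# in-place specialization for plain arrays: no allocation
function add!(A::Array, B::Array, beta)
    @inbounds for i in eachindex(A, B)
        A[i] += beta*B[i]
    end
    return A
end

y = [1.0, 2.0]; k = [0.5, -0.5]
y = add!(y, k, 0.1)    # rebinding works for both the generic and array paths
```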
I tried this for `oderk_fixed` in this gist (which is a bit hacky) and the performance seems to be comparable to Mauro's implementation. Maybe there is some caveat, though... And of course I don't know how easy it is to do something similar for the adaptive solver.
Thanks for the feedback, Tomas:

- Constructing a `Vector{Vector{String}}` takes 0.15s on my laptop. I would expect any ODE solver to take at least 10x longer, probably much longer, to produce so many values. Probably worse is the double anonymous function.
- Pre-allocating the output is only possible for `points=:specified`; otherwise it will be `push!`.
- `F` and `norm` should act on the whole state, i.e. pseudo-scalar, irrespective of whether in scalar, vector or matrix mode. It is then up to the user to internally devectorize these functions if needed. So, I think the step-size control can stay the same (having said this, the step control in this PR needs more work to get to that).

Thanks, Alex, for prototyping this! The problem I see with your approach is that it only works with scalar `y0` which are `<:Number`; otherwise the overloading of `copy!` doesn't work. Plus, the overloading pollutes the `Base` functions with hacks. This could be rectified by using our own local versions of `copy!`. I just tried to update your version, pasted below your gist, and my experience was that it was quite confusing trying to do vectors and scalars at the same time...
So, my proposal to continue this PR is to implement points 1 & 2 and leave 3 for now. Thus the API is completely unchanged. I'm pretty sure that 3 can eventually be added on top of 2 by using ArrayViews.jl. The advantage of this, compared to Alex's prototype, is reduced complexity; all it takes is:
```julia
# Non-array y0: treat as scalar
function oderk_fixed(fn, y0, tspan, btab::TableauRKExplicit)
    fn_ = (t, y) -> [fn(t, y[1])]
    t, y = oderk_fixed(fn_, [y0], tspan, btab)
    return t, vcat_nosplat(y)
end
vcat_nosplat(y) = eltype(y[1])[el[1] for el in y]
```
This is compared to the 8 function definitions in Alex's prototype. If we find that scalar performance is indeed too bad, then let's revisit Alex's approach.
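For clarity, a tiny check of what `vcat_nosplat` does with solver-style output (a `Vector` of 1-element `Vector`s):

```julia
vcat_nosplat(y) = eltype(y[1])[el[1] for el in y]   # as defined above

y = [[1.0], [2.5], [4.0]]    # one single-component state per time step
vcat_nosplat(y)              # [1.0, 2.5, 4.0], a Vector{Float64}
```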
> The problem I see with your approach is that it only works with scalar `y0` which are `<:Number`; otherwise the overloading of `copy!` doesn't work. Plus, the overloading pollutes the `Base` functions with hacks.
Yeah, the situation with `copy!` and `similar` is not really optimal (although one might argue that `copy!` for `Number`s and other bitstypes should be included in Base anyway). I think the nice aspect of my approach is that the algorithm is separated from performance optimizations for specific types.

But I also think that we don't have to do this separation now; your proposal looks very reasonable to me. The biggest problem for pseudo-scalars is probably the additional anonymous function.
Just now I updated according to my last post. (Edit: to state the obvious ;-)
This is it: points 1 & 2 done. Probably needs some moving of things to other files and removal of the old solvers.
I want to add some more tests, for instance Alex's example: https://gist.github.com/mauro3/d605bff8e11994a372d9 And I want to put together the interface required of the container and the scalar.
Life caught up, I'll work on this again in ~10 days.
What is still missing before we can merge this? I might be able to do some coding this week.
Sorry for having left this so long and thanks @berceanu for offering help. I suspect that it is probably easiest if I finish this. However, code review would certainly be good. Also helpful would be if you could try out this branch on your problem and report back.
PR #70 adds a test for a scalar-like state.
I ran a performance test for the custom scalar type of #70 (with `endt=2000*pi`): https://gist.github.com/mauro3/8560062fbd7629e1d4cd
| Fixed step | time (s) | MB  |
|------------|----------|-----|
| ode4       | 0.003    | 4.7 |
| ode4_v2    | 0.005    | 6.4 |

| Adapt step | time (s) | MB   |
|------------|----------|------|
| ode45_dp   | 1.1      | 945  |
| ode54_v2   | 3.2      | 2024 |
| ode78_fb   | 0.85     | 839  |
| ode78_v2   | 1.65     | 1325 |
The `_v2` solvers are the new corresponding solvers. So the trick of wrapping a scalar-like thing in `[]` and running it that way is indeed a bit inefficient. An approach as suggested by @acroy above might be better. (edited)
I am kind of surprised that the difference is that large... On the other hand, for array-like input this PR should be (much) faster, right? So the question is whether we want everything now, or whether we are willing to accept that the less common case (?) is slower and address that in a separate PR.

@mauro3: Could you provide some benchmarks for the number and array input cases?
I would suggest the following:
Does that make sense?
As promised in #61, here is a RK implementation sans license restrictions. It is a bit longer than the current code but has some more features as well. There is probably a bug lurking somewhere, as it is less accurate than the current solvers.

Extra features:

- this also "solves" what was tried in PR #60: the method will run using the type of number used for the coefficients
- improvement in ODE.jl to have `hinit` return `tdir` as well

TODO:

- doc update (there is no doc yet)

This touches on issue #9 and fixes #25 (although that was fixed already, no?).

Extra features, maybe in this PR, maybe later:

- iterator protocol
- maybe implement dense output in terms of Polynomials.jl, root finding