Actually, this doesn't solve the problem that #60 addresses, since you've effectively converted all the tables to `Float64` anyway in the constructors of `TableauRKExplicit` (when saving the arrays). It's well on its way, though, and the approach is better than what I had in #60, so I'd gladly yield that for this! :)
Ah, yes, you're right. Another step is needed. But my idea was that the parameter `T` of the tableau would determine what type is used in the solver, as it doesn't make any sense to have the tableau in a different type than the numerics. So, we'd need to write it:
```julia
bt_rk45_coef = ( [ 0           0           0           0           0       0
                   1//4        0           0           0           0       0
                   3//32       9//32       0           0           0       0
                   1932//2197 -7200//2197  7296//2197  0           0       0
                   439//216   -8           3680//513  -845//4104   0       0
                  -8//27       2          -3544//2565  1859//4104 -11//40  0 ],
                 [25//216  0  1408//2565   2197//4104   -1//5   0
                  16//135  0  6656//12825  28561//56430 -9//50  2//55],
                 [0, 1//4, 3//8, 12//13, 1, 1//2])

bt_rk45     = TableauRKExplicit(:fehlberg, (4,5), Float64, bt_rk45_coef...)
bt_rk45_f32 = TableauRKExplicit(:fehlberg, (4,5), Float32, bt_rk45_coef...)
```
The problem is that the type of the solution is determined by the type of the time step (most likely a float), the type of the tableau coefficients, and the type of `F` (the RHS of the ODE). All three are somehow multiplied to give the new solution. So I am not sure it makes much sense to store the coefficients as `Rational`...
Nice work BTW!
My idea was that 99% of users will work with `Float64`. For the other 1%, they can instantiate something like `bt_rk45_f32`, where `T==Float32`. As my solver is written, the initial conditions and time vector are then, if necessary, converted to that `T` (actually, this should be tested). It would be up to the user to make sure that there is no conversion, or no loss in the conversion. So, this is not fully automatic but can be extended if needed. Note that inside `bt_rk45`, which is used by default, the coefficients are stored as `Float64`.

And thanks! But I just had a look at performance with IVPTestSuite and it's atrocious, so there's still lots to be done...
The reason to store the coefficients as rational is that they are the least invasive when promoting with other number types. Then, we can do something conceptually equivalent to
```julia
TTime    = eltype(tspan)
TRHS     = typeof(F(0, #= whatever args we need =#))
TTableau = typeof(one(TTime) * one(TRHS) * one(Rational{Int}))
```

and then (possibly) convert the coefficients.
This way, `TTableau` will automatically be `Float32` for a `tspan`/`F` combo which returns `Float32`, remain `Rational{Int}` for such inputs, etc. This will mean one conversion per call to the solver, but we'll avoid all kinds of promotion slowdown in the inner loops.
Interpolations.jl does very similar things to handle typing of coefficients gracefully.
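For concreteness, here is a minimal, self-contained sketch of that idea (the names `a_rational` and `tableau_eltype` are made up for illustration, not the PR's API):

```julia
# Store the coefficients exactly as Rational{Int}; convert once per solve
# to whatever scalar type falls out of promoting time, state and RHS types.
a_rational = [0//1 0//1; 1//2 0//1]     # illustrative stage coefficients

function tableau_eltype(F, y0, tspan)
    TTime = eltype(tspan)
    TRHS  = eltype(F(tspan[1], y0))
    # products of one(...) yield the promoted scalar type without real work
    typeof(one(TTime) * one(TRHS) * one(Rational{Int}))
end

tspan = Float32[0, 1]
y0    = Float32[1, 2]
F(t, y) = -y
T = tableau_eltype(F, y0, tspan)       # Float32
a = convert(Matrix{T}, a_rational)     # one conversion, no promotion in loops
```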
Sorry for nitpicking, but I am in the 1% not using `Float64`. Is this new solver generic in the sense that it would be possible to do computations using `BigFloat` without losing precision on the way (via e.g. multiplications by explicit `Float64` constants)? So far it kind of looks like `Float64` is hard-wired into the coefficient tables. I could definitely live with initial conversions at the expense of slower solver startup.
@mauro3 The reason this won't work is that it's impossible for us to prepare in this way for the case where someone has a self-created number type which doesn't promote cleanly with `Float64` - consider for example a type `HumongousFloat` that has 1024 bits of precision instead of `BigFloat`'s 256. We can't create versions of these tables for such types beforehand, but we can convert to these types on invocation. Storing the coefficients as rationals ensures that it's possible to convert to any conceivable number type without loss of precision.
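A quick illustration of the lossless-conversion point, in plain Julia (nothing here is PR-specific):

```julia
setprecision(BigFloat, 256)
c = 1932//2197                    # a Fehlberg coefficient, stored exactly
direct  = BigFloat(c)             # Rational -> BigFloat at full target precision
via_f64 = BigFloat(Float64(c))    # a detour through Float64 rounds first
direct == via_f64                 # false: the Float64 detour lost digits
```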
@pwl I don't think the current version of this PR works like that - and it might turn out to be out of scope for this PR to make it work like that. However, I do think it's valuable enough that we should solve this at some point. We'll see where this leads and take it from there, I guess.
However important generics and sensible promotion rules are, I think the main merit of this PR is to provide a white-room Runge-Kutta implementation, allowing us to greatly simplify the license of this package. That's no small thing, indeed! :)
I just wanted to ensure that the 1% is well represented. :) And I already have my own working fork so there is no hurry in implementing this feature. Don't get me wrong, I see this PR as a remarkable piece of work.
After fixing a bug in the step control, this now works. Here are the performance tests from IVPTestSuite.jl:
The new solvers (`*_v2`) are about a factor of 2 faster; memory footprint is also down by a factor of 2-5 (not shown). However, the slope of the old `78` solver is slightly shallower.

All new solvers are a factor of 2-5 faster.
And for the fixed step solver:
The solver is significantly better than the old ones. This is a bit surprising, as `ode4` and `ode4_v2` should be the same algorithm?!
@tlycken, yes I agree, that was the idea behind the little code example above; `bt_rk45_coef` holds the rational coefficients. But maybe we should discuss this in another PR.
Very nice :-)
> The solver is significantly better than the old ones. This is a bit surprising as the `ode4` and `ode4_v2` should be the same algorithm?!
Well, in the old code many copies of the solution are made; maybe this explains the difference? Partly, this was necessary to make the code work with user-defined types. At the moment, `y` doesn't have to be `<:AbstractArray`, for example. Is this possible with your code as well?
@acroy: we should add tests for non-`AbstractArray` types if we want to support them. But what is the use case?
@acroy: and no, at the moment it doesn't quite work for those general cases. It also stores the `ys` in a `Matrix` internally. I can update that.
Related: I find the current interface returning a `Vector` of something quite annoying. I can see that it's needed for the general case, but I always have to `hcat(y...)`. Maybe return a `Matrix` for the normal case and only the `Vector` of `Vector`s otherwise?
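(For illustration, the conversion complained about above, in plain Julia:)

```julia
# Vector-of-Vectors output, one state vector per time step:
y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = hcat(y...)    # 2×3 Matrix{Float64}; column j is the state at step j
```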
@acroy: before updating the PR to allow for more general datatypes, a few questions:
What is the "interface" a `typeof(y)` has to support? Looking through the code, it looks like just arithmetic. However, considering the speed and memory improvements provided by looping and indexing, it seems a shame to let those improvements slip. So, could I use indexing? Presumably it would be OK to use arrays for intermediate storage?
I agree that the typical use case will involve scalars or dense vectors, but there might be good reasons to have a user-defined type which doesn't support the full array interface. For example, in JuliaQuantum/QuBase.jl we have `QuArray`s, which are not `<:AbstractArray`. Also, AFAIK @Jutho's tensors are not abstract arrays. Nevertheless, we definitely want to use ODE.jl to solve some differential equations.
> Related: I find the current interface returning a `Vector` of something quite annoying.
Yeah, we discussed this when we put together the API specification (probably in #20). I think one point was that one wants to use `yout[end]` as new input or to evaluate `F`. We could handle the array-input case separately, but this would also break the API consistency.
@mauro3: I think it is just `norm`, `+`, `-`, and multiplication with/division by scalars. At some point (#37) I came up with an example gist to demonstrate the basics. But I agree that looping and indexing is really helpful, so maybe we should revise the requirements?
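As a concrete, hypothetical sketch of that minimal interface in modern Julia (the `Pendulum` type is made up; it is not the type from the gist):

```julia
import Base: +, -, *, /
import LinearAlgebra: norm

struct Pendulum             # a "pseudo-scalar" state, not an AbstractArray
    θ::Float64
    ω::Float64
end

+(a::Pendulum, b::Pendulum) = Pendulum(a.θ + b.θ, a.ω + b.ω)
-(a::Pendulum, b::Pendulum) = Pendulum(a.θ - b.θ, a.ω - b.ω)
*(s::Real, a::Pendulum)     = Pendulum(s*a.θ, s*a.ω)
*(a::Pendulum, s::Real)     = s*a
/(a::Pendulum, s::Real)     = Pendulum(a.θ/s, a.ω/s)
norm(a::Pendulum)           = hypot(a.θ, a.ω)

# enough for an explicit RK update: y + dt*k works without any indexing
y, k, dt = Pendulum(1.0, 0.0), Pendulum(0.0, -1.0), 0.1
y + dt*k
```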
I haven't done any profiling, but if memory management is the bottleneck here, we could very probably use the iterator versions (#49) to let the user take care of it in cases where they want to:

- keep the API `tout, yout = ode45(F, y0, tspan)` returning a `Vector{typeof(y0)}`
- implement it on top of the iterator version, collecting the results into a `Vector`, so we don't duplicate code

If we choose that path (and there is a clean-enough way to do it for the user, so we can show it off in examples), we can move that issue out of scope for this PR :)
I'm trying to adapt the solvers to the preferred API and internals, but I find it quite painful and I'm wondering whether this is not a case of over-engineering. There are a few types involved:
```julia
Tt = typeof(tspan)
Ty = typeof(y0)
Tf = typeof(F(tspan[1], y0))
Et = eltype(tspan)
Ey = eltype(y0)
Ef = eltype(Tf)
```
`E*`: There should be promotion of the eltypes (#26): `Tt` should probably be some type of floating-point number, for sure a `Real` (maybe `FixedPoint`; `Rational` could maybe work but, as @stevengj said, it doesn't make sense); `Ty` should be some "continuous" `Number`, `Tf` also.

`T*`: The `T*` are either equal to the eltypes, if scalar, or some kind of container type. In the current solvers `Tt` is discarded, the outputs are kept in a `Vector{Ty}` (this is where I experience my pain), and it is assumed that `Tf==Ty` (and thus `Ef==Ey`).
Anyway, I wonder whether there is any need to support containers other than `AbstractArray`s, and in particular whether internal, temporary storage arrays can just be `Array`s or not. If not, what is the use case?
The reason I am wondering about these things is that I would like to see an API developed, similar to Sundials, where all storage can be preallocated and functions do in-place updates. I don't see how this is compatible with the above internals. I don't think an iterator method would work, at least not if it is not in-place.
Sorry for the rant. I'll ponder this a bit longer...
The assumption that `Tf==Ty` is sensible; we always need as many equations as there are variables, so it should always be possible to map the return value `Tf` back into `Ty`. As for `Tt`, I think it should be an (abstract) vector of `Real` type (interpreted as either `[start,stop]` or a sequence of predetermined output times). Can you see any other uses for `Tt`?
On the other hand, the in-place updates are a must-have feature if we want good performance in the iterator version, and I don't see how this could be realized with custom types for `Ty` and `Tf`. Maybe by defining indexing methods on `Ty` and preallocating instances of `Ty` as "working arrays"? Would that be feasible? Can we simply go with `Array`s for this PR and think of this as a separate feature (as @tlycken suggests), or does it have to be a part of this PR for some technical reasons?
And just a comment: in implicit methods `Ty` can only be an (abstract) vector, because we have to solve an implicit equation for the next step via Newton iterations, which require a Jacobian for the whole system to be a 2d array. In principle it might be possible to prepare a specialized array type for each call, but I guess it won't be very efficient.
My opinion is that types other than vectors would be very impractical to work with inside ODE.jl, for reasons already mentioned, and I think we should refrain from implementing the API this way. Anyway, with iterators in place one can easily produce wrappers to generate user-defined types from the array returned at each step, via `imap` for example; the same goes for the wrapper around `F` (I am doing this in my code and it works well enough). If somebody wants to define a specialized type, he can also create these wrappers without much effort.
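A rough sketch of that wrapper idea; all names here (`MyQuVec`, `to_vec`, `from_vec`) are invented for illustration, and a plain generator stands in for `imap`:

```julia
struct MyQuVec                  # hypothetical non-AbstractArray state type
    coeffs::Vector{Float64}
end

to_vec(q::MyQuVec)  = q.coeffs      # custom type -> plain Vector
from_vec(v::Vector) = MyQuVec(v)    # plain Vector -> custom type

F_user(t, q::MyQuVec) = MyQuVec(-q.coeffs)    # user's RHS on the custom type
F_vec(t, v) = to_vec(F_user(t, from_vec(v)))  # what the solver actually calls

# and the solver's output stream can be wrapped back lazily:
ys = [[1.0, 2.0], [0.9, 1.8]]          # stand-in for an iterator of states
wrapped = (from_vec(y) for y in ys)    # lazily yields MyQuVec values
collect(wrapped)
```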
I haven't followed this discussion, but the tensors in TensorToolbox.jl (which is far from usable and is currently receiving little attention) are indeed not `AbstractArray` subtypes. However, they are easily transformed into a `Vector` and back, which I think is the only way they can be used together with e.g. iterative solvers, eigensolvers, ODE integrators, etc. I think of these tensors as including both the coefficients and information about the basis vectors that span the vector space (probably very close to how QuBase.jl is working). The coefficients themselves can easily be returned as a `Vector`.
@Jutho I think the normal use case involving TensorToolbox.jl would be to have a tensor field and time-step that forward. So one tensor would correspond to a "Number" and the field would be contained in an Array-like container (i.e. the tensor would be `Ey`, in the above parlance). Is that correct, or is there a tensor-field type as well?
@acroy Whereas, presumably, a QuArray would itself be the container (`Ty`), right?
@pwl: couldn't a Jacobian be more abstract than a matrix?
If, as @pwl suggests, an Array-version should be merged, that would be commit 8ac21e25254fbd28537ffd4fd0a160da28bd6704. Let me know if I should revert to that. Otherwise, I'm not sure that I want to try and update the PR, at least not until a more satisfying solution to the above points has been found.
@mauro3: That is right, `QuArray` is basically the container in that case. Of course we can also convert this to a vector and back if necessary.
Maybe I am the only one who wants to keep the support for more general types via duck typing... I can see that this might be painful, since things like `+=` make copies and we don't have canonical in-place operations like `add!` yet. However, it seems that circumventing this limitation/bottleneck is the only reason for restricting the input to arrays. (I wish I could give a killer example, but I don't have one. I just think there is no fundamental reason why the user has to represent his/her system of equations by an array; that might in the end be the most efficient way, but that is a different problem.)

I hope this didn't sound too harsh; it certainly isn't meant to be. If everyone is in favor of switching to arrays, I am fine with that.
I am also strongly in favor of keeping the typing as open as possible via duck typing (although I don't necessarily think it has to be done in this PR).
My reasoning is this:
When it comes to "problems" like `+=` making copies, I think this is really something we should leave to base Julia to optimize. Given the tremendous impact a performance boost for such operations would have across the board, it's very unlikely that the core team won't have a solution in place well before Julia 1.0 is released.
If it turns out that there are users who really do need the extra performance gain right now, we could possibly write specialized methods for `Array` inputs that have these optimizations, but I think it's premature to have only those and disallow a lot of other potential use cases.
My proposal is the following (which is implemented in the last commit b8458df):
```julia
# For non-array y0, treat y0 as a scalar
function oderk_fixed(fn, y0, tspan, btab::TableauRKExplicit)
    fn_ = (t, y) -> fn(t, y[1])
    t, y = oderk_fixed(fn_, [y0], tspan, btab)
    return t, vcat(y...)
end

# For array-like y0 use this.  Note that y0<:AbstractArray could be relaxed
# to any indexing container with a Holy trait.
function oderk_fixed{N,S}(fn, y0::AbstractVector, tspan,
                          btab_::TableauRKExplicit{N,S})
    Ar, Et, Ey, Ef, btab = make_consistent_types(fn, y0, tspan, btab_)
    dof = length(y0)
    tsteps = length(tspan)
    ys = Ar(Ey, dof, tsteps)
    ys[:,1] = y0
    tspan = convert(Vector{Et}, tspan)
    # work arrays:
    ks = Ar(Ef, dof, S)
    ytmp = Ar(Ey, dof)
    for i = 1:length(tspan)-1
        dt = tspan[i+1] - tspan[i]
        ys[:,i+1] = ys[:,i] # this also allocates a bit...
        for s = 1:S
            for d = 1:dof
                ytmp[d] = ys[d,i] # ytmp[:] = ys[:,i] allocates!
            end
            calc_next_k!(ks, ytmp, ytmp, s, fn, tspan[i], dt, dof, btab)
            for d = 1:dof
                ys[d,i+1] += dt * btab.b[s] * ks[d,s]
            end
        end
    end
    return tspan, ys
end
```
I think this should work for arbitrary `y0` which cannot be indexed (need to test this), and works fast for indexable `y0`. Note that `Ar` is a suitable container type which `make_consistent_types` figures out.
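For readers: a rough guess at what a `make_consistent_types`-style helper computes, based on the `E*`/`T*` discussion above (this is not the PR's actual implementation, and it returns only a container constructor plus one common eltype):

```julia
function make_consistent_types_sketch(fn, y0, tspan)
    Et = eltype(tspan)
    Ey = eltype(y0)
    Ef = eltype(fn(tspan[1], y0))
    E  = promote_type(Et, Ey, Ef)   # common scalar type for state and time
    return Array, E                 # container constructor + element type
end

# e.g. Float32 state with Float64 time promotes to Float64 throughout:
make_consistent_types_sketch((t, y) -> -y, Float32[1, 2], [0.0, 1.0])
```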
I updated the code slightly (still a bit hacky at the moment) and also @acroy's custom-type example: https://gist.github.com/mauro3/d605bff8e11994a372d9 Now this works with both the adaptive stepper and the fixed-step one. So, my approach seems feasible.
(Note that currently my `ode54_v3` solver is about 3x slower than `ode45`, but I'm sure it can be made faster.)
My use case is: I want to use ODE time steppers in the near future (next two months) to time-step discretized PDEs with up to 10^5 degrees of freedom. (Currently I do this with Matlab and it works up to 2e4 DOFs.) While Julia may eventually make many of the bottlenecks go away, it will take some time to get there, certainly too long for me to wait. Also, for systems that size you want to work without allocations.
But I also see the need/use for "scalar" values, like in @acroy's gist. I think the approach I outlined above could serve both needs, albeit with a somewhat higher code complexity. Performance-wise it would probably favor non-scalar types but I think that is ok, as those are typically the large systems.
OK, I see.
How about actually having two versions of the solver, with the choice between them being made on the types? Then one can explicitly dispatch on `AbstractVector` (or even `Vector`, if it makes life easier) and use all the indexing and other optimizations in your current version, while the other can be completely duck-typed and only require that the input types support vector-field operations (`+`, scalar `*`, and `norm`), returning a `Vector{typeof(y0)}` instead of something more performant, like the previous version. If/when the performance bottlenecks of vectorized array code are fixed in base Julia so that they become equally performant, the vector-specific method can be removed.
And by the way, I think a huge thank-you is in order for putting time and effort into this rewrite. Reading back in this thread, it sounds like you're getting a lot of resistance that you don't deserve, just because a few of us happen to be picky - I hope you don't take it too hard :)
No worries and thanks!
I think the current version of the PR does what you describe, except that it avoids code duplication by wrapping any non-array thing `y0` as `[y0]` and calling the array solver, which then solves it as a single-component vector system. The operations performed on a single component are always just the scalar math operations, so it just works. Does that make sense?
Ok, so I tidied everything up (but it's not quite ready). The last commit of the PR is now how I think the interface/internals should be. It works for array `y0`s and scalar `y0`s, as in this gist: https://gist.github.com/mauro3/d605bff8e11994a372d9
I'll write up later what interface the scalars and arrays need to conform to. I realize this is the 10,000th commit, but maybe someone still has some more time to look at this...
Your solvers return `Vector{typeof(y)}` and `Matrix{eltype(y)}` for "scalar" and array input, right? This breaks the current behavior, which isn't a big problem, but we'd have to change the other solvers (e.g. `ode23s`) as well. I would suggest to either stick to the old behavior and change to Vector/Matrix in a separate PR for all solvers, or to tag the current master as 0.2 and merge your PR afterwards. There are probably only a few users who will care, but we should try to keep the package consistent in some form.
Yes, I'm aware of this and still mulling over it. But for sure it should be merged in an orderly fashion. I could prepare another PR to first do the API change.
What I would like to have is the following `input -> output` mapping:

1. scalar `y0` -> `Vector{typeof(y0)}`
2. container `y0` (e.g. a `QuArray`) -> `Vector{QuArray}`
3. array `y0` (e.g. `[1., 2.]'`) -> `Matrix{Float64}`
So, the API of 1 & 2 is the same and consistent with the current API; 3 is different and a slightly breaking change. However, the corresponding internals should be: implement 1 by wrapping `y0` as `[y0]` and calling (2). But I'm not sure yet how to implement it such that 2 & 3 can share the algorithmic part of the code.
Apart from the technical issues, would this be a good API & internals?
Sounds very reasonable to me.
I see a few possible problematic points here - that doesn't mean I think this is a bad idea, but I want us to think it through, both conceptually and actually in code, before we merge changes like these.
I would much rather have the same API as before, with the optimizations in a specialization for the types for which they are applicable. I'd rather see some code duplication, since they won't actually be algorithmically equivalent anyway, than trying to fit everything into the optimized code with both shoehorn and vaseline...
Maybe we could put the performance-critical in-place updates in separate functions and keep the "algorithmic part" general? What we need is basically something like `add!(A,B,beta)`, which does `A + beta*B -> A`, and `copy!`. For the former we can provide a specialized version for vectors (I think there was/is some discussion about having a function like that in Base, but I can't find it).
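A minimal sketch of that two-level `add!` (the generic fallback and names are illustrative, not actual ODE.jl code):

```julia
# generic fallback: works for any type supporting + and scalar *,
# but allocates and returns a new object, so the caller must rebind
add!(A, B, beta) = A + beta*B

# in-place specialization for plain arrays: no allocation
function add!(A::Array, B::Array, beta)
    @inbounds for i in eachindex(A, B)
        A[i] += beta*B[i]
    end
    return A
end

y = [1.0, 2.0]; k = [0.5, -0.5]
y = add!(y, k, 0.1)    # rebinding works for both the generic and array paths
```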
I tried this for `oderk_fixed` in this gist (which is a bit hacky) and the performance seems to be comparable to Mauro's implementation. Maybe there is some caveat, though... And of course I don't know how easy it is to do something similar for the adaptive solver.
Thanks for the feedback, Tomas:

- Constructing a `Vector{Vector{String}}` takes 0.15s on my laptop. I would expect any ODE solver to take at least 10x longer, probably much longer, to produce so many values. Probably worse is the double anonymous function.
- Pre-allocating the output is only possible for `points=:specified`; otherwise it will be `push!`.
- `F` and `norm` should act on the whole state, i.e. pseudo-scalar, irrespective of whether in scalar, vector or matrix mode. It is then up to the user to internally devectorize these functions if needed. So, I think the step-size control can stay the same (having said this, the step control in this PR needs more work to get to that).

Thanks, Alex, for prototyping this! The problem I see with your approach is that it only works with scalar `y0` which are `<:Number`; otherwise the overloading of `copy!` doesn't work. Plus, the overloading pollutes the `Base` functions with hacks. This could be rectified by using our own local versions of `copy!`. I just tried to update your version, pasted below your gist, and my experience was that it was quite confusing trying to do vectors and scalars at the same time...
So, my proposal to continue this PR is to implement points 1 & 2 and leave 3 for now. Thus the API is completely unchanged. I'm pretty sure that 3 can eventually be added on top of 2 by using ArrayViews.jl. The advantage of this, compared to Alex's prototype, is reduced complexity; all it takes is:
```julia
# Non-array y0: treat as scalar
function oderk_fixed(fn, y0, tspan, btab::TableauRKExplicit)
    fn_ = (t, y) -> [fn(t, y[1])]
    t, y = oderk_fixed(fn_, [y0], tspan, btab)
    return t, vcat_nosplat(y)
end
vcat_nosplat(y) = eltype(y[1])[el[1] for el in y]
```
This is compared to the 8 function definitions in Alex's prototype. If we find that scalar performance is indeed too bad, then let's revisit Alex's approach.
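For clarity, a tiny check of what `vcat_nosplat` does with solver-style output (a `Vector` of 1-element `Vector`s):

```julia
vcat_nosplat(y) = eltype(y[1])[el[1] for el in y]   # as defined above

y = [[1.0], [2.5], [4.0]]    # one single-component state per time step
vcat_nosplat(y)              # [1.0, 2.5, 4.0], a Vector{Float64}
```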
> The problem I see with your approach is that it only works with scalar `y0` which are `<:Number`; otherwise the overloading of `copy!` doesn't work. Plus, the overloading pollutes the `Base` functions with hacks.
Yeah, the situation with `copy!` and `similar` is not really optimal (although one might argue that `copy!` for `Number`s and other bitstypes should be included in Base anyway). I think the nice aspect of my approach is that the algorithm is separated from performance optimizations for specific types.

But I also think that we don't have to do this separation now; your proposal looks very reasonable to me. The biggest problem for pseudo-scalars is probably the additional anonymous function.
Just now I updated according to my last post. (Edit: to state the obvious ;-)
This is it: points 1 & 2 done. Probably needs some moving of things to other files and removal of the old solvers.
I want to add some more tests, for instance Alex's example: https://gist.github.com/mauro3/d605bff8e11994a372d9 And I want to put together the interface required of the container and the scalar.
Life caught up, I'll work on this again in ~10 days.
What is still missing before we can merge this? I might be able to do some coding this week.
Sorry for having left this so long and thanks @berceanu for offering help. I suspect that it is probably easiest if I finish this. However, code review would certainly be good. Also helpful would be if you could try out this branch on your problem and report back.
PR #70 adds a test for a scalar-like state.
I ran a performance test for the custom scalar type of #70 (with `endt=2000*pi`): https://gist.github.com/mauro3/8560062fbd7629e1d4cd
| Fixed step | time (s) | MB  |
|------------|----------|-----|
| ode4       | 0.003    | 4.7 |
| ode4_v2    | 0.005    | 6.4 |

| Adapt step | time (s) | MB   |
|------------|----------|------|
| ode45_dp   | 1.1      | 945  |
| ode54_v2   | 3.2      | 2024 |
| ode78_fb   | 0.85     | 839  |
| ode78_v2   | 1.65     | 1325 |
The `_v2` solvers are the new corresponding solvers. So the trick of wrapping a scalar-like thing in `[]` and running it that way is indeed a bit inefficient. An approach as suggested by @acroy above might be better. (edited)
I am kind of surprised that the difference is that large... On the other hand, for array-like input this PR should be (much) faster, right? So the question is whether we want everything now, or whether we are willing to accept that the less common case (?) is slower and address that in a separate PR.

@mauro3: Could you provide some benchmarks for the number and array input cases?
I would suggest the following:
Does that make sense?
As promised in #61, here is a RK implementation sans license restrictions. It is a bit longer than the current code but has some more features as well. There is probably a bug lurking somewhere, as it is less accurate than the current solvers.

Extra features:

- this also "solves" what was tried in PR #60: the method will run using the type of number used for the coefficients
- improvement in ODE.jl to have `hinit` return `tdir` as well

TODO:

- doc update (there is no doc yet)

This touches on issue #9 and fixes #25 (although that was fixed already, no?).

Extra features, maybe in this PR, maybe later:

- iterator protocol
- maybe implement dense output in terms of Polynomials.jl, root finding