berceanu closed this issue 10 years ago.
Forgot to mention I also posted recently on the mailing list at https://groups.google.com/forum/#!topic/julia-users/td8QBHHjwkk, but that was before I discovered the existence of ODE.jl.
Spontaneously I would say that it should be possible, although it might not give the best performance at the moment. But it would be a great example and performance test!
Regarding your points:

You can `vec` and/or `reshape` the matrices and `vcat` them together to get a "super solution vector" which is passed to the RHS function. Therein you would have to extract the matrices from the vectors. Or you can define a custom type which contains psiX and psiC as fields. If you implement the necessary functions (you will need a `norm`, `+`, `-`, multiplication with/division by scalars), you can use the new type directly (see also #14).

For additional parameters you can define `function myrhs(t, x, params) ... end`, and then use it with `ode45` like so: `ode45((t,x)->myrhs(t,x,params), x0, tspan)`.

I have created an example for using a custom type with ode45, which can be found in this gist.
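The custom-type route can be sketched roughly like this (the type name, field names and the choice of norm are illustrative, not the gist's actual code; only the list of required operations comes from the discussion above):

```julia
using LinearAlgebra  # for norm
import Base: +, -, *, /

# Container for the two complex field matrices (names are illustrative)
struct PsiState
    x::Matrix{ComplexF64}  # psiX
    c::Matrix{ComplexF64}  # psiC
end

# Operations an RK solver like ode45/oderkf needs on the state type
+(a::PsiState, b::PsiState) = PsiState(a.x + b.x, a.c + b.c)
-(a::PsiState, b::PsiState) = PsiState(a.x - b.x, a.c - b.c)
*(s::Number, a::PsiState) = PsiState(s * a.x, s * a.c)
*(a::PsiState, s::Number) = s * a
/(a::PsiState, s::Number) = PsiState(a.x / s, a.c / s)

# One possible norm for step-size control: the larger Frobenius norm
LinearAlgebra.norm(a::PsiState) = max(norm(a.x), norm(a.c))
```

With these definitions an RK update like `y + h/6*(k1 + 2k2 + 2k3 + k4)` works on such values directly.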
Looking at the question at SO I guess your third point concerns the split-step technique? In that case I think it is not necessary to do the FFT in the RHS function; you would rather do the FFT between the solution steps. Let's say the RHS function of step 1 is called `rhs1`; one time step `t -> t+h` in pseudocode would look like this:

```julia
ts, ys = ode45(rhs1, y_t, [t, t+h])
psiC_ft = fourier_transform(ys[end].psiC)
psiC_ft = multiply_with_phase_factor(psiC_ft)  # each Fourier component is multiplied by exp(-im*k^2*h)
ys[end].psiC = inverse_fourier_transform(psiC_ft)
# ys[end] now contains approx. solution at t+h
```
The FFT and phase factor multiplication should probably be done in-place. Note that this will not give you an (overall) adaptive scheme - the step size `h` has to be sufficiently small. Moreover, there are several possibilities to do the splitting (just search for "split step method nonlinear schrodinger equation"), which might be more accurate.
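As a toy version of this scheme (not the poster's actual problem: a crude forward-Euler step stands in for the `ode45` call, and the equation is a bare 1D Schrödinger equation), one step could look like:

```julia
using FFTW, LinearAlgebra

# One split step for i dpsi/dt = -d^2 psi/dx^2 + V*psi (illustrative only):
# the potential part plays the role of rhs1, the kinetic part is applied
# exactly in Fourier space, as described above.
function split_step!(psi::Vector{ComplexF64}, k::Vector{Float64},
                     V::Vector{Float64}, h::Float64)
    psi .-= h .* (im .* V .* psi)        # Euler stand-in for ode45(rhs1, ...)
    psi_ft = fft(psi)
    psi_ft .*= exp.(-im .* k .^ 2 .* h)  # multiply each component by exp(-im*k^2*h)
    psi .= ifft(psi_ft)
    return psi
end
```

The FFTs here still allocate; an in-place version with precomputed plans would avoid that.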
One aspect of the API which would be nice in this context, but is not implemented, is the keyword `points` (see api.md); `points == :specified` would avoid unnecessary memory allocation.
Well actually, so far I am just porting the adaptive Runge-Kutta which I already had working in Fortran over to Julia. My SO question concerned other (faster) ways of doing the integration. The answer I got there about operator splitting might not actually be valid for this problem, since on the rhs I have the time-dependent term `F(x,y) exp(-i \omega_p t)` in the second equation.
In that case it should basically Just Work ...
Ok, done. So now we have a fully working Julia implementation of my pattern formation problem and a Fortran 90 one as well.
Here is the IJulia notebook: http://nbviewer.ipython.org/github/berceanu/notebooks/blob/master/pattern_formation_with_julia.ipynb
and here the gist: https://gist.github.com/berceanu/f3faca3f6aa84cd5c7fa
Caveats:

1. Saving the solution only at given output points `xout` was not an option. I therefore had to change the code of `oderkf` to include saving every `Δsav` steps (new `oderkf` included in the notebook). Not sure what the API plans for this feature are atm.
2. The default initial step size `abs(tfinal - t)/100` was way too large for my problem and caused an immediate "Step size grew too small" error. I had to change it to `abs(tfinal - t)/1e9`. Perhaps it should be a parameter also?

Very nice! Would be even cooler if it would work with stock ODE.jl ;-) But I think we will get there.
Regarding your caveats:

1. Matlab's solvers have an option `Refine`, which lets you specify if you want more or less intermediate points. Something like this might be useful?
2. We could have `maxstep`, `minstep` and `initstep` as keywords. This shouldn't be too hard. Maybe you could submit a PR for this?
3. Have you tried `plan_fft` (docs here) and `fft!` (docs here)? Also, in `rhs` you have

```julia
dψc += im*potc .* ψ.c
# adding the kinetic energy by means of the FFT
dψc -= im*ifft(kin .* fft(ψ.c))
```

which will create temporary copies (`+=` and `-=` do). You might want to look at devectorize.jl.
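The in-place suggestion could look something like the following sketch (`kin` and the update of the `ψ.c` field mirror the gist; `make_kinetic!` and its signature are made up for illustration):

```julia
using FFTW

# Precompute in-place FFT plans once and reuse one scratch buffer, so the
# kinetic-energy term adds no per-call allocations.
function make_kinetic!(kin::Matrix{Float64}, tmp::Matrix{ComplexF64})
    p = plan_fft!(tmp)    # in-place forward plan
    ip = plan_ifft!(tmp)  # in-place (scaled) inverse plan
    function kinetic!(dpsic, psic)
        copyto!(tmp, psic)
        p * tmp              # tmp <- fft(psic), in place
        tmp .*= kin          # apply the kinetic factor in Fourier space
        ip * tmp             # tmp <- ifft(kin .* fft(psic)), in place
        dpsic .-= im .* tmp  # fused broadcast: no temporary copies
        return dpsic
    end
end
```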
Slight change of plans. I decided to use DOP853 from https://github.com/JuliaLang/ODE.jl/pull/33 and implement the missing functionality there (i.e. dense output). So now I'm in the process of adapting @helgee's code to work with my example.
May I ask why you switched to `DOP853`? Is `oderkf` limiting the performance of your code?
The decision is twofold. On one side, being a higher-order RK method, it should take larger steps, resulting in fewer function evaluations (which in my case are really costly). Whether it will be faster or not in practice remains to be seen once I get a working version. Secondly, dense output is easier to implement in `dop853` because the Fortran code by Hairer already has it.
I see. It will be interesting to see a comparison of the performance. (I guess you know that dense output for `DOP853` involves additional evaluations of the rhs? It is (only?) `DOPRI5` that doesn't require additional calls. If you "just" want to have a higher order RK method, you can try to find the corresponding Butcher table and use that with `oderkf`.)
How large is actually the difference between the F90 and the Julia code?
I have a simple benchmark for one of my use cases (the solution of the two-body Kepler problem):
The Julia values are a few months old actually and I will re-test with the current head today. I might also do a comparison with `oderkf` and the Intel Fortran Compiler.
@helgee Nice! It's impressive that Julia is within an order of magnitude of gfortran even with -Ofast. Do you have the benchmarking code posted somewhere? Would be interesting to do a legibility comparison too ;)
@tlycken Yup, right here https://github.com/helgee/dop-julia
EDIT: Unfortunately I cannot reproduce the results and it is currently much slower.
@acroy You're right. I just created a pull request with the Butcher table for the 7(8) order method. https://github.com/JuliaLang/ODE.jl/pull/41 Benchmarks against F90 and the 4(5) method coming soon.
@helgee I modified your benchmark a little, simply because I couldn't figure out a reliable way to measure many evaluations in julia separately, so these timings are total times over a 10 000 iteration loop:
gfortran 4.8.2: ~2.0 s
gfortran 4.8.2 with -Ofast: ~0.87 s
julia: ~2.7-2.8 s
So there's still some way to go to beat `gfortran -Ofast`, but we're well within reach of `gfortran`. About 8% is still gc time, and by e.g. adding `@inbounds begin ... end` around the loops in `dopcore` I was able to shave off another 0.15 seconds.
I was wondering, would the Sundials interface support my defined type based on complex matrices? I am referring to code in this gist: https://gist.github.com/berceanu/f3faca3f6aa84cd5c7fa
@berceanu : Unfortunately not. You will need to reshape the matrices into vectors and join them. Moreover I am not sure if you can actually use complex vectors with Sundials ...
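The reshape-and-join route (with real vectors, in case Sundials cannot handle complex data) could look like this hypothetical pack/unpack pair; all names here are made up for the sketch:

```julia
# Flatten two complex n-by-m matrices into one real vector (real and
# imaginary parts stored separately) and rebuild them inside the rhs.
function pack(psiX::Matrix{ComplexF64}, psiC::Matrix{ComplexF64})
    vcat(vec(real(psiX)), vec(imag(psiX)),
         vec(real(psiC)), vec(imag(psiC)))
end

function unpack(y::Vector{Float64}, n::Int, m::Int)
    L = n * m
    psiX = reshape(y[1:L], n, m) .+ im .* reshape(y[L+1:2L], n, m)
    psiC = reshape(y[2L+1:3L], n, m) .+ im .* reshape(y[3L+1:4L], n, m)
    return psiX, psiC
end
```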
@tlycken, @helgee : I have updated my "benchmark" in this gist to include `ode78` and the gravity example. The result (included in the gist) is kind of interesting. I think it shows that we lose quite some time on memory allocation(?) at the moment (look at the number of time steps and runtime) ... Might be interesting to compare to `dop853`.
@acroy Interesting... We could probably get some more information from these benchmarks simply by re-ordering some things and piggy-backing on the more verbose output of `@time`, compared to the information from `@elapsed` - try changing

```julia
tau = @elapsed begin
    ...
end
println("* elapsed time ($(nruns) runs): $(tau)")
```

to

```julia
print("* $(nruns) runs:")
@time begin
    ...
end
```
That should give us both memory allocation and gc info "for free", with minimal changes to the output format. If gc turns out to be a big factor, maybe we should consider calling it explicitly between runs, and somehow not count that time? After all, solving the same problem 10 times in a row isn't really a credible scenario...
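That pattern might look like the following sketch (`benchmark` and `solve_problem` are placeholder names; `GC.gc()` is today's spelling of the explicit collection, plain `gc()` in the Julia of this thread):

```julia
# @time prints allocation and GC share for free; forcing a collection
# before the timed loop keeps garbage from earlier runs out of the numbers.
function benchmark(solve_problem; nruns = 10)
    solve_problem()  # warm-up run, so compilation is not timed
    GC.gc()          # start the timed section with a clean heap
    print("* $(nruns) runs:")
    @time for _ in 1:nruns
        solve_problem()
    end
end
```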
@tlycken : I have updated the results using `@time` and also wrapped the loop in `gc_disable()` and `gc_enable()`.
@acroy: nice! I don't think this looks so suspicious, though. Sure, the higher-order method takes about half as many steps, but it will also do many more function evaluations per step. I can't say that this is off without spending some time profiling - time which I unfortunately need to spend on finishing up my master's thesis at the moment...
@tlycken: I think you are right. Most of the time difference comes from the different number of function calls (I just counted the calls in `gravity`), so overall `oderkf` is working quite well/consistently.
@acroy Nice =) This is starting to be terribly OT though...
@berceanu, have you been able to get what you wanted out of this issue? Can we close it, or do you need more help with something?
Definitely OTing, all ok on my side, thanks a lot for your help!
@acroy just did a PR for changing the API to make it usable for my prob https://github.com/JuliaLang/ODE.jl/pull/43
I would like to solve the 2 coupled time-dependent PDEs from http://scicomp.stackexchange.com/questions/11772/numerical-approach-to-coupled-nonlinear-pdes-with-time-dependence/11782?noredirect=1#comment18452_11782 using ode45. I already have a working Fortran 90 code for this, based on the ode45 presented in Numerical Recipes. The plan is to add this to the examples section once finished. I find the last test in https://github.com/JuliaLang/ODE.jl/blob/master/test/runtests.jl to be a good starting point, as it has a system of 2 coupled equations. However, my case is more complicated because:
- the unknowns `v` and `w` (called psiX and psiC in my problem) are themselves (complex) matrices, defined on a finite grid.

The main issue here: is it possible to use the current API to implement this problem? If not, I am willing to contribute to improving the API, but would appreciate some guidance.