berceanu closed this issue 10 years ago.
Forgot to mention I also posted recently on the mailing list at https://groups.google.com/forum/#!topic/julia-users/td8QBHHjwkk, but that was before I discovered the existence of ODE.jl.
Spontaneously I would say that it should be possible, although it might not give the best performance at the moment. But it would be a great example and performance test!
Regarding your points:

You can `vec` and/or `reshape` the matrices and `vcat` them together to get a "super solution vector" which is passed to the RHS function. Therein you would have to extract the matrices from the vectors. Or you can define a custom type which contains psiX and psiC as fields. If you implement the necessary functions (you will need a `norm`, `+`, `-`, multiplication with/division by scalars), you can use the new type directly (see also #14).

For additional parameters you can define `function myrhs(t, x, params) ... end`, and then use it with `ode45` like so: `ode45((t,x)->myrhs(t,x,params), x0, tspan)`.

I have created an example for using a custom type with ode45, which can be found in this gist.
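The custom-type route can be sketched roughly like this (the type name, field names and the choice of norm are illustrative, not the gist's actual code; only the list of required operations comes from the discussion above):

```julia
using LinearAlgebra  # for norm
import Base: +, -, *, /

# Container for the two complex field matrices (names are illustrative)
struct PsiState
    x::Matrix{ComplexF64}  # psiX
    c::Matrix{ComplexF64}  # psiC
end

# Operations an RK solver like ode45/oderkf needs on the state type
+(a::PsiState, b::PsiState) = PsiState(a.x + b.x, a.c + b.c)
-(a::PsiState, b::PsiState) = PsiState(a.x - b.x, a.c - b.c)
*(s::Number, a::PsiState) = PsiState(s * a.x, s * a.c)
*(a::PsiState, s::Number) = s * a
/(a::PsiState, s::Number) = PsiState(a.x / s, a.c / s)

# One possible norm for step-size control: the larger Frobenius norm
LinearAlgebra.norm(a::PsiState) = max(norm(a.x), norm(a.c))
```

With these definitions an RK update like `y + h/6*(k1 + 2k2 + 2k3 + k4)` works on such values directly.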
Looking at the question at SO I guess your third point concerns the split-step technique? In that case I think it is not necessary to do the FFT in the RHS function; you would rather do the FFT between the solution steps. Let's say the RHS function of step 1 is called `rhs1`; one time step `t -> t+h` in pseudocode would look like this:

```julia
ts, ys = ode45(rhs1, y_t, [t, t+h])
psiC_ft = fourier_transform(ys[end].psiC)
psiC_ft = multiply_with_phase_factor(psiC_ft)  # each Fourier component is multiplied by exp(-im*k^2*h)
ys[end].psiC = inverse_fourier_transform(psiC_ft)
# ys[end] now contains approx. solution at t+h
```
The FFT and phase factor multiplication should probably be done in-place. Note that this will not give you an (overall) adaptive scheme - the step size `h` has to be sufficiently small. Moreover, there are several possibilities to do the splitting (just search for "split step method nonlinear schrodinger equation"), which might be more accurate.
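As a toy version of this scheme (not the poster's actual problem: a crude forward-Euler step stands in for the `ode45` call, and the equation is a bare 1D Schrödinger equation), one step could look like:

```julia
using FFTW, LinearAlgebra

# One split step for i dpsi/dt = -d^2 psi/dx^2 + V*psi (illustrative only):
# the potential part plays the role of rhs1, the kinetic part is applied
# exactly in Fourier space, as described above.
function split_step!(psi::Vector{ComplexF64}, k::Vector{Float64},
                     V::Vector{Float64}, h::Float64)
    psi .-= h .* (im .* V .* psi)        # Euler stand-in for ode45(rhs1, ...)
    psi_ft = fft(psi)
    psi_ft .*= exp.(-im .* k .^ 2 .* h)  # multiply each component by exp(-im*k^2*h)
    psi .= ifft(psi_ft)
    return psi
end
```

The FFTs here still allocate; an in-place version with precomputed plans would avoid that.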
One aspect of the API which would be nice in this context, but is not implemented, is the keyword `points` (see api.md); `points == :specified` would avoid unnecessary memory allocation.
Well actually, so far I am just porting the adaptive Runge-Kutta which I already had working in Fortran over to Julia. My SO question concerned other (faster) ways of doing the integration. The answer I got there about operator splitting might not actually be valid for this problem, since on the rhs I have the time-dependent term `F(x,y) exp(-i \omega_p t)` in the second equation.
In that case it should basically Just Work ...
Ok, done. So now we have a fully working Julia implementation of my pattern formation problem and a Fortran 90 one as well.
Here is the IJulia notebook: http://nbviewer.ipython.org/github/berceanu/notebooks/blob/master/pattern_formation_with_julia.ipynb
and here the gist: https://gist.github.com/berceanu/f3faca3f6aa84cd5c7fa
Caveats:

1. Saving the solution only at given output points `xout` was not an option. I therefore had to change the code of `oderkf` to include saving every `Δsav` steps (new `oderkf` included in the notebook). Not sure what the API plans for this feature are atm.
2. The default initial step size `abs(tfinal - t)/100` was way too large for my problem and caused an immediate "Step size grew too small" error. I had to change it to `abs(tfinal - t)/1e9`. Perhaps it should be a parameter also?

Very nice! Would be even cooler if it would work with stock ODE.jl ;-) But I think we will get there.
Regarding your caveats:

1. Matlab's solvers have an option `Refine`, which lets you specify if you want more or less intermediate points. Something like this might be useful?
2. We could have `maxstep`, `minstep` and `initstep` as keywords. This shouldn't be too hard. Maybe you could submit a PR for this?
3. Have you tried `plan_fft` (docs here) and `fft!` (docs here)? Also, in `rhs` you have

```julia
dψc += im*potc .* ψ.c
# adding the kinetic energy by means of the FFT
dψc -= im*ifft(kin .* fft(ψ.c))
```

which will create temporary copies (`+=` and `-=` do). You might want to look at devectorize.jl.
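The in-place suggestion could look something like the following sketch (`kin` and the update of the `ψ.c` field mirror the gist; `make_kinetic!` and its signature are made up for illustration):

```julia
using FFTW

# Precompute in-place FFT plans once and reuse one scratch buffer, so the
# kinetic-energy term adds no per-call allocations.
function make_kinetic!(kin::Matrix{Float64}, tmp::Matrix{ComplexF64})
    p = plan_fft!(tmp)    # in-place forward plan
    ip = plan_ifft!(tmp)  # in-place (scaled) inverse plan
    function kinetic!(dpsic, psic)
        copyto!(tmp, psic)
        p * tmp              # tmp <- fft(psic), in place
        tmp .*= kin          # apply the kinetic factor in Fourier space
        ip * tmp             # tmp <- ifft(kin .* fft(psic)), in place
        dpsic .-= im .* tmp  # fused broadcast: no temporary copies
        return dpsic
    end
end
```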
Slight change of plans. I decided to use DOP853 from https://github.com/JuliaLang/ODE.jl/pull/33 and implement the missing functionality there (i.e. dense output). So now I'm in the process of adapting @helgee's code to work with my example.
May I ask why you switched to `DOP853`? Is `oderkf` limiting the performance of your code?
The decision is twofold. On one side, being a higher-order RK method, it should take larger steps, resulting in fewer function evaluations (which in my case are really costly). Whether it will be faster or not in practice remains to be seen once I get a working version. Secondly, dense output is easier to implement in `dop853` because the Fortran code by Hairer already has it.
I see. It will be interesting to see a comparison of the performance. (I guess you know that dense output for `DOP853` involves additional evaluations of the rhs? It is (only?) `DOPRI5` that doesn't require additional calls. If you "just" want to have a higher order RK method, you can try to find the corresponding Butcher table and use that with `oderkf`.)
How large is actually the difference between the F90 and the Julia code?
I have a simple benchmark for one of my use cases (the solution of the two-body Kepler problem):
The Julia values are a few months old actually and I will re-test with the current head today. I might also do a comparison with `oderkf` and the Intel Fortran Compiler.
@helgee Nice! It's impressive that Julia is within an order of magnitude of gfortran even with -Ofast. Do you have the benchmarking code posted somewhere? Would be interesting to do a legibility comparison too ;)
@tlycken Yup, right here https://github.com/helgee/dop-julia
EDIT: Unfortunately I cannot reproduce the results and it is currently much slower.
@acroy You're right. I just created a pull request with the Butcher table for the 7(8) order method. https://github.com/JuliaLang/ODE.jl/pull/41 Benchmarks against F90 and the 4(5) method coming soon.
@helgee I modified your benchmark a little, simply because I couldn't figure out a reliable way to measure many evaluations in julia separately, so these timings are total times over a 10 000 iteration loop:
gfortran 4.8.2: ~2.0 s
gfortran 4.8.2 with -Ofast: ~0.87 s
julia: ~2.7-2.8 s
So there's still some way to go to beat `gfortran -Ofast`, but we're well within reach of `gfortran`. About 8% is still gc time, and by e.g. adding `@inbounds begin ... end` around the loops in `dopcore` I was able to shave off another 0.15 seconds.
I was wondering, would the Sundials interface support my defined type based on complex matrices? I am referring to code in this gist: https://gist.github.com/berceanu/f3faca3f6aa84cd5c7fa
@berceanu : Unfortunately not. You will need to reshape the matrices into vectors and join them. Moreover I am not sure if you can actually use complex vectors with Sundials ...
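The reshape-and-join route (with real vectors, in case Sundials cannot handle complex data) could look like this hypothetical pack/unpack pair; all names here are made up for the sketch:

```julia
# Flatten two complex n-by-m matrices into one real vector (real and
# imaginary parts stored separately) and rebuild them inside the rhs.
function pack(psiX::Matrix{ComplexF64}, psiC::Matrix{ComplexF64})
    vcat(vec(real(psiX)), vec(imag(psiX)),
         vec(real(psiC)), vec(imag(psiC)))
end

function unpack(y::Vector{Float64}, n::Int, m::Int)
    L = n * m
    psiX = reshape(y[1:L], n, m) .+ im .* reshape(y[L+1:2L], n, m)
    psiC = reshape(y[2L+1:3L], n, m) .+ im .* reshape(y[3L+1:4L], n, m)
    return psiX, psiC
end
```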
@tlycken, @helgee : I have updated my "benchmark" in this gist to include `ode78` and the gravity example. The result (included in the gist) is kind of interesting. I think it shows that we lose quite some time on memory allocation(?) at the moment (look at the number of time steps and runtime) ... Might be interesting to compare to `dop853`.
@acroy Interesting... We could probably get some more information from these benchmarks simply by re-ordering some things and piggy-backing on the more verbose output of `@time`, compared to the information from `@elapsed` - try changing

```julia
tau = @elapsed begin
    ...
end
println("* elapsed time ($(nruns) runs): $(tau)")
```

to

```julia
print("* $(nruns) runs:")
@time begin
    ...
end
```
That should give us both memory allocation and gc info "for free", with minimal changes to the output format. If gc turns out to be a big factor, maybe we should consider calling it explicitly between runs, and somehow not count that time? After all, solving the same problem 10 times in a row isn't really a credible scenario...
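That pattern might look like the following sketch (`benchmark` and `solve_problem` are placeholder names; `GC.gc()` is today's spelling of the explicit collection, plain `gc()` in the Julia of this thread):

```julia
# @time prints allocation and GC share for free; forcing a collection
# before the timed loop keeps garbage from earlier runs out of the numbers.
function benchmark(solve_problem; nruns = 10)
    solve_problem()  # warm-up run, so compilation is not timed
    GC.gc()          # start the timed section with a clean heap
    print("* $(nruns) runs:")
    @time for _ in 1:nruns
        solve_problem()
    end
end
```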
@tlycken : I have updated the results using `@time` and also wrapped the loop in `gc_disable()` and `gc_enable()`.
@acroy: nice! I don't think this looks so suspicious, though. Sure, the higher-order method takes about half as many steps, but it will also do many more function evaluations per step. I can't say that this is off without spending some time profiling - time which I unfortunately need to spend on finishing up my master's thesis at the moment...
@tlycken: I think you are right. Most of the time difference comes from the different number of function calls (I just counted the calls in `gravity`), so overall `oderkf` is working quite well/consistently.
@acroy Nice =) This is starting to be terribly OT though...
@berceanu, have you been able to get what you wanted out of this issue? Can we close it, or do you need more help with something?
Definitely OTing, all ok on my side, thanks a lot for your help!
@acroy just did a PR for changing the API to make it usable for my prob https://github.com/JuliaLang/ODE.jl/pull/43
I would like to solve the 2 coupled time-dependent PDEs from http://scicomp.stackexchange.com/questions/11772/numerical-approach-to-coupled-nonlinear-pdes-with-time-dependence/11782?noredirect=1#comment18452_11782 using ode45. I already have a working Fortran 90 code for this, based on the ode45 presented in Numerical Recipes. The plan is to add this to the examples section once finished. I find the last test in https://github.com/JuliaLang/ODE.jl/blob/master/test/runtests.jl to be a good starting point, as it has a system of 2 coupled equations. However, my case is more complicated because:
- the unknowns `v` and `w` (called psiX and psiC in my problem) are themselves (complex) matrices, defined on a finite grid.

The main issue here: is it possible to use the current API to implement this problem? If not, I am willing to contribute to improving the API, but would appreciate some guidance.