franko / gsl-shell

GSL library shell based on LuaJIT2
http://franko.github.io/gsl-shell/
GNU General Public License v3.0
92 stars 12 forks source link

ode-vec branch integration #21

Open ladc opened 10 years ago

ladc commented 10 years ago

I took the liberty of rebasing your ode-vec branch on the master because I needed the RK8PD integrator recently (in vector form). Do you still intend to integrate this feature? If there are any known issues with it I can try to fix them and clean it up a little.

franko commented 10 years ago

Hi Lesley,

I will be glad to accept your contribution if you got time and interest to work on that. As far as I remember there were no real problems to finalize the integrators and the work was mostly finished or at least functional. I believe that the main remaining problem was that the integrator was exposing to the user a crude C array and I was planning to wrap the array in a column matrix.

If you are interested I can review the code to make you a point about the its current state. I'm pretty busy at work but I guess I should be able to spare one hour or two for that.

By the way, not related to that, are you knowledgeable about permittivity (dielectric constant) calculation in a crystal ? We are interested about quantum confinement effect of the permittivity in the optical spectrum and I would like to know what is computable and how for a crystal. Do you know where I can post these kind of questions to get some directions ?

ladc commented 10 years ago

Hi Francesco,

I haven't scrutinized the code, but indeed it seems to be functional. However, on one occasion I saw that it caused memory usage to ramp up steadily, followed by a LuaJIT out-of-memory panic. I'm having a hard time reproducing that, though. Otherwise, it was fairly compatible with my home-brewed array type (which has a data part and some questionable metamethods) -- yay for duck typing, I suppose :-) I'll have a look at how the interface could be made a bit friendlier (just-in-time conversion from/to a column matrix or so?). It might be a week or so before I'll have a closer look at that though. But I think it would be a shame not to merge this functionality in the main branch (this and other features!)

As for the permittivity calculations in a crystal, I'm afraid my knowledge of solid state physics is more or less limited to textbook basics. Do you mean quantum confinement as in quantum dots? I know of someone who numerically solves Maxwell's equations (http://mumax.github.io), but I suppose that's not quite the scale you're looking at :-)

ladc commented 10 years ago

Apparently the problem disappeared after I had pulled in the latest LuaJIT. The excessive memory usage was due to this bug: http://www.freelists.org/post/luajit/GC-break-after-Fix-GC-steps-threshold-handling-when-called-by-JITcompiled-code which was fixed in commit 9d90988. It was related to the earlier GC bug fix.

franko commented 10 years ago

Hi Lesley,

Apparently the problem disappeared after I had pulled in the latest LuaJIT. The excessive memory usage was due to this bug: http://www.freelists.org/post/luajit/GC-break-after-Fix-GC-steps-threshold-handling-when-called-by-JITcompiled-code which was fixed in commit 9d90988. It was related to the earlier GC bug fix.

good to know. Normally the luajit fix will be merged with the next gsl shell release but I can merge the luajit's changes in the master branch now if you prefer.

Francesco

ladc commented 10 years ago

That's fine by me, but there's no hurry!

ladc commented 10 years ago

Francesco, why exactly do cblas.lua and gsl.lua return the global namespace (ffi.C) under Linux? I just noticed that this is about 20x slower than if I explicitly load the library (again?) with return ffi.load("gslcblas"). Any idea why?

ladc commented 10 years ago

I wrote:

this is about 20x slower than if I explicitly load the library (again?) with return ffi.load("gslcblas"). Any idea why?

Apparently that was due to libgslcblas being astronomically faster than the libblas that was linked in...

franko commented 10 years ago

I'm not sure but I think the reason for returning ffi.C was that the gsl library were already linked to the executable because of the C API calls to the GSL library.

I didn't understood the point about which library is faster then which other. Normally gslcblas is very slow compared, for example, to openblas.

ladc commented 10 years ago

I'm not sure but I think the reason for returning ffi.C was that the gsl library were already linked to the executable because of the C API calls to the GSL library.

Yes, that was probably the reason. If I interpret the LJ docs, this would the case on all POSIX systems though (OSX as well) so perhaps it's safest to only explicitly load the library on Windows and just return ffi.C by default. Do you have a way to check if this works for OSX?

I didn't understood the point about which library is faster then which other. Normally gslcblas is very slow compared, for example, to openblas.

I just tested it against openblas, and that also appears to be much slower than gslcblas for this application. But in the general case it's better to use the system blas library (that's why the blas implementation was split off from GSL in the first place).

ladc commented 10 years ago

BTW, to make the ode-vec implementation compatible with matrices, I suppose you don't mind if I make everything 1-based? y[0] -> y[1] in the user-provided function, I mean.

franko commented 10 years ago

Of course you can :-)

I was thinking to a different approach, keeping a column matrix that is visible to the user so that it is 1-indexed and use the .data field in the implementation. Since the .data field is a C array it will be 0-based.

ladc commented 10 years ago

But then you get a pretty inconsistent situation, no? The y[0] in the user defined function becomes y[1] if you access the column matrix.

franko commented 10 years ago

My idea was that the user always sees the matrix and nothing else but may be I'm missing something. I guess an usage schema can help to fix the ideas.

ladc commented 10 years ago

That's another possibility. I feared that this might be slower, but I'd have to benchmark it first, of course. Otherwise it would just require some simple pointer arithmetic, just use f(t,s.y-1,s.dydt-1) in the rk*-vec implementation -- but that's probably asking for trouble. Having bound checks is a better idea :-)

BTW, why is the workspace matrix for rk8pd (13 + 6) x N-dimensional? I can't figure it out.

franko commented 10 years ago

I didn't remember this detail but looking at the code I think I understood. The space is (13 + 6) x N. I store in the array several N dimensional arrays: y, dydt, ksum8, ytmp1, ytmp2, ksum7. Finally I think that "k" needs for itself a space of 13 x N. For example you have the following code:

# for S = 2, 13 do
         cblas.cblas_dcopy($(N), y, 1, ytmp, 1)
         cblas.cblas_dgemm(RowMajor, Trans, NoTrans, $(N), 1, $(S-1), h, k, $(N), B$(S), 1, 1.0, ytmp, 1)
         f(t + $(ah[S-1]) * h, ytmp, k + $((S-1) * N))
# end

where "k" is addressed up to 13 x N in pointer arithmetic, ouch!

ladc commented 10 years ago

Thanks! Hence the difference in implementation between the two integration methods. I've pushed an example of the pointer arithmetic approach to my ode-vec-rebased branch, but feel free to reject it on the grounds of too ugly and unsafe ;-)

franko commented 10 years ago

Hi Lesley,

I'm looking of what you have done right now. I see that my beloved "damped-ho-evol.lua" does not work any more, I have to figure out why! :-)

The benchmark rk8pd-vec is also very slow, Do you think it is related to the gslcblas/openblas issue ?

I would like to have a more realistic benchmark with a big vector size since the vector form is supposed to be used only for system with many variables. I was thinking to the damped HO example but we can find also something else.

As for the pointer offset solution, this is quite smart but the problem is that you are still passing to the user a naked cdata array. This means that if the user access the vector out of the range anything unpredictable can happen including segfaults. For me this is not acceptable, the user should be protected from out of bounds accesses like what happens when you are using a matrix.

Sorry, but for the moment I don't have much time to go much beyond that first analysis. I'm not helping you very much!

ladc commented 10 years ago

Hi Francesco,

Of course, you're absolutely right, this should be exposed to the user as a matrix with proper bounds checking. In any case, that branch is far from ready... I hope to find some time to rework and finish (and test!) it soon.

As for the slow benchmark: you could be seeing the same discrepancy I saw with OpenBLAS vs gslcblas. Or alternatively it could be due to the memory issue in the snapshot of LuaJIT you're using in GSL Shell's master branch.