abelsiqueira closed this issue 9 years ago
I can reproduce a segfault with Julia 0.3 and 0.4 but not with Fortran: https://gist.github.com/74b1fbe47860dcc97084
Hello, sorry for the long silence, but I think I finally made some progress. First, I'll clarify that I still don't know why this problem happens. I tried to recreate it with another C code, but I couldn't, and I also don't know how to debug this with gdb.
What I do know is that the problem arises when creating some arrays inside the specialized function. However, this does not happen every time. `io_err = Cint[0]` may cause an error, but `Cint[goth]` doesn't appear to. This gist shows an example that represents the failure.
The `io_err` problem can be fixed by using a global variable `io_err`. This also removes a bunch of lines of code, so I like it. But for `nvar`, this is a bit more complicated. The "easiest" solution appears to be to use `ccall` and pass the reference, which also reduces the complexity and the dependence, as you mentioned in another thread some time ago. However, there is still the problem of the returned variables, which must be created in order to be returned.
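The pass-the-reference idea can be sketched in pure Julia with a mock C-callable routine standing in for a CUTEst subroutine. Everything here is hypothetical (the names, the signature, and the use of current Julia syntax rather than 0.3/0.4); the point is only the output-argument pattern:

```julia
# Mock "library routine": writes its results through pointers, the way the
# CUTEst C/Fortran subroutines do. Purely illustrative, not the real API.
function mock_udimen(nvar::Ptr{Cint}, status::Ptr{Cint})
    unsafe_store!(nvar, Cint(2))    # pretend the problem has 2 variables
    unsafe_store!(status, Cint(0))  # 0 = success
    return nothing
end

const mock_ptr = @cfunction(mock_udimen, Cvoid, (Ptr{Cint}, Ptr{Cint}))

# The wrapper passes references instead of allocating fresh arrays inside
# a generated function; the library writes through the pointers.
nvar = Ref{Cint}(0)
status = Ref{Cint}(0)
ccall(mock_ptr, Cvoid, (Ref{Cint}, Ref{Cint}), nvar, status)
nvar[]  # the value the "library" stored
```

A `Ref{Cint}` is allocated by the caller and survives the call, so nothing has to be created inside the generated body.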
In the `uhprod` case, `result` is created to store the Hessian-vector product, then it is used in the function, then returned to the user.
A solution, maybe inelegant, would be to create a workspace type inside `CUTEstModel` to store every vector that gets used. We can't pass a copy (I tried that), so we can't use generic workspaces.
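A minimal sketch of that workspace idea, with hypothetical type and field names and a trivial stand-in for the Hessian-vector product:

```julia
# Hypothetical workspace stored in the model, so hot wrappers reuse the
# same buffer instead of allocating `result` on every call.
mutable struct Workspace
    result::Vector{Float64}
end

struct Model
    nvar::Int
    work::Workspace
end
Model(nvar::Int) = Model(nvar, Workspace(zeros(nvar)))

# The wrapper fills the preallocated buffer in place and returns it.
function hprod!(m::Model, v::Vector{Float64})
    r = m.work.result
    for i in eachindex(v)
        r[i] = 2.0 * v[i]   # stand-in for the real Hessian-vector product
    end
    return r
end

m = Model(2)
r1 = hprod!(m, [1.0, 2.0])
r2 = hprod!(m, [3.0, 4.0])  # same buffer, new contents
```

Every call returns the same array object, so nothing is allocated inside the wrapper after the model is created.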
In this gist I created an example of what works. `foo` creates `result`, so it eventually fails; `bar` never fails.
Sorry for the long post; I really tried to verify some things before commenting, so many things had accumulated.
Thanks. I'll think about it. As a side note, I'm also getting the segfault with the Julia interface (which doesn't call the specialized interface) using
```julia
using CUTEst
nlp = CUTEstModel("ROSENBR")
for i = 1:100000
  hprod(nlp, [0.0;0.0], [0.0;0.0])
end
```
The culprit is `@eval`. I don't know why yet, but besides slowing down the execution A LOT, it causes the random crashes. Maybe the pointers, maybe the memory, but removing it in these two examples made the code work. Both codes break if the call to `ufn` is made inside `@eval`.
Notice that the second example does not use our CUTEst.jl, only `ccall`s. The problem is that since we use a variable `libname`, we can't remove `@eval`. A possible solution could be to use a constant `libname` explicitly in the `ccall`s.
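A sketch of the constant-libname idea, with libc's `strlen` standing in for a CUTEst routine (this assumes a glibc Linux system; the real constant would name the CUTEst library):

```julia
# Because the library name is a compile-time constant, the ccall can live
# in an ordinary function: no @eval is needed to splice the name in.
const libname = "libc.so.6"   # hypothetical stand-in for the CUTEst library

my_strlen(s::String) = ccall((:strlen, libname), Csize_t, (Cstring,), s)

my_strlen("hello")
```

The limitation is exactly the one mentioned above: `ccall`'s `(:func, lib)` form requires `lib` to be a constant, which is why a runtime-variable `libname` forces the `@eval` workaround.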
I also found out that indirect calls may work. I'm not sure what they are doing yet, but it's worth a shot. Here's the example.
Just passing to say that the indirect calls are really coming along. I should have a pull request with stress tests in a couple of days.
That's awesome. Thanks! Is there a bug in `@eval`, then?
I've used indirect calls before. I don't think you need to redefine `dlsym`, though. It's already defined in `Libdl`, isn't it? For example, I use it in AMD.jl: https://github.com/dpo/AMD.jl/blob/master/src/amd_functions.jl
It appears to be. I'll try to make an MWE for a Julia issue later.
I'll take a look at it after class.
Abel Siqueira, sent from cell, Nov 5, 2015.
About `dlsym`: it looks like I need the definition of `@dlsym` because the first argument of the `ccall` is not constant. In addition, if it works like I think it does, the load happens only once, when the function is called for the first time, so it is very effective, especially for these stress cases in which the function is called millions of times.
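The caching behaviour described above can be sketched like this, again with libc's `strlen` standing in for a CUTEst routine on a glibc Linux system:

```julia
using Libdl

# Imagine the library path arrived in a runtime variable, as in CUTEst.jl:
libpath = "libc.so.6"

# dlopen/dlsym resolve the symbol exactly once...
const handle = Libdl.dlopen(libpath)
const strlen_ptr = Libdl.dlsym(handle, :strlen)

# ...and every subsequent call goes through the cached function pointer,
# so no @eval is needed even though the library name was not a constant.
c_strlen(s::String) = ccall(strlen_ptr, Csize_t, (Cstring,), s)

c_strlen("indirect")
```

The `@dlsym` macro referred to above would do the same resolve-once caching lazily, at the first call, instead of eagerly at load time.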
PR created. It updates the three interfaces and adds a stress test running each function 100000 times. Removing the `@eval` makes this fast enough to run even on Travis.
Thank you! That seems excellent. I wonder if `@eval` is also responsible for the errors we see when we open two problems one after the other.
@dpo, I had a weird segmentation fault (random occurrence); could you try to reproduce it?