Segmentation fault

abelsiqueira commented 9 years ago

@dpo, I hit a weird segmentation fault that occurs at random. Could you try to reproduce it?

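```julia
using CUTEst
nlp = CUTEstModel("ROSENBR")
# Repeated Hessian-vector products through the specialized interface;
# the crash happens at a random iteration.
for i = 1:100000
    uhprod(nlp, false, [0.0;0.0], [0.0;0.0])
end
```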

dpo commented 9 years ago

I can reproduce a segfault with Julia 0.3 and 0.4 but not with Fortran: https://gist.github.com/74b1fbe47860dcc97084

abelsiqueira commented 9 years ago

Hello, sorry for the long silence, but I think I've finally made some progress. First, to be clear: I still don't know why this problem happens. I tried to reproduce it with separate C code but couldn't, and I don't know how to debug this with gdb.

What I do know is that the problem arises when some arrays are created inside the specialized function, though not every time: io_err = Cint[0] may cause an error, but Cint[goth] doesn't appear to. This gist shows an example that reproduces the failure.

The io_err problem can be fixed by making io_err a global variable. This also removes a bunch of lines of code, so I like it. For nvar, though, it is a bit more complicated. The "easiest" solution appears to be to use ccall and pass the reference, which also reduces complexity and dependencies, as you mentioned in another thread some time ago.

However, there is still the problem of the returned variables, which must be created in order to be returned. In the uhprod case, result is created to store the Hessian-vector product, used in the call, and then returned to the user. A solution, maybe inelegant, would be to create a workspace type inside CUTEstModel that stores every vector used. We can't pass a copy (I tried that), so we can't use generic workspaces. In this gist I created an example of what works: foo creates result, so it eventually fails; bar never fails.
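A minimal sketch of the workspace idea, in the spirit of bar: the status flag and the output vector are allocated once and reused, and ccall receives pointers to them, so the wrapper creates no arrays per call. The names fake_hprod, HprodWorkspace and hprod! are hypothetical. To keep the example runnable without CUTEst, the "library routine" is a Julia function exposed through @cfunction that computes the Hessian-vector product of the 2-D Rosenbrock function. This illustrates the proposal only; it is not CUTEst.jl's code.

```julia
# Hypothetical stand-in for a Fortran-style routine: every argument is passed
# by reference. It writes H(x)*v for the 2-D Rosenbrock function
# f(x) = (1 - x1)^2 + 100*(x2 - x1^2)^2 into `result` and sets *status = 0.
function fake_hprod(status::Ptr{Cint}, x::Ptr{Cdouble}, v::Ptr{Cdouble},
                    result::Ptr{Cdouble})::Cvoid
    x1, x2 = unsafe_load(x, 1), unsafe_load(x, 2)
    v1, v2 = unsafe_load(v, 1), unsafe_load(v, 2)
    h11 = 2 - 400 * x2 + 1200 * x1^2   # d2f/dx1^2
    h12 = -400 * x1                    # d2f/dx1dx2
    h22 = 200.0                        # d2f/dx2^2
    unsafe_store!(result, h11 * v1 + h12 * v2, 1)
    unsafe_store!(result, h12 * v1 + h22 * v2, 2)
    unsafe_store!(status, Cint(0))
    return nothing
end

# Hypothetical workspace that would live inside the model: buffers are
# allocated once, when the workspace is built, and reused on every call.
struct HprodWorkspace
    status::Vector{Cint}
    result::Vector{Cdouble}
end
HprodWorkspace(n::Integer) = HprodWorkspace(zeros(Cint, 1), zeros(Cdouble, n))

# Wrapper that creates no new arrays: ccall passes pointers to the workspace
# buffers and keeps them rooted for the duration of the call.
function hprod!(ws::HprodWorkspace, x::Vector{Cdouble}, v::Vector{Cdouble})
    fptr = @cfunction(fake_hprod, Cvoid,
                      (Ptr{Cint}, Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}))
    ccall(fptr, Cvoid, (Ptr{Cint}, Ptr{Cdouble}, Ptr{Cdouble}, Ptr{Cdouble}),
          ws.status, x, v, ws.result)
    ws.status[1] == 0 || error("hprod failed with status $(ws.status[1])")
    return ws.result   # shared buffer: overwritten by the next call
end
```

Because hprod! returns the shared buffer, a caller that needs to keep a product across calls should copy it first.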

Sorry for the long post; I really tried to verify some things before commenting, so a lot accumulated.

dpo commented 9 years ago

Thanks. I'll think about it. As a side note, I'm also getting the segfault with the Julia interface (which doesn't call the specialized interface) using

```julia
using CUTEst
nlp = CUTEstModel("ROSENBR")
# Same stress loop through the Julia interface (hprod) instead of uhprod.
for i = 1:100000
    hprod(nlp, [0.0;0.0], [0.0;0.0])
end
```

abelsiqueira commented 9 years ago

The culprit is @eval.

I don't know why yet, but besides slowing execution down A LOT, it causes the random crashes. Maybe it's the pointers, maybe the memory, but removing it from these two examples made the code work. Both examples break if the call to ufn is made inside @eval. Notice that the second example doesn't use our CUTEst.jl at all, only ccalls.

The problem is that, since we use a variable libname, we can't remove @eval. A possible solution would be to use a constant libname explicitly in the ccalls.

abelsiqueira commented 9 years ago

I also found that indirect calls may work. I'm not sure exactly what they do yet, but they're worth a shot. Here's the example.
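As a separate illustration (not the example referred to above), a minimal sketch of an indirect call: ccall is given a function pointer that is an ordinary runtime value, so a non-constant library name never has to be spliced in with @eval. The stand-in rosenbrock function and the name call_f are hypothetical; with the real library the pointer would come from Libdl.dlsym, while here @cfunction supplies it so the code runs on its own.

```julia
# Hypothetical stand-in for a routine exported by the CUTEst shared library:
# the 2-D Rosenbrock objective f(x) = (1 - x1)^2 + 100*(x2 - x1^2)^2.
rosenbrock(x1::Cdouble, x2::Cdouble)::Cdouble = (1 - x1)^2 + 100 * (x2 - x1^2)^2

# Indirect call: the first argument of ccall is a function pointer passed in
# at runtime, so no constant library name (and no @eval) is required.
call_f(fptr::Ptr{Cvoid}, x1, x2) = ccall(fptr, Cdouble, (Cdouble, Cdouble), x1, x2)

# With a real library the pointer would come from something like
# Libdl.dlsym(Libdl.dlopen(libname), sym); @cfunction keeps this self-contained.
fptr = @cfunction(rosenbrock, Cdouble, (Cdouble, Cdouble))
```

For example, call_f(fptr, 0.0, 0.0) evaluates the objective at the origin and returns 1.0.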

abelsiqueira commented 9 years ago

Just stopping by to say that the indirect calls are really coming along. I should have a pull request with stress tests in a couple of days.

dpo commented 9 years ago

That's awesome. Thanks! Is there a bug in @eval then?

I've used indirect calls before. I don't think you need to redefine dlsym, though. It's already defined in Libdl, isn't it? For example, I use it in AMD.jl (https://github.com/dpo/AMD.jl/blob/master/src/amd_functions.jl).

abelsiqueira commented 9 years ago

It appears to be. I'll try to make an MWE for a Julia issue later.

I'll take a look at it after class.

Abel Siqueira, sent from cell in reply to Dominique's comment of Nov 5, 2015, 7:00 PM.

abelsiqueira commented 9 years ago

About the dlsym: it looks like I need the definition of @dlsym because the first argument of the ccall is not constant. In addition, if it works the way I think it does, the load happens only once, when the function is called for the first time, so it is very effective, especially for these stress cases in which the function is called millions of times.
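A hand-rolled sketch of the load-once behavior described above (not the @dlsym macro's actual implementation): the pointer is looked up on the first call, cached in a const Ref, and reused on every later call. UFN_PTR, LOOKUPS, lookup_ufn, ufn_ptr and ufn are hypothetical names, and the lookup returns a @cfunction pointer to a Rosenbrock stand-in instead of calling Libdl.dlsym on the CUTEst library, so the example runs without CUTEst.

```julia
# Hypothetical stand-in for the library routine (2-D Rosenbrock objective).
rosenbrock(x1::Cdouble, x2::Cdouble)::Cdouble = (1 - x1)^2 + 100 * (x2 - x1^2)^2

# Cached pointer; C_NULL means the symbol has not been looked up yet.
const UFN_PTR = Ref{Ptr{Cvoid}}(C_NULL)
const LOOKUPS = Ref(0)   # counts lookups, to show the cache is used

# Hypothetical lookup. Against a real library this would be something like
# Libdl.dlsym(Libdl.dlopen(libname), sym); @cfunction keeps it self-contained.
function lookup_ufn()
    LOOKUPS[] += 1
    return @cfunction(rosenbrock, Cdouble, (Cdouble, Cdouble))
end

# Look the symbol up on the first call only, then reuse the cached pointer.
function ufn_ptr()
    if UFN_PTR[] == C_NULL
        UFN_PTR[] = lookup_ufn()
    end
    return UFN_PTR[]
end

function ufn(x1, x2)
    p = ufn_ptr()
    return ccall(p, Cdouble, (Cdouble, Cdouble), x1, x2)
end

# Stress loop in the spirit of the 100000-call tests discussed in this thread.
for _ in 1:100_000
    ufn(0.0, 0.0)
end
```

Keeping the cache in a const Ref lets it be updated while the global binding stays constant, so ufn_ptr remains type-stable. The check-then-set is not synchronized, so this sketch is only safe for single-threaded use.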

abelsiqueira commented 9 years ago

PR created. It updates the three interfaces and adds a stress test that runs each function 100000 times. Removing @eval makes this fast enough to run even on Travis.

dpo commented 9 years ago

Thank you! That seems excellent. I wonder if @eval is also responsible for the errors we see when we open two problems one after the other.

JuliaSmoothOptimizers / CUTEst.jl

Segmentation fault #45