SciML / Sundials.jl

Julia interface to Sundials, including a nonlinear solver (KINSOL), ODE's (CVODE and ARKODE), and DAE's (IDA) in a SciML scientific machine learning enabled manner
https://diffeq.sciml.ai
BSD 2-Clause "Simplified" License
208 stars 78 forks source link

Kinsol is unstable for multiple variable problems #220

Open linwaytin opened 4 years ago

linwaytin commented 4 years ago

Hi there,

I'm solving the heat equation using finite difference method, in which an nonlinear solver is needed.

I found kinsol will be broken after several steps. The error is as follows.

Time step = 1
OK
Time step = 2
OK
Time step = 3
OK
Time step = 4
OK
Time step = 5
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /usr/bin/../lib/libc.so.6 (unknown line)
abort at /usr/bin/../lib/libc.so.6 (unknown line)
__libc_message at /usr/bin/../lib/libc.so.6 (unknown line)
malloc_printerr at /usr/bin/../lib/libc.so.6 (unknown line)
_int_free at /usr/bin/../lib/libc.so.6 (unknown line)
N_VDestroy_Serial at /home/linwaytin/.julia/packages/Sundials/CRi5j/deps/usr/lib/libsundials_nvecserial.so (unknown line)
KINFree at /home/linwaytin/.julia/packages/Sundials/CRi5j/deps/usr/lib/libsundials_kinsol.so (unknown line)
KINFree at /home/linwaytin/.julia/packages/Sundials/CRi5j/src/wrapped_api/kinsol.jl:215 [inlined]
release_handle at /home/linwaytin/.julia/packages/Sundials/CRi5j/src/handle.jl:72
unknown function (ip: 0x7fbbfe6f789d)
unknown function (ip: 0x7fbbfe6f848a)
unknown function (ip: 0x7fbbfe6f8672)
jl_gc_enable_finalizers at /usr/bin/../lib/libjulia.so.1 (unknown line)
jl_typeinf_end at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7fbbf2851e66)
unknown function (ip: 0x7fbbf285239e)
unknown function (ip: 0x7fbbf28524c0)
unknown function (ip: 0x7fbbfe6a9b5c)
unknown function (ip: 0x7fbbfe6aa492)
jl_apply_generic at /usr/bin/../lib/libjulia.so.1 (unknown line)
display_error at ./client.jl:112
jl_f__apply at /usr/bin/../lib/libjulia.so.1 (unknown line)
jl_f__apply_latest at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7fbbf28a8fa2)
unknown function (ip: 0x7fbbf28a99b3)
unknown function (ip: 0x55746d26e4fa)
unknown function (ip: 0x55746d26e0a7)
__libc_start_main at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0x55746d26e15d)
Allocations: 22174347 (Pool: 22170137; Big: 4210); GC: 47
Aborted (core dumped)

If I make the grid finer (giving larger nonlinear systems), kinsol will be broken earlier. Does anybody have idea about this?

ChrisRackauckas commented 4 years ago

I'm solving the heat equation using finite difference method, in which an nonlinear solver is needed.

Why not use CVODE?

Does anybody have idea about this?

This looks like a memory leak. I'd look into how it's freeing. TBH the Kinsol code in here has gotten the least love because it's all been about the ODE/DAE solvers. (And I haven't found a case where Kinsol does better than NLsolve.jl)

linwaytin commented 4 years ago

Why not use CVODE?

Just curious about the performance. I tried also NLsolve.jl and Scipy's root through PyCall.jl They both solve the problem but NLsolve is much slower than Scipy's root, in particular for large systems. Sundials seems much faster but broken after a few steps.

ChrisRackauckas commented 4 years ago

I see. Yes, to investigate a performance issue is a reason to get this fixed up. I'll take a look at it later today after teaching.

in particular for large systems.

For large systems you'll likely want to exploit sparsity, which NLsolve doesn't do (SciPy doesn't either). Kinsol doesn't have all of it either... see

http://www.stochasticlifestyle.com/a-collection-of-jacobian-sparsity-acceleration-tools-for-julia/

https://github.com/JuliaNLSolvers/NLsolve.jl/issues/219

@pkofod were benchmarks ever setup for NLsolve? This is a little curious because a large nonlinear solve should essentially just cost a multiple of the number of matrix factorizations, so I wonder if there's an algorithmic improvement that needs to be done for it.

linwaytin commented 4 years ago

Thanks for taking care of this issue. Another interesting thing is that the performance of Scipy (root with Krylov method) and NLsolve are quite different. Probably there is something hidden in the implementation.

pkofod commented 4 years ago

Another interesting thing is that the performance of Scipy (root with Krylov method) and NLsolve are quite different. Probably there is something hidden in the implementation.

I don't see why that would be surprising. NLsolve does not do Krylov at all (though I'm re-writing both Optim an NLsolve's code, and the new version does have a Krylov solver that you can try if you want).

pkofod commented 4 years ago

I tried also NLsolve.jl and Scipy's root through PyCall.jl

Is that the Krylov solver then?

ChrisRackauckas commented 4 years ago

KINSOL isn't a Krylov method though, so if it is faster that's puzzling and worth looking into. I'd assume factorization costs dominate the allocations so much that it wouldn't matter at that point, so there's only a difference if you factorize more.

linwaytin commented 4 years ago

Is that the Krylov solver then?

Yes, I only use the Krylov solver.

KINSOL isn't a Krylov method though

According to its website, KINSOL contains a Newton-Krylov solver. Is this not what's called Krylov method?

ChrisRackauckas commented 4 years ago

I see, that part is wrapped but isn't used in the higher level interface. You'd have to call the wrapped API for it right now.

pkofod commented 4 years ago

Yes, I only use the Krylov solver.

Okay, but then NLsolve is obviously crippled here, because it does not do a krylov solver. How hard would it be to produce a function that calculates the residual vector?