cvxgrp / scs

Splitting Conic Solver
MIT License

Segmentation fault with SCS.jl and JuMP.jl #254

Open WellWellww opened 1 year ago

WellWellww commented 1 year ago

Hi, I was using JuMP.jl and SCS.jl to formulate and solve an optimization problem with 234,932 variables and 251,088 constraints, and I encountered the following error:

[113945] signal (11.1): Segmentation fault
ldl_prepare at /workspace/srcdir/scs/linsys/cpu/direct/private.c:34 [inlined]
scs_init_lin_sys_work at /workspace/srcdir/scs/linsys/cpu/direct/private.c:237
init_work at /workspace/srcdir/scs/src/scs.c:890 [inlined]
scs_init at /workspace/srcdir/scs/src/scs.c:1227
scs_init at /public1/home/user/.julia/packages/SCS/owpZW/src/linear_solvers/direct.jl:25 [inlined]
_unsafe_scs_solve at /public1/home/user/.julia/packages/SCS/owpZW/src/c_wrapper.jl:390
#scs_solve#13 at /public1/home/user/.julia/packages/SCS/owpZW/src/c_wrapper.jl:349
scs_solve at /public1/home/user/.julia/packages/SCS/owpZW/src/c_wrapper.jl:278
unknown function (ip: 0x2ab7e4d97b4f)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
optimize! at /public1/home/user/.julia/packages/SCS/owpZW/src/MOI_wrapper/MOI_wrapper.jl:366
unknown function (ip: 0x2ab7e4d9355b)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
optimize! at /public1/home/user/.julia/packages/SCS/owpZW/src/MOI_wrapper/MOI_wrapper.jl:440
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x2ab7e4d7a062)
unknown function (ip: 0x2ab7e4d62309)
unknown function (ip: 0x2ab7e4d622aa)
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/Bridges/bridge_optimizer.jl:376 [inlined]
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/MathOptInterface.jl:85 [inlined]
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x2ab7e4d62272)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
#optimize!#113 at /public1/home/user/.julia/packages/JuMP/ptoff/src/optimizer_interface.jl:440
optimize! at /public1/home/user/.julia/packages/JuMP/ptoff/src/optimizer_interface.jl:410
jfptr_optimizeNOT._2915 at /public1/home/user/.julia/compiled/v1.9/JuMP/DmXqY_u22Yc.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_call at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1864
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1924
include at ./client.jl:478
unknown function (ip: 0x2ab7e4cd8272)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_call at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1864
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1924
include at ./Base.jl:457
jfptr_include_43521.clone_1 at /public1/home/user/julia/julia-1.9.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
exec_options at ./client.jl:307
_start at ./client.jl:522
jfptr__start_37386.clone_1 at /public1/home/user/julia/julia-1.9.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
true_main at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 122536122659 (Pool: 122441856243; Big: 94266416); GC: 93

I want to note that I had enough memory available: my system has 2 TB of memory in total and 64 cores, and the code only uses around 1 TB. Interestingly, when I worked on smaller optimization problems built the same way (say, 79,054 variables and 84,145 constraints), the program returned correct results. I am therefore curious whether anyone has insight into why this error occurs. For instance, is it because the problem is too large?

Other information:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 32 virtual cores
Environment:
  LD_LIBRARY_PATH = /public1/soft/intel/2015/impi/5.0.3.049/intel64/lib:/public1/soft/intel/2015/composer_xe_2015.6.233/debugger/libipt/intel64/lib:/public1/soft/intel/2015/composer_xe_2015.6.233/tbb/lib/intel64/gcc4.4:/public1/soft/intel/2015/composer_xe_2015.6.233/mkl/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/tools/intel64/perfsys:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/../compiler/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/mpirt/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/compiler/lib/intel64
  LD_LIBRARY_PATH_modshare = /public1/soft/intel/2015/composer_xe_2015.6.233/tbb/lib/intel64/gcc4.4:1:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/tools/intel64/perfsys:1:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/../compiler/lib/intel64:1:/public1/soft/intel/2015/composer_xe_2015.6.233/compiler/lib/intel64:1:/public1/soft/intel/2015/composer_xe_2015.6.233/debugger/libipt/intel64/lib:1:/public1/soft/intel/2015/composer_xe_2015.6.233/mkl/lib/intel64:1:/public1/soft/intel/2015/composer_xe_2015.6.233/mpirt/lib/intel64:1:/public1/soft/intel/2015/impi/5.0.3.049/intel64/lib:1:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/lib/intel64:1

kalmarek commented 1 year ago

You may try running import MKL_jll before using SCS, and then passing linear_solver=SCS.MKLDirectSolver.

In my case (a problem with n = 570,684 variables and m = 1,030,531 constraints, as reported here), SCS.DirectSolver ran out of memory with a similar segfault. The MKL solver used much less memory and solved the problem much faster. Give it a try!

bodono commented 1 year ago

Depending on the sparsity, the problem might be too large. It's a little unusual to run out of memory (OOM), but it can happen. I agree with @kalmarek: you should try the MKL version of SCS, which is typically faster and more memory-efficient. Otherwise, you could post on the SCS.jl repo for help.
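To make the sparsity point concrete, here is a rough size estimate. It assumes that SCS 3.x's direct solver (the scs_init_lin_sys_work and ldl_prepare frames in the trace) builds and factors a quasi-definite KKT matrix K with a sparse LDLᵀ factorization, where A is the m × n constraint matrix, P is the quadratic objective term (zero for a purely conic problem), and R_x, R_y are SCS's diagonal regularization terms. The dense figure is only a worst case for complete fill-in, counts 8-byte doubles, and ignores index arrays; the real memory use depends on A's sparsity pattern and the fill-reducing ordering.

```latex
K = \begin{bmatrix} R_x + P & A^\top \\ A & -R_y \end{bmatrix}
  \in \mathbb{R}^{(n+m)\times(n+m)}

\begin{aligned}
N_{\text{large}} &= 234{,}932 + 251{,}088 = 486{,}020 \\
N_{\text{small}} &= 79{,}054 + 84{,}145 = 163{,}199 \\
\left(N_{\text{large}} / N_{\text{small}}\right)^2 &\approx 2.98^2 \approx 8.9 \\
\tfrac{N_{\text{large}}(N_{\text{large}}+1)}{2} &= 118{,}107{,}963{,}210\ \text{entries} \\
&\approx 1.18\times10^{11} \times 8\ \text{bytes} \approx 945\ \text{GB}
\end{aligned}
```

In practice the factor is usually far below the dense bound, but fill-in can grow much faster than N itself, so a model roughly 3× larger than one that works is not guaranteed to fit. The trace alone does not establish that this is what happened here.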