JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

GC error/segfault 11 in 1.9/1.9.1 during large optimization problems #50145

Open keithjlee opened 1 year ago

keithjlee commented 1 year ago

I'm running into strange segmentation faults while testing an optimization package under development, with different error messages depending on whether I am on my laptop (M1 MacBook Pro) or my desktop (Win 11).

An MWE is slightly difficult to put together, as there are a lot of custom structs passing information to each other, but in general the following workflow is used:

  1. define a structural building model Model
  2. define a set of AbstractVariables for an optimization problem
  3. generate an AbstractOptParams from the model and variables
  4. use Optimization.jl + OptimizationNLopt for optimization, with Zygote as the AD backend
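For readers unfamiliar with this kind of package, the four steps can be sketched in plain Julia as follows. This is a hypothetical, self-contained toy: `Model`, `AreaVariables`, `OptParams`, `objective`, `gradient` and `optimize` are illustrative stand-ins, not the reporter's actual types. A hand-written projected gradient descent replaces Optimization.jl + OptimizationNLopt, and an analytic gradient replaces Zygote, so the sketch only shows the shape of the pipeline, not the code path that crashes.

```julia
# Hypothetical sketch of the four-step workflow; toy stand-ins only,
# not the reporter's types or functions.
abstract type AbstractVariables end
abstract type AbstractOptParams end

# 1. A toy "structural model": member lengths and a single load magnitude.
struct Model
    lengths::Vector{Float64}
    load::Float64
end

# 2. Design variables (e.g. cross-section areas) with lower/upper bounds.
struct AreaVariables <: AbstractVariables
    lb::Vector{Float64}
    ub::Vector{Float64}
end

# 3. Parameters generated from the model and variables, passed to the objective.
struct OptParams <: AbstractOptParams
    model::Model
    vars::AreaVariables
end

# Toy objective: material volume plus a compliance-like penalty.
objective(x, p::OptParams) = sum(p.model.lengths .* x) + sum(p.model.load^2 ./ x)

# Analytic gradient of `objective` (stands in for Zygote's AD).
gradient(x, p::OptParams) = p.model.lengths .- p.model.load^2 ./ x .^ 2

# 4. Projected gradient descent (stands in for Optimization.jl + NLopt).
function optimize(p::OptParams, x0; step = 0.05, iters = 2000)
    x = copy(x0)
    for _ in 1:iters
        # Take a gradient step, then project back into the box [lb, ub].
        x .= clamp.(x .- step .* gradient(x, p), p.vars.lb, p.vars.ub)
    end
    return x
end

model = Model([1.0, 2.0], 1.0)
vars = AreaVariables([0.1, 0.1], [10.0, 10.0])
params = OptParams(model, vars)
xopt = optimize(params, [1.0, 1.0])  # analytic optimum: x_i = load / sqrt(lengths_i)
```

The toy objective is separable, so its minimizer is known in closed form, which makes it easy to check that the stand-in optimizer behaves before swapping in the real packages.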

When I run test scripts, this process works fine and the results are as expected, but rerunning the script (with or without small changes) multiple times eventually leads to a fatal crash with the following error (on Windows):

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa192e76a2 -- jl_object_id__cold at C:/workdir/src\builtins.c:417
in expression starting at none:0
jl_object_id__cold at C:/workdir/src\builtins.c:417
type_hash at C:/workdir/src\jltypes.c:1332
typekey_hash at C:/workdir/src\jltypes.c:1344
jl_precompute_memoized_dt at C:/workdir/src\jltypes.c:1409
inst_datatype_inner at C:/workdir/src\jltypes.c:1731
jl_inst_arg_tuple_type at C:/workdir/src\jltypes.c:1826
arg_type_tuple at C:/workdir/src\gf.c:2100 [inlined]
jl_lookup_generic_ at C:/workdir/src\gf.c:2884
ijl_apply_generic at C:/workdir/src\gf.c:2936
getvariables at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\trees.jl:295
unknown function (ip: 000001873052d2a8)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:816 [inlined]
invokelatest at .\essentials.jl:813 [inlined]
repl_getvariables_request at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\trees.jl:269
unknown function (ip: 00000187304a2d7a)
dispatch_msg at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\JSONRPC\src\typed.jl:67
dispatch_msg at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:100
unknown function (ip: 00000187304a04b9)
macro expansion at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:148 [inlined]
macro expansion at .\task.jl:476 [inlined]
macro expansion at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:142 [inlined]
#214 at .\task.jl:514
unknown function (ip: 000001873049c813)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
start_task at C:/workdir/src\task.c:1092
Allocations: 187096933 (Pool: 187001292; Big: 95641); GC: 195
GC error (probable corruption) :
Allocations: 187117154 (Pool: 187021451; Big: 95703); GC: 195

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa19352ce9 -- jl_static_is_function_ at C:/workdir/src\rtutils.c:688 [inlined]
jl_static_show_x_ at C:/workdir/src\rtutils.c:1102
in expression starting at none:0
jl_static_is_function_ at C:/workdir/src\rtutils.c:688 [inlined]
jl_static_show_x_ at C:/workdir/src\rtutils.c:1102
jl_static_show_next_ at C:/workdir/src\rtutils.c:1251
jl_static_show_x at C:/workdir/src\rtutils.c:1196 [inlined]
ijl_static_show at C:/workdir/src\rtutils.c:1256 [inlined]
jl_ at C:/workdir/src\rtutils.c:1340
gc_assert_datatype_fail at C:/workdir/src\gc.c:1909
gc_mark_loop at C:/workdir/src\gc.c:3020
_jl_gc_collect at C:/workdir/src\gc.c:3400
ijl_gc_collect at C:/workdir/src\gc.c:3709
gc at .\gcutils.jl:98 [inlined]
#temp_cleanup_purge#19 at .\file.jl:540
temp_cleanup_purge at .\file.jl:535 [inlined]
#930 at .\initdefs.jl:354
jfptr_YY.930_32570.clone_1 at C:\Users\keithjl\.julia\juliaup\julia-1.9.1+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
_atexit at .\initdefs.jl:387
jfptr__atexit_42479.clone_1 at C:\Users\keithjl\.julia\juliaup\julia-1.9.1+0.x64.w64.mingw32\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
ijl_atexit_hook at C:/workdir/src\init.c:280
ijl_exit at C:/workdir/src\init.c:207
jl_exception_handler at C:/workdir/src\signals-win.c:337 [inlined]
jl_exception_handler at C:/workdir/src\signals-win.c:229
__julia_personality at C:/workdir/src\win32_ucontext.c:28
_chkstk at C:\windows\SYSTEM32\ntdll.dll (unknown line)
RtlFindCharInUnicodeString at C:\windows\SYSTEM32\ntdll.dll (unknown line)
KiUserExceptionDispatcher at C:\windows\SYSTEM32\ntdll.dll (unknown line)
jl_object_id__cold at C:/workdir/src\builtins.c:417
type_hash at C:/workdir/src\jltypes.c:1332
typekey_hash at C:/workdir/src\jltypes.c:1344
jl_precompute_memoized_dt at C:/workdir/src\jltypes.c:1409
inst_datatype_inner at C:/workdir/src\jltypes.c:1731
jl_inst_arg_tuple_type at C:/workdir/src\jltypes.c:1826
arg_type_tuple at C:/workdir/src\gf.c:2100 [inlined]
jl_lookup_generic_ at C:/workdir/src\gf.c:2884
ijl_apply_generic at C:/workdir/src\gf.c:2936
getvariables at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\trees.jl:295
unknown function (ip: 000001873052d2a8)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:816 [inlined]
invokelatest at .\essentials.jl:813 [inlined]
repl_getvariables_request at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\trees.jl:269
unknown function (ip: 00000187304a2d7a)
dispatch_msg at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\JSONRPC\src\typed.jl:67
dispatch_msg at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:100
unknown function (ip: 00000187304a04b9)
macro expansion at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:148 [inlined]
macro expansion at .\task.jl:476 [inlined]
macro expansion at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:142 [inlined]
#214 at .\task.jl:514
unknown function (ip: 000001873049c813)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
start_task at C:/workdir/src\task.c:1092
Allocations: 187117154 (Pool: 187021451; Big: 95703); GC: 195

And on my MacBook:

[76488] signal (11.2): Segmentation fault: 11
in expression starting at none:0
jl_object_id__cold at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
type_hash at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
typekey_hash at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_precompute_memoized_dt at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
inst_datatype_inner at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_inst_arg_tuple_type at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
ijl_apply_generic at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
getvariables at /Users/kjl/.vscode/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/trees.jl:295
unknown function (ip: 0x2fb99452b)
ijl_apply_generic at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_f__call_latest at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
#invokelatest#2 at ./essentials.jl:816 [inlined]
invokelatest at ./essentials.jl:813 [inlined]
repl_getvariables_request at /Users/kjl/.vscode/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/trees.jl:269
unknown function (ip: 0x12fef4073)
ijl_apply_generic at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
dispatch_msg at /Users/kjl/.vscode/extensions/julialang.language-julia-1.47.2/scripts/packages/JSONRPC/src/typed.jl:67
ijl_apply_generic at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
dispatch_msg at /Users/kjl/.vscode/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/VSCodeServer.jl:100
unknown function (ip: 0x12fe8c213)
ijl_apply_generic at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
macro expansion at /Users/kjl/.vscode/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/VSCodeServer.jl:148 [inlined]
macro expansion at ./task.jl:476 [inlined]
macro expansion at /Users/kjl/.vscode/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/VSCodeServer.jl:142 [inlined]
#214 at ./task.jl:514
unknown function (ip: 0x12fa8da6f)
ijl_apply_generic at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
start_task at /Users/kjl/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
Allocations: 102802613 (Pool: 102747711; Big: 54902); GC: 159
zsh: segmentation fault  julia --threads auto

Julia was installed using juliaup on both computers. I don't have a firm enough grasp of Julia's inner workings to make sense of this, so any guidance would be greatly appreciated.

gbaraldi commented 1 year ago

It seems the error is the same in both cases. Do you have at least an open-source reproducer? It doesn't need to be minimal to start with.

keithjlee commented 1 year ago

Yes, the repository is here, and an example file can be found here.

Again, the first run of the script will be fine, but, for example, modifying the values of the meta parameters defined at the top, or rerunning the optimization multiple times, will inevitably lead to the same error.
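The "rerun it many times" pattern can be scripted so it does not depend on manual REPL use. Below is a hypothetical harness: `workload` is a placeholder for the real optimization script (here just an allocation-heavy loop), and `stress` reruns it in one session with a full `GC.gc(true)` between runs. Because the second Windows fault above was raised inside `gc_mark_loop`, forcing collections may make a failure show up closer to the run that caused it; that is an assumption, not a confirmed diagnosis.

```julia
# Hypothetical harness for "rerun the script many times in one session".
# `workload` is a placeholder for the real optimization script.
function workload(n)
    acc = 0.0
    for _ in 1:n
        v = rand(100)          # many short-lived allocations, like an optimizer loop
        acc += sum(abs2, v)
    end
    return acc
end

function stress(runs; n = 10_000)
    for r in 1:runs
        result = workload(n)
        GC.gc(true)            # force a full collection between runs
        @info "run $r finished" result
    end
    return runs
end

stress(5)
```

Every trace in this thread passes through the VS Code extension's `getvariables` request, so running such a harness with plain `julia` (outside the VS Code REPL) might also help show whether that request is needed to trigger the crash.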

Appreciate the help.

keithjlee commented 1 year ago

I've also tested a different optimization package (Nonconvex.jl) and got similar fatal errors after multiple optimization runs, though with a different error message:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffb14fd76a2 -- jl_object_id__cold at C:/workdir/src\builtins.c:417
in expression starting at none:0
jl_object_id__cold at C:/workdir/src\builtins.c:417
type_hash at C:/workdir/src\jltypes.c:1332
typekey_hash at C:/workdir/src\jltypes.c:1344
jl_precompute_memoized_dt at C:/workdir/src\jltypes.c:1409
inst_datatype_inner at C:/workdir/src\jltypes.c:1731
jl_inst_arg_tuple_type at C:/workdir/src\jltypes.c:1826
arg_type_tuple at C:/workdir/src\gf.c:2100 [inlined]
jl_lookup_generic_ at C:/workdir/src\gf.c:2884
ijl_apply_generic at C:/workdir/src\gf.c:2936
getvariables at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\trees.jl:295
unknown function (ip: 0000026bdc7246c8)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:816 [inlined]
invokelatest at .\essentials.jl:813 [inlined]
repl_getvariables_request at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\trees.jl:269
unknown function (ip: 0000026bdc6d685a)
dispatch_msg at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\JSONRPC\src\typed.jl:67
dispatch_msg at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:100
unknown function (ip: 0000026bdc6d3f99)
macro expansion at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:148 [inlined]
macro expansion at .\task.jl:476 [inlined]
macro expansion at c:\Users\keithjl\.vscode\extensions\julialang.language-julia-1.47.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:142 [inlined]
#214 at .\task.jl:514
unknown function (ip: 0000026bdc6d02f3)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
start_task at C:/workdir/src\task.c:1092
Allocations: 160804215 (Pool: 160725627; Big: 78588); GC: 228

PaioPaio commented 9 months ago

I get a very similar error message.

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7fff73ce76ef -- jl_object_id__cold at C:/workdir/src\builtins.c:417
in expression starting at none:0
jl_object_id__cold at C:/workdir/src\builtins.c:417
type_hash at C:/workdir/src\jltypes.c:1332
typekey_hash at C:/workdir/src\jltypes.c:1344
jl_precompute_memoized_dt at C:/workdir/src\jltypes.c:1409
inst_datatype_inner at C:/workdir/src\jltypes.c:1731
jl_inst_arg_tuple_type at C:/workdir/src\jltypes.c:1826
arg_type_tuple at C:/workdir/src\gf.c:2100 [inlined]
jl_lookup_generic_ at C:/workdir/src\gf.c:2884
ijl_apply_generic at C:/workdir/src\gf.c:2936
getvariables at c:\Users\lpaiola\.vscode\extensions\julialang.language-julia-1.65.2\scripts\packages\VSCodeServer\src\trees.jl:295
unknown function (ip: 00000206ba471e28)
jl_apply at C:/workdir/src\julia.h:1880 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:819 [inlined]
invokelatest at .\essentials.jl:816 [inlined]
repl_getvariables_request at c:\Users\lpaiola\.vscode\extensions\julialang.language-julia-1.65.2\scripts\packages\VSCodeServer\src\trees.jl:269
unknown function (ip: 00000206ba3f646a)
dispatch_msg at c:\Users\lpaiola\.vscode\extensions\julialang.language-julia-1.65.2\scripts\packages\JSONRPC\src\typed.jl:67
dispatch_msg at c:\Users\lpaiola\.vscode\extensions\julialang.language-julia-1.65.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:100
unknown function (ip: 00000206ba3f3ba9)
macro expansion at c:\Users\lpaiola\.vscode\extensions\julialang.language-julia-1.65.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:151 [inlined]
macro expansion at .\task.jl:476 [inlined]
macro expansion at c:\Users\lpaiola\.vscode\extensions\julialang.language-julia-1.65.2\scripts\packages\VSCodeServer\src\VSCodeServer.jl:145 [inlined]
#224 at .\task.jl:514
unknown function (ip: 00000206ba3efe13)
jl_apply at C:/workdir/src\julia.h:1880 [inlined]
start_task at C:/workdir/src\task.c:1092
Allocations: 1535351292 (Pool: 1535310202; Big: 41090); GC: 2641

with the following versioninfo():

Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700H
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
  Threads: 1 on 20 virtual cores

My use case is similar as well: I have code, very far from an MWE, that repeatedly solves a nonlinear equation with SimpleNonlinearSolve.jl, and whenever I run it in the REPL too many times the process crashes. This error appears both on Windows (the error I posted) and on Manjaro stable. @keithjlee did you manage to find a workaround?
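The "repeatedly solve a nonlinear equation" pattern can be sketched as below. This is a hypothetical stand-in: a hand-written scalar Newton iteration replaces SimpleNonlinearSolve.jl (none of its API is used here), and the equation u^3 + u - p = 0 is illustrative only. It shows the kind of loop that, run many times in one REPL session, produced the crash described above.

```julia
# Plain Newton iteration, a stand-in for SimpleNonlinearSolve.jl (not its API).
function newton(f, df, u0; tol = 1e-12, maxiters = 50)
    u = u0
    for _ in 1:maxiters
        du = f(u) / df(u)
        u -= du
        abs(du) < tol && return u   # stop once the Newton step is negligible
    end
    error("Newton did not converge from u0 = $u0")
end

# Repeatedly solve u^3 + u - p = 0 for many parameter values p.
# The derivative 3u^2 + 1 is always >= 1, so each Newton step is well defined.
solve_all(ps) = [newton(u -> u^3 + u - p, u -> 3u^2 + 1, 0.0) for p in ps]

roots = solve_all(range(0.0, 10.0; length = 1_000))
```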

keithjlee commented 8 months ago

Sorry for the late response.

I've yet to find a solution for this. The same problem occurs in 1.10, but (maybe?) less often.

Like you, I find an MWE very difficult to put together, and because of the random nature of the error, it's hard even to predict whether it will occur in a given script.