Closed by NLaws.
On the SUSE servers the traceback is different, possibly because we are using a Julia system image:
python3: /buildworker/worker/package_linux64/build/src/codegen.cpp:3322: jl_cgval_t emit_invoke(jl_codectx_t&, jl_expr_t*, jl_value_t*): Assertion `(((jl_value_t*)(((jl_taggedvalue_t*)((char*)(mi) - sizeof(jl_taggedvalue_t)))->header & ~(uintptr_t)15))==(jl_value_t*)(jl_method_instance_type))' failed.
signal (6): Aborted
in expression starting at none:0
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__assert_fail_base at /lib64/libc.so.6 (unknown line)
__assert_fail at /lib64/libc.so.6 (unknown line)
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3322
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:4139
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3851
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:4044 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6671
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1257
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1890
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2154 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
map at ./abstractarray.jl:2098
container at /home/deploy/.julia/packages/JuMP/YXK4e/src/Containers/container.jl:85
container at /home/deploy/.julia/packages/JuMP/YXK4e/src/Containers/container.jl:65
unknown function (ip: 0x7f7aa6c43a68)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
macro expansion at /home/deploy/.julia/packages/JuMP/YXK4e/src/macros.jl:79 [inlined]
add_tech_size_constraints at /srv/data/apps/reopt_api/main/releases/20210114225712/reo/src/reopt.jl:478
reopt_run at /srv/data/apps/reopt_api/main/releases/20210114225712/reo/src/reopt.jl:858
reopt at /srv/data/apps/reopt_api/main/releases/20210114225712/reo/src/reopt.jl:812
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:643
jl_f__apply_latest at /buildworker/worker/package_linux64/build/src/builtins.c:693
#invokelatest#1 at ./essentials.jl:712
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:643
invokelatest at ./essentials.jl:711
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:643
_pyjlwrap_call at /home/deploy/.julia/packages/PyCall/zqDXB/src/callback.jl:28
unknown function (ip: 0x7f7aa6bca302)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
pyjlwrap_call at /home/deploy/.julia/packages/PyCall/zqDXB/src/callback.jl:49
jfptr_pyjlwrap_call_31754 at /srv/data/apps/reopt_api/main/releases/20210114225712/julia_envs/Xpress/JuliaXpressSysimage.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jlcapi_pyjlwrap_call_31669 at /srv/data/apps/reopt_api/main/releases/20210114225712/julia_envs/Xpress/JuliaXpressSysimage.so (unknown line)
_PyObject_FastCallDict at /lib64/libpython3.6m.so.1.0 (unknown line)
unknown function (ip: 0x7f7ade0c2d5b)
_PyEval_EvalFrameDefault at /lib64/libpython3.6m.so.1.0 (unknown line)
unknown function (ip: 0x7f7ade0c1d76)
_PyFunction_FastCallDict at /lib64/libpython3.6m.so.1.0 (unknown line)
_PyObject_FastCallDict at /lib64/libpython3.6m.so.1.0 (unknown line)
_PyObject_Call_Prepend at /lib64/libpython3.6m.so.1.0 (unknown line)
PyObject_Call at /lib64/libpython3.6m.so.1.0 (unknown line)
_PyEval_EvalFrameDefault at /lib64/libpython3.6m.so.1.0 (unknown line)
unknown function (ip: 0x7f7ade0c1fa0)
_PyFunction_FastCallDict at /lib64/libpython3.6m.so.1.0 (unknown line)
_PyObject_FastCallDict at /lib64/libpython3.6m.so.1.0 (unknown line)
...
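The `signal (6): Aborted` line shows that the failed assertion aborts the whole `python3` process hosting Julia through PyCall, not just the Julia call, which is consistent with workers being lost. The sketch below assumes a job is run in a child process and shows how a supervising Python process could tell a SIGABRT crash apart from a normal error. The `run_isolated` helper is hypothetical and illustrative only; it is not how the REopt workers are implemented, and `os.abort()` stands in for the Julia assertion failure.

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str) -> Tuple[int, str]:
    """Hypothetical helper: run a Python snippet in a child process and
    classify how it exited.

    On POSIX, a child killed by signal N reports returncode == -N,
    so a SIGABRT (signal 6) from a failed C assertion shows up as -6.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode < 0:
        # Negative return code: the child was terminated by a signal.
        sig = signal.Signals(-proc.returncode)
        return proc.returncode, f"killed by {sig.name}"
    if proc.returncode == 0:
        return proc.returncode, "exited normally"
    return proc.returncode, "exited with error"


if __name__ == "__main__":
    # os.abort() raises SIGABRT, standing in for the failed assertion.
    print(run_isolated("import os; os.abort()"))
    print(run_isolated("print('ok')"))
```

Running the job out of process means an abort like the one above only takes down the child, so the parent can log the signal and retry or mark the job as failed instead of silently losing the worker.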
Addressed in production on March 17th with the move to the Rancher cluster. The fixes are in https://github.com/NREL/REopt_Lite_API/pull/198
@NLaws Excited to see this change! Seems like the best way forward. PyCall/pyJulia always seemed troublesome to work with.
Some jobs never finish solving. This appears to be caused by lost workers, probably due to a Julia bug. The traceback on a local host is:
We see something similar on the SUSE Linux servers.
The failing line in Julia matches https://github.com/JuliaLang/julia/issues/37694, which might be related to https://github.com/JuliaLang/julia/issues/35580. The latter issue appears to be fixed in Julia 1.6.0-DEV.1399, so hopefully we can fix this once 1.6 is released.