JuliaLang / julia


Result of `Module()` is not globally rooted. #28917

Open HildingElmqvist opened 6 years ago

HildingElmqvist commented 6 years ago

We frequently get GC errors when using Modia and ModiaMath with Julia 0.7 and 1.0. Everything works fine in Julia 0.6.4.

A simple model which reproduces the error is:

]add ModiaMath
]add Modia#master

using Modia

@model FirstOrder begin
    x = Variable(start=1)   # start means x(0)
    T = Parameter(0.5)      # Time constant
    u = 2.0                 # Same as Parameter(2.0)
@equations begin
    T*der(x) + x = u        # der() means time derivative
    end
end;

for i in 1:1000; println(i); result = simulate(FirstOrder, 2); end

After a while, the error occurs:

...    
192

    Simulating model: FirstOrder
    Number of equations: 1
    Number of variables: 2
    Number of continuous states: 1
    GC error (probable corruption) :
    Allocations: 628517437 (Pool: 628396240; Big: 121197); GC: 1429
    <?#000000001F8BC4E0::<?#000000001F8BC5B0::0000000000000000>>
    000000002B9D4040: Queued root: 0000000021CBC9D0 :: 000000000E8E1040 (bits: 3)
            of type Array{Any, 1}
    ...

This shows that nothing unusual in the model (such as events) is needed to trigger the GC error.

The C interface to Sundials in ModiaMath has been updated to accommodate the changes in version 0.7. Of course, there could still be an error that we have not been able to find.

yuyichao commented 6 years ago

Since there's C involved, and given that I've just seen many people "fix" the warnings/errors by making the code more wrong, this is most likely a user error. Please use Discourse for questions, and please include more detail about the errors/warnings you suppressed when you do so.

KristofferC commented 6 years ago

If you explicitly pass pointers to C functions, remember to GC.@preserve the object so the underlying memory won't be freed while the C code is still using it. That has been a common source of problems for people upgrading their C-interface code.
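A minimal sketch of that pattern, assuming a hypothetical byte buffer `buf` that is handed to C through a raw pointer. `pointer(buf)` by itself does not keep `buf` alive, so the `GC.@preserve` block is what guarantees the array survives while the address is in use. libc's `memset` is used only as a stand-in for a real C library routine such as the Sundials calls in ModiaMath.

```julia
# A buffer that C code will write through a raw pointer.
buf = zeros(UInt8, 16)

# `pointer(buf)` is a plain address; the GC does not know it refers to `buf`.
# GC.@preserve keeps `buf` reachable for the whole block.
GC.@preserve buf begin
    p = pointer(buf)
    # C signature: void *memset(void *s, int c, size_t n)
    ccall(:memset, Ptr{Cvoid}, (Ptr{Cvoid}, Cint, Csize_t), p, 0x2a, sizeof(buf))
end
```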

StefanKarpinski commented 6 years ago

In 0.7/1.0 the optimizer has gotten considerably better at knowing when it can reclaim memory and as a result, many things that used to work when passing something to C have started to segfault. They were always incorrect and should have had a root, but you could get away with it much more easily before.
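One way to get that root without extra bookkeeping is to pass the Julia object itself to `ccall` instead of a pointer obtained earlier: `ccall` converts each argument with `Base.cconvert`/`Base.unsafe_convert` and keeps the converted objects alive until the call returns. A minimal sketch, again using libc's `memset` only as a stand-in; `fill_with_c!` is a hypothetical name:

```julia
# Hypothetical helper: fill a byte vector by calling C's memset.
function fill_with_c!(buf::Vector{UInt8}, value::UInt8)
    # Passing `buf` (not `pointer(buf)`) lets ccall root it for the duration
    # of the call; the conversion to Ptr{Cvoid} happens inside ccall.
    ccall(:memset, Ptr{Cvoid}, (Ptr{Cvoid}, Cint, Csize_t), buf, value, sizeof(buf))
    return buf
end

v = fill_with_c!(zeros(UInt8, 8), 0x07)
```

The raw-pointer form with `GC.@preserve` is still needed when a C library keeps the address after the call returns, since `ccall` only roots its arguments during the call itself.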

MartinOtter commented 6 years ago

Based on the analysis below, the cause appears to be a bug in Julia 1.0.0 related to the evaluate(..) function. Therefore, please reopen this issue:

We created a Modia branch, find-reason-for-gc-crash, in which statements are removed to locate the cause of the GC crash. In particular, all code from ModiaMath (the interface to the Sundials IDA C code) and from PyPlot was removed. Hence neither ModiaMath nor PyPlot can be the cause of the crash, and it is unlikely that an incorrect C-interface call is responsible.

The smallest setup found so far that crashes Julia can be reproduced with the following statements:

# Start Julia 1.0.0
]add Modia#find-reason-for-gc-crash
import Modia
include("$(Modia.ModiaDir)/test/Test_gc_crash.jl")

The script "Test_gc_crash.jl" uses macros to construct a very simple function F. Instead of calling the ModiaMath.simulate!(..) function, the branch then executes the following (newly introduced) statement in file Modia/src/language/Execution.jl, line 576:

println("... typeof(F) = ", typeof(F))

Test_gc_crash.jl performs this construction process 1000 times. At an unpredictable iteration, Julia crashes at the print statement above with the message:

#simulate_ida#12 at HOME\.julia\packages\Modia\xH8b2\src\language\Execution.jl:576
unknown function (ip: 0000000025E670F6)

The complete crash-log is available here

The cause of this crash seems to be the following statement in file Modia/src/language/Execution.jl, line 505:

    F = evaluate(Module(), F_code)

Here, a macro generates the code F_code, and evaluate(..) evaluates it so that function F() can be called afterwards. Showing F_code (by setting const showCode = true in line 33 of Execution.jl) gives:

FUNCTION F CODE
F_code = quote
    #= HOME\.julia\dev\Modia\src\language\Execution.jl:494 =#
    let
        #= HOME\.julia\dev\Modia\src\language\Execution.jl:495 =#
        function F_FirstOrder(##simulationModel#359, _t, ##x#419, ##der_x#420, ##r#421, _w)
            #= HOME\.julia\dev\Modia\src\language\Execution.jl:481 =#
            global x = ##x#419[1]
            der(x) = ##der_x#420[1]
            ##time#358 = ((##simulationModel#359).simulationState).time
            ##residual#418 = 2.0 - (+)((*)(0.5, der(x)), x)
            ##r#421[1] = ##residual#418
            nothing
        end
    end
end

Nothing fancy seems to be here.
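For readers unfamiliar with this technique, here is a minimal sketch of the same pattern independent of Modia: build a function expression by interpolation, evaluate it in a fresh anonymous `Module()`, and call the result with `Base.invokelatest` (needed because the method is defined at run time, in a newer world age than the calling code). `make_residual` is a hypothetical name, and `Core.eval` stands in for Modia's `evaluate(..)`. On the Julia versions discussed in this issue, nothing keeps that anonymous module alive, which is exactly what this thread ends up being about.

```julia
# Hypothetical generator (not Modia code): returns a function computing the
# residual of T*der(x) + x = u, i.e. r[1] = u - (T*der_x[1] + x[1]).
function make_residual(T::Float64, u::Float64)
    name = Symbol("F_", "FirstOrder")
    F_code = quote
        let
            function $(name)(x, der_x, r)
                r[1] = $u - ($T * der_x[1] + x[1])
                return nothing
            end
        end
    end
    # Evaluate the generated code in a fresh anonymous module, as Execution.jl does.
    return Core.eval(Module(), F_code)
end

F = make_residual(0.5, 2.0)
r = zeros(1)
# F was defined at run time, so call it in the latest world age.
Base.invokelatest(F, [1.0], [1.0], r)
```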

Note, no crash occurs when running Test_gc_crash.jl with Julia 0.6.4.

When print statements for typeof(F) and F are introduced directly after F is generated:

    F = evaluate(Module(), F_code)
    println("... typeof(F) = ", typeof(F))
    println("... F = ", F)

then no crash occurs in Julia 1.0.0 either.

The analysis above seems to indicate that the crash is due to a bug in Julia 1.0.0.

Note, however, that when these two print statements are added to Modia@master, the GC crash still occurs.

(notification of this issue to: @toivoh, @tshort, @crlaugh, @ChrisRackauckas)

tshort commented 6 years ago

I can replicate this. I checked with lsof and confirmed that no shared libraries from packages are involved.

StefanKarpinski commented 6 years ago

Thanks for the thorough investigation and work to narrow this down.

HildingElmqvist commented 6 years ago

I am working on narrowing down the problem further. The problem appears even without instantiation (macro handling) and without structural or symbolic processing, i.e., with an F_code function that has no equations.

I then thought it might be a good idea to first check with the Julia nightly build, in case a GC bug had already been fixed.

However, I got: ERROR: LoadError: LoadError: UndefVarError: @doc_str not defined

Do you (@MartinOtter, @tshort, @crlaugh) understand how to fix this?

HildingElmqvist commented 6 years ago

I have now isolated the problem to 60 lines of code (just include it): ExecutionShowingGCProblem.jl.txt

I have marked certain lines with "No crash ..."; these are my attempts to reduce the code further, which made the crash disappear.

In addition to Julia 1.0, I tried Julia Version 1.1.0-DEV.69 (2018-08-20) (nightly build), which gave the log: ExecutionShowingGCProblem.log

@StefanKarpinski, I hope someone on the Julia team can have a look.

tshort commented 6 years ago

Here's @HildingElmqvist's code pasted inline to ease review. It's an interesting combination of things needed to cause the crash.

module Execution

using DataStructures: OrderedDict

function prepare_ida()
    initial_ex = :(let; end)
    initial_body = (initial_ex.args[2].args)::Vector{Any}

    residuals = Symbol[]
    eliminated = Symbol[]
    push!(initial_body, :(Any[$(residuals...)], Any[$(eliminated...)]))

    initial_residuals, initial_eliminated = eval(initial_ex)
#    initial_eliminated = Any[]  # No crash

    name = Symbol("F_", "Dummy")
#    F_code = :( function $(name)($(first_F_args...), _t, $x, $der_x, $r, _w) # Original
    F_code = :( function $(name)(x)
#    F_code = :( function $(name)()   # No crash
                  end )
    F_body = (F_code.args[2].args)::Vector{Any}

    F_code = quote
        let
            $(F_code)
        end
    end

    println("eval")
    F = Core.eval(Module(), F_code)

    #println("... typeof(F) = ", typeof(F)) # No crash if enabled
    #println("... F = ", F)

    eliminated_Ts = OrderedDict{Symbol,Type}()
#    eliminated_Ts = Dict() # No crash

    for i in 1:1000
        for (name, value) in zip(eliminated, initial_eliminated)
#            eliminated_Ts[name] = typeof(value)
        end
    end

    println("No crash yet")
    return F
end

# -----------------------------

function simulate_ida()
    println("... simulate_ida start")

    for i in 1:10000
        @show i
        F = prepare_ida()

        println("... typeof(F) = ", typeof(F))  # No crash if removed
        println("... F = ", F)
    end
end

simulate_ida()

end
tshort commented 6 years ago

This issue could also use a new title because it is not specific to Modia. I'm not sure what the title should be because the root cause of the GC issue is yet to be determined. Maybe just "Garbage collection error with evals and other operations".

StefanKarpinski commented 6 years ago

Does it still happen on master? @Keno has fixed a bunch of GC rooting issues recently.

KristofferC commented 6 years ago

Yes.

tshort commented 6 years ago

Here's a cut-down version of @HildingElmqvist's example. This time it's no longer a GC error; it's a segmentation fault.

using DataStructures: OrderedDict

function prepare_ida()
    initial_ex = :(let; end)
    initial_body = (initial_ex.args[2].args)::Vector{Any}

    residuals = Symbol[]
    eliminated = Symbol[]
    push!(initial_body, :(Any[$(residuals...)], Any[$(eliminated...)]))
    initial_residuals, initial_eliminated = Core.eval(Module(), initial_ex)

    name = Symbol("F_", "Dummy")
    F_code = :( function $(name)(x)
                  end )

    println("eval")
    F = Core.eval(Module(), F_code)

    @show eliminated
    @show initial_eliminated
    z = zip(eliminated, initial_eliminated)

    println("No crash yet")
    println("... typeof(F) = ", typeof(F))  # No crash if removed
end
prepare_ida()

Here's the error.

eval
eliminated = Symbol[]
initial_eliminated = Any[]
No crash yet

signal (11): Segmentation fault
in expression starting at /home/tshort/tmp/j2/exec.jl:26
valid_type_param at /buildworker/worker/package_linux64/build/src/builtins.c:878 [inlined]
jl_f_apply_type at /buildworker/worker/package_linux64/build/src/builtins.c:909
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1538 [inlined]
jl_f__apply at /buildworker/worker/package_linux64/build/src/builtins.c:563
get_argtypes at ./compiler/inferenceresult.jl:63
get_argtypes at ./compiler/inferenceresult.jl:25 [inlined]
Type at ./compiler/inferencestate.jl:62
Type at ./compiler/inferencestate.jl:120 [inlined]
typeinf_ext at ./compiler/typeinfer.jl:565
typeinf_ext at ./compiler/typeinfer.jl:604
jfptr_typeinf_ext_1 at /home/tshort/.julia_versions/julia-0.7.0/lib/julia/sys.so (unknown line)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1538 [inlined]
jl_apply_with_saved_exception_state at /buildworker/worker/package_linux64/build/src/rtutils.c:257
jl_type_infer at /buildworker/worker/package_linux64/build/src/gf.c:275
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1784 [inlined]
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1828
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
prepare_ida at /home/tshort/tmp/j2/exec.jl:21
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1829
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:324
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:428
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:363 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:686
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:799
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f659b20918f)
unknown function (ip: 0xffffffffffffffff)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:808
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:831
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:841
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:865
include at ./boot.jl:317 [inlined]
include_relative at ./loading.jl:1038
include at ./sysimg.jl:29
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
exec_options at ./client.jl:239
_start at ./client.jl:432
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
unknown function (ip: 0x401af8)
unknown function (ip: 0x401523)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4015c4)
Allocations: 826355 (Pool: 826182; Big: 173); GC: 1
Segmentation fault (core dumped)
Keno commented 6 years ago

So what's happening there is that the module that defined typeof(F) gets gc'ed, which a lot of the code assumes can't happen.
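The link Keno describes can be observed directly: the type of a function created by evaluating code in an anonymous module records that module as its parent, so the type refers back to a module that nothing else may be holding on to. A minimal sketch (the names `m`, `f` and `T` are illustrative):

```julia
m = Module()                        # anonymous module, not bound to any global name
f = Core.eval(m, :(f(x) = x + 1))   # defines the generic function `f` inside `m`

# typeof(f) is a DataType whose type name was created in `m`,
# so the type refers back to the module.
T = typeof(f)
println(parentmodule(T) === m)
```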

JeffBezanson commented 6 years ago

A workaround for now would be to create a global array and push each Module() you create into it. (We might fix it internally this way as well.)
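A minimal sketch of that workaround, packaged as a hypothetical helper `rooted_eval` (the helper name and the global `ROOTED_MODULES` are assumptions, not Julia API): every anonymous module is pushed into a global vector before code is evaluated in it, so it stays reachable for the rest of the session.

```julia
# Global root set: anything stored here stays reachable, so the GC cannot
# collect these modules (or the function types defined in them).
const ROOTED_MODULES = Module[]

# Hypothetical helper: evaluate `ex` in a fresh anonymous module that stays
# rooted in ROOTED_MODULES. Not thread-safe as written (plain push!).
function rooted_eval(ex)
    m = Module()
    push!(ROOTED_MODULES, m)
    return Core.eval(m, ex)
end

F = rooted_eval(:(function F_Dummy(x) end))
Base.invokelatest(F, [1.0])
```

The trade-off is that the vector only grows: a loop that generates a new function on every iteration keeps every one of those modules alive, which is acceptable only while the number of generated functions stays bounded.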

MartinOtter commented 6 years ago

typeof(F) was not in the original Julia program where the GC crash occurred. It was introduced when Modia was decoupled from ModiaMath to isolate the GC problem. I checked, and the crash also occurs in the more realistic case where function F(..) is simply called:

Modified code from Hilding: ExecutionShowingGCProblem2.txt

Modified code from Tom:

using DataStructures: OrderedDict

function prepare_ida()
    initial_ex = :(let; end)
    initial_body = (initial_ex.args[2].args)::Vector{Any}

    residuals = Symbol[]
    eliminated = Symbol[]
    push!(initial_body, :(Any[$(residuals...)], Any[$(eliminated...)]))
    initial_residuals, initial_eliminated = Core.eval(Module(), initial_ex)

    name = Symbol("F_", "Dummy")
    F_code = :( function $(name)(x)
                  end )

    println("eval")
    F = Core.eval(Module(), F_code)

    @show eliminated
    @show initial_eliminated
    z = zip(eliminated, initial_eliminated)

    println("No crash yet")
    # println("... typeof(F) = ", typeof(F))  # No crash if removed
    println("... calling F")
    x = [1.0]
    Base.invokelatest(F,x)
end
prepare_ida()

eval
eliminated = Symbol[]
initial_eliminated = Any[]
No crash yet
... calling F

Internal error: encountered unexpected error in runtime:
MethodError(f=typeof(Core.Compiler.copy_code_info)(), args=(<?#000000000E8AF710::
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6b5ffdc7 -- jl_is_cpointer_type at /home/Administrator/buildbot/worker/package_win64/build/src\julia.h:1027 [inlined]
jl_static_show_x_ at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:664
in expression starting at no file:0
jl_static_show_x_ at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:661
jl_static_show_x at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:953 [inlined]
jl_static_show_x_ at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:930
jl_static_show_x at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:953 [inlined]
jl_static_show_x_ at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:906
jl_static_show_x at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:953 [inlined]
jl_static_show_x_ at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:906
jl_static_show_x at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:953 [inlined]
jl_static_show at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:958 [inlined]
jl_apply_with_saved_exception_state at /home/Administrator/buildbot/worker/package_win64/build/src\rtutils.c:262
jl_type_infer at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:275
jl_compile_method_internal at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:1784 [inlined]
jl_fptr_trampoline at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:1828
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2182
jl_apply at /home/Administrator/buildbot/worker/package_win64/build/src\julia.h:1536 [inlined]
jl_f__apply at /home/Administrator/buildbot/worker/package_win64/build/src\builtins.c:556
jl_f__apply_latest at /home/Administrator/buildbot/worker/package_win64/build/src\builtins.c:594
#invokelatest#1 at .\essentials.jl:686 [inlined]
invokelatest at .\essentials.jl:685 [inlined]
prepare_ida at .\REPL[2]:24
jl_fptr_trampoline at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:1829
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2182
do_call at /home/Administrator/buildbot/worker/package_win64/build/src\interpreter.c:324
eval_value at /home/Administrator/buildbot/worker/package_win64/build/src\interpreter.c:428
eval_stmt_value at /home/Administrator/buildbot/worker/package_win64/build/src\interpreter.c:363 [inlined]
eval_body at /home/Administrator/buildbot/worker/package_win64/build/src\interpreter.c:682
jl_interpret_toplevel_thunk_callback at /home/Administrator/buildbot/worker/package_win64/build/src\interpreter.c:799
unknown function (ip: FFFFFFFFFFFFFFFE)
unknown function (ip: 000000000D56441F)
unknown function (ip: FFFFFFFFFFFFFFFF)
jl_toplevel_eval_flex at /home/Administrator/buildbot/worker/package_win64/build/src\toplevel.c:787
jl_toplevel_eval_in at /home/Administrator/buildbot/worker/package_win64/build/src\builtins.c:622
eval at .\boot.jl:319
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2182
eval_user_input at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\REPL\src\REPL.jl:85
macro expansion at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\REPL\src\REPL.jl:117 [inlined]
#28 at .\task.jl:259
jl_fptr_trampoline at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:1829
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2182
jl_apply at /home/Administrator/buildbot/worker/package_win64/build/src\julia.h:1536 [inlined]
start_task at /home/Administrator/buildbot/worker/package_win64/build/src\task.c:268
Allocations: 3139049 (Pool: 3138538; Big: 511); GC: 6

Both Julia programs above crash in Julia 1.0.0. Unfortunately, I cannot test with your newest Julia version because it does not seem to be on the downloads page (the nightly build there is 1.1.0-DEV.69 (2018-08-20)), and I do not know how to build Julia from GitHub myself.

KristofferC commented 6 years ago

The cause seems to be the same. As Jeff said, the workaround is to make sure the module does not get garbage collected, for example:

using DataStructures: OrderedDict
const MODULES = Module[]

function prepare_ida()
    initial_ex = :(let; end)
    initial_body = (initial_ex.args[2].args)::Vector{Any}

    residuals = Symbol[]
    eliminated = Symbol[]
    push!(initial_body, :(Any[$(residuals...)], Any[$(eliminated...)]))
    initial_residuals, initial_eliminated = Core.eval(Module(), initial_ex)

    name = Symbol("F_", "Dummy")
    F_code = :( function $(name)(x)
                  end )

    println("eval")
    push!(MODULES, Module())
    F = Core.eval(MODULES[end], F_code)

    @show eliminated
    @show initial_eliminated
    z = zip(eliminated, initial_eliminated)

    println("No crash yet")
    # println("... typeof(F) = ", typeof(F))  # No crash if removed
    println("... calling F")
    x = [1.0]
    Base.invokelatest(F,x)
end
prepare_ida()
MartinOtter commented 6 years ago

Thanks very much. Your proposed workaround fixes the GC crash in Modia. Therefore, this issue can be closed.

KristofferC commented 6 years ago

Even though a workaround exists, this issue should probably still be open until it is fixed in Base.