JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Segfault in with_tvar() in subtyping #35269

Open jiahao opened 4 years ago

jiahao commented 4 years ago
using JSON3

function readlabels(l)
    if :descriptor in keys(l)
        name = l[:descriptor][:name]
        ui = l[:descriptor][:UI]
        q = l[:qualifier]
        if q !== nothing && :name in keys(q)
            return (name, ui, q[:name], q[:UI])
        else
            return (name, ui)
        end
    else
        return missing
    end
end

datafile = "mre.json"
data = open(datafile, "r") do f
    JSON3.read(f)
end

col_labels_by_row = [map(readlabels, x) for x in values(data)]
col_labels = unique!([col_labels_by_row...;])

Input file to the script (zipped): mre.json.zip

On the Julia 1.4 release, the segfault comes with this error message:

signal (11): Segmentation fault: 11
in expression starting at /Users/jiahao/sandbox/mre.jl:24
with_tvar at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:722
subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:0
subtype_tuple_tail at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1087
with_tvar at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:702
subtype_tuple_tail at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:0
subtype_tuple at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1169 [inlined]
subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1322
with_tvar at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:702
subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:0
exists_subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1425 [inlined]
forall_exists_subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1453
jl_subtype_env at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1818
jl_subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1854 [inlined]
intersect at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:2895
intersect_all at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:3047
jl_type_intersection_env_s at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:3241
jl_typemap_intersection_node_visitor at /Users/julia/buildbot/worker/package_macos64/build/src/typemap.c:484
ml_matches at /Users/julia/buildbot/worker/package_macos64/build/src/gf.c:2723
cache_method at /Users/julia/buildbot/worker/package_macos64/build/src/gf.c:956
jl_mt_assoc_by_type at /Users/julia/buildbot/worker/package_macos64/build/src/gf.c:1114
jl_lookup_generic_ at /Users/julia/buildbot/worker/package_macos64/build/src/gf.c:2289
jl_apply_generic at /Users/julia/buildbot/worker/package_macos64/build/src/gf.c:2318
jl_apply at /Users/julia/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
do_apply at /Users/julia/buildbot/worker/package_macos64/build/src/builtins.c:643
vcat at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/SparseArrays/src/sparsevector.jl:1071
jl_apply at /Users/julia/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
do_apply at /Users/julia/buildbot/worker/package_macos64/build/src/builtins.c:643
jl_apply at /Users/julia/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
do_call at /Users/julia/buildbot/worker/package_macos64/build/src/interpreter.c:369
eval_body at /Users/julia/buildbot/worker/package_macos64/build/src/interpreter.c:0
jl_interpret_toplevel_thunk at /Users/julia/buildbot/worker/package_macos64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /Users/julia/buildbot/worker/package_macos64/build/src/toplevel.c:814
jl_parse_eval_all at /Users/julia/buildbot/worker/package_macos64/build/src/ast.c:872
jl_load at /Users/julia/buildbot/worker/package_macos64/build/src/toplevel.c:872 [inlined]
jl_load_ at /Users/julia/buildbot/worker/package_macos64/build/src/toplevel.c:879
include at ./Base.jl:377
exec_options at ./client.jl:288
_start at ./client.jl:484
jfptr__start_2076.clone_1 at /Applications/Julia-1.4.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
true_main at /Applications/Julia-1.4.app/Contents/Resources/julia/bin/julia (unknown line)
main at /Applications/Julia-1.4.app/Contents/Resources/julia/bin/julia (unknown line)
Allocations: 33754773 (Pool: 33752995; Big: 1778); GC: 54
[1]    47666 segmentation fault  /Applications/Julia-1.4.app/Contents/Resources/julia/bin/julia mre.jl

jiahao commented 4 years ago

@quinnj this example only segfaults with JSON3; JSON is fine.

JeffBezanson commented 4 years ago

This is triggered by splatting a large number of arguments (12608) with several different types. It looks like JSON3 uses a greater variety of types, which explains the difference.

We should probably detect this case (there is an assertion for it, in fact) and throw a StackOverflowError, which would be better than crashing (probably?).
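
Until such a check exists, one possible workaround is to flatten the nested collection without splatting. The sketch below is not taken from the original script: `rows` is a hypothetical stand-in for `col_labels_by_row`, built from mixed-type elements (tuples of different lengths and `missing`), and it assumes that all that is needed is a flat vector. `reduce(vcat, rows)` passes the collection as a single argument instead of issuing one call with thousands of arguments.

```julia
# Hypothetical stand-in for `col_labels_by_row`: many small vectors whose
# elements have mixed types (tuples of different lengths and `missing`).
rows = [Any[("name$i", "ui$i"), ("name$i", "ui$i", "q", "qui"), missing]
        for i in 1:5_000]

# Flatten without splatting: one call with a single collection argument,
# rather than `vcat(rows...)` with thousands of positional arguments.
flat = reduce(vcat, rows)

# Lazy alternative that also avoids splatting.
flat2 = collect(Iterators.flatten(rows))

# Deduplicate, as the original script does with `unique!`.
labels = unique(flat)
```

Both forms keep the number of arguments per call constant, so the size of the method signature no longer grows with the data.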

bkamins commented 4 years ago

I get the same problem even if the splatted types are homogeneous.

Here is the code using DataFrames.jl 0.21.2:

using DataFrames
df = DataFrame(rand(100, 1000))
select(df, All() => ByRow((x...) -> sum(x)))

Without going into details, the key point is that the last line internally splats 1000 elements in a function call (although they are homogeneous: all are Float64).

On Julia nightly on Linux, running it gives:

julia> select(df, All() => ByRow((x...) -> sum(x)))
Killed

and on Julia 1.4.2 there are many pages of errors ending with:

jfptr_typeinf_ext_1.clone_1 at /home/bkamins/julia/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
jl_type_infer at /buildworker/worker/package_linux64/build/src/gf.c:213
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1888
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2154 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
materialize at ./broadcast.jl:820 [inlined]
ByRow at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:30
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:643
select_transform! at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:168
_manipulate at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:661
unknown function (ip: 0x7f7859b2a865)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
#manipulate#301 at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:564
manipulate##kw at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:556 [inlined]
#select#296 at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:491 [inlined]
select at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:491
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
top-level scope at REPL[3]:1
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:819
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:769
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:848
eval at ./boot.jl:331
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:358
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:687
unknown function (ip: (nil))

and the terminal stalls.
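
For comparison, the same row-wise reduction can be written so that each row is passed as one collection rather than as 1000 positional arguments. This is a plain-Julia sketch, not the DataFrames.jl API: the matrix `A` is a hypothetical stand-in for the data frame, and `splat_sum` is an illustrative name for the varargs pattern above.

```julia
# Plain-Julia sketch; a matrix stands in for the 100×1000 data frame.
A = rand(100, 1000)

# Varargs version, analogous to `(x...) -> sum(x)`. Calling it as
# `splat_sum(row...)` would pass 1000 positional arguments per call.
# It is defined only for comparison and deliberately not called here.
splat_sum(x...) = sum(x)

# Non-splatting version: each row is passed as a single collection argument.
row_sums = map(sum, eachrow(A))
```

The result is a 100-element vector of row sums, and the signature of the function being called does not depend on the number of columns.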

alejandroschuler commented 2 years ago

I have the following function:

import Statistics: quantile

function quantize_vec(
    x::AbstractVector,
    n_bins::Int
)
    cutpoints = quantile(x, range(0, 1, length=n_bins+1))
    binned = [searchsortedlast(cutpoints, xi) for xi in x]
    return binned, cutpoints
end

and I'd like to do the following to apply it to each column of a matrix X:

binned, cutpoints = zip([quantize_vec(x, n_bins) for x in eachcol(X)]...)

followed by hcat(binned...) and hcat(cutpoints...).

However, I run into this same issue when X has more than about 200 columns. What's the idiomatic way to accomplish this?

alejandroschuler commented 2 years ago

Obviously, one can preallocate and fill in:

n,p = size(X)
binned = zeros(Int, size(X))
cutpoints = zeros(n_bins+1, p)
for j in 1:p
    binned[:,j], cutpoints[:,j] = quantize_vec(X[:,j], n_bins)
end

but I'm curious about the case where we might not be willing to do that.
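
One way to handle that case, offered as a sketch rather than a definitive answer, is to keep the comprehension but replace every splat with a call that takes the whole collection: `first.(results)` and `last.(results)` stand in for `zip(...)`, and `reduce(hcat, ...)` stands in for `hcat(xs...)`. The data `X` and the value of `n_bins` below are hypothetical, and `quantize_vec` is copied from the comment above so the block is self-contained.

```julia
import Statistics: quantile

# Copied from the comment above so the sketch is self-contained.
function quantize_vec(x::AbstractVector, n_bins::Int)
    cutpoints = quantile(x, range(0, 1, length=n_bins+1))
    binned = [searchsortedlast(cutpoints, xi) for xi in x]
    return binned, cutpoints
end

X = rand(50, 300)   # hypothetical data with more than 200 columns
n_bins = 4          # hypothetical bin count

# One (binned, cutpoints) tuple per column; nothing is splatted.
results = [quantize_vec(x, n_bins) for x in eachcol(X)]

# Split the tuples by broadcasting, then concatenate column-wise.
binned = reduce(hcat, first.(results))      # 50×300 matrix of bin indices
cutpoints = reduce(hcat, last.(results))    # 5×300 matrix of cut points
```

This avoids preallocation while keeping every call at a fixed number of arguments, whatever the number of columns.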