Open jiahao opened 4 years ago
@quinnj this example only segfaults with JSON3; JSON is fine.
This is triggered by splatting a large number of arguments (12608) with several different types. It looks like JSON3 uses a greater variety of types, which explains the difference.
We should probably detect this case (there is an assertion for it, in fact) and throw a StackOverflowError, which would be better than crashing (probably?).
I get the same problems even if the splatted types are homogeneous.
Here is the code using DataFrames.jl 0.21.2:
using DataFrames
df = DataFrame(rand(100, 1000))
select(df, All() => ByRow((x...) -> sum(x)))
Without going into details, the core issue is that the last line internally splats 1000 elements in a function call (but they are homogeneous: all are Float64).
When I run it on Julia nightly on Linux, I get:
julia> select(df, All() => ByRow((x...) -> sum(x)))
Killed
and on Julia 1.4.2 there are many pages of errors ending with:
jfptr_typeinf_ext_1.clone_1 at /home/bkamins/julia/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
jl_type_infer at /buildworker/worker/package_linux64/build/src/gf.c:213
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1888
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2154 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
materialize at ./broadcast.jl:820 [inlined]
ByRow at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:30
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:643
select_transform! at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:168
_manipulate at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:661
unknown function (ip: 0x7f7859b2a865)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
#manipulate#301 at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:564
manipulate##kw at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:556 [inlined]
#select#296 at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:491 [inlined]
select at /home/bkamins/.julia/packages/DataFrames/kwVTY/src/abstractdataframe/selection.jl:491
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
top-level scope at REPL[3]:1
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:819
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:769
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:848
eval at ./boot.jl:331
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:358
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:687
unknown function (ip: (nil))
and the terminal stalls.
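As a possible workaround for the row-sum example above, the reduction can be written without any splatting. The sketch below uses a plain matrix M as a stand-in for the data frame's columns (M and the variable names are assumptions for illustration, not part of the original report), and it does not address the underlying compiler problem:

```julia
# Hypothetical stand-in for the 100×1000 data frame from the report.
M = rand(100, 1000)

# Sum along each row directly; no 1000-argument call is ever constructed.
row_sums = vec(sum(M, dims=2))

# Equivalent per-row formulation: each row is passed as a single
# collection argument rather than being splatted into varargs.
row_sums_by_row = [sum(r) for r in eachrow(M)]
```

Both forms hand each row to sum as one array, so the number of columns does not affect the number of arguments the compiler has to handle.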
I have the following function:
import Statistics: quantile
function quantize_vec(
    x::AbstractVector,
    n_bins::Int
)
    cutpoints = quantile(x, range(0, 1, length=n_bins+1))
    binned = [searchsortedlast(cutpoints, xi) for xi in x]
    return binned, cutpoints
end
and I'd like to do the following to apply it to each column of a matrix X:
binned, cutpoints = zip([quantize_vec(x, n_bins) for x in eachcol(X)]...)
followed by hcat(binned...) and hcat(cutpoints...).
However, I'm running into this same issue when X has more than about 200 columns! What's the idiomatic way to accomplish this?
Obviously one can preallocate and fill in:
n, p = size(X)
binned = zeros(Int, size(X))
cutpoints = zeros(n_bins+1, p)
for j in 1:p
    binned[:,j], cutpoints[:,j] = quantize_vec(X[:,j], n_bins)
end
but I'm curious about the case where we might not be willing to do that.
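One option that avoids both preallocation and splatting is to keep the per-column results in a vector and combine them with reduce(hcat, ...). This is only a sketch: X and n_bins below are made-up inputs, and quantize_vec is repeated from above so the snippet runs on its own.

```julia
using Statistics: quantile

# Same function as above, repeated so the sketch is self-contained.
function quantize_vec(x::AbstractVector, n_bins::Int)
    cutpoints = quantile(x, range(0, 1, length=n_bins + 1))
    binned = [searchsortedlast(cutpoints, xi) for xi in x]
    return binned, cutpoints
end

X = rand(50, 300)   # hypothetical data, wider than the ~200 columns mentioned
n_bins = 4

# One (binned, cutpoints) tuple per column, collected without zip(...) splatting.
results = [quantize_vec(x, n_bins) for x in eachcol(X)]

# reduce(hcat, ...) takes the vector of vectors as a single argument,
# so no call with hundreds of arguments is generated.
binned = reduce(hcat, first.(results))
cutpoints = reduce(hcat, last.(results))
```

Here first.(results) and last.(results) pull out the two tuple components by broadcasting, which replaces the zip(...) unpacking, and reduce(hcat, ...) replaces hcat(binned...).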
Input file to the script (zipped): mre.json.zip
On release Julia 1.4, the segfault I get comes with the error message