JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Unpredictable segfault on master when multithreading with many allocations #52032

Open MasonProtter opened 9 months ago

MasonProtter commented 9 months ago

I'm seeing a crash that occurs stochastically on master and v1.10 when using Transducers.jl to collect a generator over a big array on many threads. I haven't been able to make a self-contained reproducer without Transducers.jl, but the crash doesn't seem to be related to Transducers.jl itself; it looks like an internal error brought on by changes to the GC.

Here's a reproducer. It demonstrates the problem for me on v1.10-rc1 and master, but not on v1.9:

julia> using Transducers

julia> xs = randn(10_000_000);

julia> tcollect(sin(x) for x in xs if abs(x) < 1);

julia> tcollect(sin(x) for x in xs if abs(x) < 1);

julia> tcollect(sin(x) for x in xs if abs(x) < 1);

julia> tcollect(sin(x) for x in xs if abs(x) < 1);

julia> tcollect(sin(x) for x in xs if abs(x) < 1);
[402472] signal (11.1): Segmentation fault
in expression starting at REPL[51]:1
jl_object_id__cold at /home/mason/julia-dev/src/builtins.c:455
type_hash at /home/mason/julia-dev/src/jltypes.c:1585
typekey_hash at /home/mason/julia-dev/src/jltypes.c:1615
jl_precompute_memoized_dt at /home/mason/julia-dev/src/jltypes.c:1695
inst_datatype_inner at /home/mason/julia-dev/src/jltypes.c:2124
jl_inst_arg_tuple_type at /home/mason/julia-dev/src/jltypes.c:2219
arg_type_tuple at /home/mason/julia-dev/src/gf.c:2240 [inlined]
jl_lookup_generic_ at /home/mason/julia-dev/src/gf.c:3047 [inlined]
ijl_apply_generic at /home/mason/julia-dev/src/gf.c:3099
_reduce at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:153
_transduce_assoc_nocomplete at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:131 [inlined]
#transduce_assoc#176 at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:108
transduce_assoc at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:84 [inlined]
#foldxt#184 at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:235 [inlined]
foldxt at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:235 [inlined]
#_tcopy#190 at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:344 [inlined]
_tcopy at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:344 [inlined]
#tcopy#189 at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:343 [inlined]
tcopy at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:343 [inlined]
#tcollect#199 at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:423 [inlined]
tcollect at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:423 [inlined]
#tcollect#200 at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:424 [inlined]
tcollect at /home/mason/Dropbox/Julia/Transducers/src/reduce.jl:424
unknown function (ip: 0x7f49e3182625)
jl_apply at /home/mason/julia-dev/src/julia.h:2130 [inlined]
do_call at /home/mason/julia-dev/src/interpreter.c:126
eval_value at /home/mason/julia-dev/src/interpreter.c:223
eval_stmt_value at /home/mason/julia-dev/src/interpreter.c:174 [inlined]
eval_body at /home/mason/julia-dev/src/interpreter.c:647
jl_interpret_toplevel_thunk at /home/mason/julia-dev/src/interpreter.c:787
jl_toplevel_eval_flex at /home/mason/julia-dev/src/toplevel.c:938
jl_toplevel_eval_flex at /home/mason/julia-dev/src/toplevel.c:881
jl_toplevel_eval_flex at /home/mason/julia-dev/src/toplevel.c:881
jl_toplevel_eval_flex at /home/mason/julia-dev/src/toplevel.c:881
jl_toplevel_eval_flex at /home/mason/julia-dev/src/toplevel.c:881
ijl_toplevel_eval_in at /home/mason/julia-dev/src/toplevel.c:989
eval at ./boot.jl:418 [inlined]
eval_user_input at /home/mason/julia-dev/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:167
repl_backend_loop at /home/mason/julia-dev/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:263
#start_repl_backend#48 at /home/mason/julia-dev/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:248
start_repl_backend at /home/mason/julia-dev/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
#run_repl#61 at /home/mason/julia-dev/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:404
run_repl at /home/mason/julia-dev/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:390
unknown function (ip: 0x7f49e31099e9)
#1078 at ./client.jl:441
unknown function (ip: 0x7f49e3102b45)
jl_apply at /home/mason/julia-dev/src/julia.h:2130 [inlined]
jl_f__call_latest at /home/mason/julia-dev/src/builtins.c:859
#invokelatest#2 at ./essentials.jl:929 [inlined]
invokelatest at ./essentials.jl:926 [inlined]
run_main_repl at ./client.jl:425
repl_main at ./client.jl:605 [inlined]
_start at ./client.jl:579
jfptr__start_67457 at /home/mason/julia-dev/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/mason/julia-dev/src/julia.h:2130 [inlined]
true_main at /home/mason/julia-dev/src/jlapi.c:586
jl_repl_entrypoint at /home/mason/julia-dev/src/jlapi.c:738
main at /home/mason/julia-dev/cli/loader_exe.c:58
unknown function (ip: 0x7f4a18755ccf)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at /home/mason/julia-dev/./julia (unknown line)
Allocations: 176209415 (Pool: 176208030; Big: 1385); GC: 74
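
For anyone who would rather run this as a script than retype the call at the REPL, the session above can be approximated by the following sketch. It assumes Julia was started with several threads (for example `julia -t auto`) and that Transducers.jl is installed. The function name `repro` and the default of 20 iterations are hypothetical choices for illustration, not part of the original report, and since the crash is stochastic any given run may finish without segfaulting.

```julia
using Transducers

# Hypothetical helper (not from the original report): repeat the
# threaded collect from the REPL session a fixed number of times.
function repro(n = 10_000_000; iterations = 20)
    xs = randn(n)
    lens = Int[]
    for i in 1:iterations
        # Same expression as in the report: filter, map, then collect across threads.
        ys = tcollect(sin(x) for x in xs if abs(x) < 1)
        push!(lens, length(ys))
        println("iteration ", i, ": collected ", length(ys),
                " elements with ", Threads.nthreads(), " threads")
    end
    return lens
end
```

Calling `repro()` uses the array size from the report; a smaller `n` makes each iteration faster but may make the crash less likely to show up.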

oscardssmith commented 9 months ago

My guess is that it's probably https://github.com/JuliaLang/julia/issues/51852. Edit: if it happens on 1.10, it's not.

MasonProtter commented 9 months ago

My mistake, it's actually not triggering on v1.10; I must have been confused when I thought I saw it trigger there. It looks like this is already known, given that FLoops.jl is mentioned in this comment: https://github.com/JuliaLang/julia/pull/51853#issuecomment-1783538811

MasonProtter commented 9 months ago

Okay, #51853 has merged with passing tests, but my not-so-minimal working example still segfaults, so I'm reopening this.