DatName opened this issue 2 years ago.
Unfortunately, this will not be debuggable without a reproducer or an rr trace.
I see. When I run it with

```
export JULIA_NUM_THREADS=12
./julia --bug-report=rr-local
```

the program just stalls on a non-blocking call:
```
julia> start!(ctx)
[ Info: Listening on: 0.0.0.0:26000
^CERROR: InterruptException:
Stacktrace:
  [1] poptask(W::Base.InvasiveLinkedListSynchronized{Task})
    @ Base ./task.jl:827
  [2] wait()
    @ Base ./task.jl:836
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
    @ Base ./condition.jl:123
  [4] wait(x::Base.Process)
    @ Base ./process.jl:627
  [5] success
    @ ./process.jl:489 [inlined]
  [6] run(::Cmd; wait::Bool)
    @ Base ./process.jl:446
  [7] run
    @ ./process.jl:444 [inlined]
  [8] (::BugReporting.var"#7#8"{Nothing, Tuple{Cmd, Vector{String}}})(rr_path::String)
    @ BugReporting ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:132
  [9] (::JLLWrappers.var"#2#3"{BugReporting.var"#7#8"{Nothing, Tuple{Cmd, Vector{String}}}, String})()
    @ JLLWrappers ~/.julia/packages/JLLWrappers/bkwIo/src/runtime.jl:49
 [10] withenv(::JLLWrappers.var"#2#3"{BugReporting.var"#7#8"{Nothing, Tuple{Cmd, Vector{String}}}, String}, ::Pair{String, String}, ::Vararg{Pair{String, String}})
    @ Base ./env.jl:172
 [11] withenv_executable_wrapper(f::Function, executable_path::String, PATH::String, LIBPATH::String, adjust_PATH::Bool, adjust_LIBPATH::Bool)
    @ JLLWrappers ~/.julia/packages/JLLWrappers/bkwIo/src/runtime.jl:48
 [12] #invokelatest#2
    @ ./essentials.jl:716 [inlined]
 [13] invokelatest
    @ ./essentials.jl:714 [inlined]
 [14] #rr#7
    @ ~/.julia/packages/JLLWrappers/bkwIo/src/products/executable_generators.jl:7 [inlined]
 [15] rr
    @ ~/.julia/packages/JLLWrappers/bkwIo/src/products/executable_generators.jl:7 [inlined]
 [16] #rr_record#6
    @ ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:122 [inlined]
 [17] rr_record
    @ ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:119 [inlined]
 [18] make_interactive_report(report_type::String, ARGS::Vector{String})
    @ BugReporting ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:208
 [19] #invokelatest#2
    @ ./essentials.jl:716 [inlined]
 [20] invokelatest
    @ ./essentials.jl:714 [inlined]
 [21] report_bug(kind::String)
    @ InteractiveUtils ~/code/julia/julia-1.7.1/share/julia/stdlib/v1.7/InteractiveUtils/src/InteractiveUtils.jl:397
 [22] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:233
 [23] _start()
    @ Base ./client.jl:495
```
Could this be by any chance related?
> Could this be by any chance related?
Perhaps, but that backtrace is from the outer (recording) process, not from where the program is actually blocked. Also, rr can make things slow, so you may just need to let it run for a while.
You can also try running with `--check-bounds=yes`.
I've had a similar problem with 1.7.2. Downgrading to 1.6.6 LTS resolved it, so it does appear to be specific to the Julia version.
I talked with Aaron out-of-band, and here are some more details on the code he ran:
He has a function `gwas_extract_snps`, defined as follows:
```julia
using GZip  # provides GZip.open for reading gzip-compressed files

function gwas_extract_snps(gwas_fh, gwas_keep_fh, keep_snp_set, delim)
    # extract keep_snp_set of snps from a gwas file
    gwas_io = GZip.open(gwas_fh)
    gwas_keep_io = open(gwas_keep_fh, "w")
    i = 1
    for line in eachline(gwas_io)
        snp = split(line, delim)[2]  # the SNP id is in the second column
        if in(snp, keep_snp_set)
            write(gwas_keep_io, line * "\n")
        end
        i += 1
        if (i % 1000000) == 0
            #println(i)  # progress output, currently disabled
        end
    end
    close(gwas_io)
    close(gwas_keep_io)
end
```
He then calls it from a `Distributed` for loop of the form:

```julia
using Distributed

Distributed.@distributed vcat for met in met_arr_keep
    # download file from google bucket
    # run gwas_extract_snps()
    # delete original file
end
```
This table shows whether or not he gets the segfault. ✅ means no segfault. ❌ means he encountered the segfault.
| Julia version | `@distributed` | `-p` | Result | Notes |
|---|---|---|---|---|
| 1.6.6 | yes | 2 | ✅ | Command-line |
| 1.7.2 | yes | 2 | ❌ | Command-line |
| 1.7.2 | no | 1 | ✅ | REPL |
His data cannot be shared publicly, unfortunately, so we don't have an MWE.
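Since the data can't be shared, one possible way to hunt for an MWE is to replay the same pattern on synthetic input. The sketch below is only a guess at what might be enough to trigger the crash: `extract_snps` is a hypothetical stdlib-only variant of `gwas_extract_snps` (plain `open` instead of `GZip.open`), and the file layout, the `rs$j` SNP ids and the `keep` set are all made up. If it runs cleanly on 1.7.2 with workers, swapping `GZip.open` back in would be a natural next step to narrow things down.

```julia
using Distributed

# Two local workers, mirroring `-p 2` from the table above.
addprocs(2)

# Hypothetical stdlib-only variant of gwas_extract_snps: copies every line
# whose second field is in `keep` and returns how many lines were kept.
@everywhere function extract_snps(in_path, out_path, keep, delim)
    # The `do` block closes the output file even if an error is thrown.
    open(out_path, "w") do out_io
        kept = 0
        # eachline(path) opens the input file and closes it when iteration ends.
        for line in eachline(in_path)
            if split(line, delim)[2] in keep
                println(out_io, line)
                kept += 1
            end
        end
        kept  # returned by `open`, and therefore by the function
    end
end

# Made-up input: 4 tab-delimited files, SNP ids rs1..rs1000 in column 2.
dir = mktempdir()
inputs = String[]
for k in 1:4
    path = joinpath(dir, "met$k.txt")
    open(path, "w") do io
        for j in 1:1000
            println(io, "chr1\trs$j\t$(rand())")
        end
    end
    push!(inputs, path)
end

# Keep every tenth SNP (rs1, rs11, ..., rs991): 100 ids.
keep = Set("rs$j" for j in 1:10:1000)

# Same shape as the original loop: one file per iteration, results vcat-ed.
counts = @distributed vcat for path in inputs
    extract_snps(path, path * ".keep", keep, '\t')
end
```

When launching with `julia -p 2 script.jl`, as in the command-line rows of the table, the `addprocs(2)` line can be dropped.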
I have a relatively big multithreaded application that runs fine on 1.6.4 but segfaults on 1.7 and 1.7.1. I will try to create a minimal example that reproduces this segfault, but for now I only have the console log: