JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Sudden non-deterministic segfault error when using multithreading #39278

Closed by vlandau 3 years ago

vlandau commented 3 years ago

I've encountered a seemingly non-deterministic bug when running the tests for Omniscape.jl on GitHub Actions. It is triggered during a test of a function that uses multithreading; the serial implementation of the same function passes reliably. The only difference between the two versions is that one uses a plain for loop and the other uses a @threads for loop. With no changes to the source code, CI suddenly started failing with this output:

signal (11): Segmentation fault
in expression starting at /home/runner/work/Omniscape.jl/Omniscape.jl/test/runtests.jl:91
_ZN4llvm9DWARFUnit9getParentEPKNS_19DWARFDebugInfoEntryE at /opt/hostedtoolcache/julia/1.5.3/x64/bin/../lib/julia/libLLVM-9jl.so (unknown line)
_ZN4llvm9DWARFUnit25getInlinedChainForAddressEmRNS_15SmallVectorImplINS_8DWARFDieEEE at /opt/hostedtoolcache/julia/1.5.3/x64/bin/../lib/julia/libLLVM-9jl.so (unknown line)
_ZN4llvm12DWARFContext25getInliningInfoForAddressENS_6object16SectionedAddressENS_19DILineInfoSpecifierE at /opt/hostedtoolcache/julia/1.5.3/x64/bin/../lib/julia/libLLVM-9jl.so (unknown line)
lookup_pointer at /buildworker/worker/package_linux64/build/src/debuginfo.cpp:547
jl_getDylibFunctionInfo at /buildworker/worker/package_linux64/build/src/debuginfo.cpp:1219 [inlined]
jl_getFunctionInfo at /buildworker/worker/package_linux64/build/src/debuginfo.cpp:1264
lookup_pointer at /buildworker/worker/package_linux64/build/src/debuginfo.cpp:547
jl_lookup_code_address at /buildworker/worker/package_linux64/build/src/stackwalk.c:572
jl_getDylibFunctionInfo at /buildworker/worker/package_linux64/build/src/debuginfo.cpp:1219 [inlined]
jl_getFunctionInfo at /buildworker/worker/package_linux64/build/src/debuginfo.cpp:1264
jl_lookup_code_address at /buildworker/worker/package_linux64/build/src/stackwalk.c:572
lookup at ./stacktraces.jl:107
firstcaller at ./deprecated.jl:110
firstcaller at ./deprecated.jl:105 [inlined]
macro expansion at ./deprecated.jl:90 [inlined]
macro expansion at ./logging.jl:321 [inlined]
#depwarn#797 at ./deprecated.jl:85
depwarn at ./deprecated.jl:80 [inlined]
#cg!#23 at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:230
cg!##kw at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:223
#cg#22 at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:169
cg##kw at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:169 [inlined]
solve_linear_system at /home/runner/.julia/packages/Circuitscape/9x9VD/src/core.jl:577
macro expansion at ./timing.jl:233 [inlined]
multiple_solver at /home/runner/.julia/packages/Circuitscape/9x9VD/src/raster/advanced.jl:284
calculate_current at /home/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:410
unknown function (ip: 0x7fbd34179239)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
solve_target! at /home/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:496
unknown function (ip: 0x7fbd0824e439)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
macro expansion at /home/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:273 [inlined]
#71#threadsfor_fun at ./threadingconstructs.jl:81
#71#threadsfor_fun at ./threadingconstructs.jl:48
unknown function (ip: 0x7fbd0826f6fc)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2231 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
ERROR: Package Omniscape errored during testing (received signal: 11)
Stacktrace:
 [1] pkgerror(::String) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Types.jl:52
 [2] test(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:1578
 [3] test(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:327
 [4] #test#61 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:67 [inlined]
 [5] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:coverage,),Tuple{Bool}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:80
 [6] top-level scope at none:1

At first, the error occurred only on Ubuntu and the tests passed on macOS. When I reran the tests (again with no changes to the source code), they failed on macOS but passed on Ubuntu. Now they fail on both. I'm unable to reproduce the failure locally, but I was able to SSH into the GitHub Actions runner, where the tests seem to fail reliably once they have already been run at least once. When running the package tests from the SSH session, though, I get a different error (I guess a ReadOnlyMemoryError is a type of segfault?):

Got exception outside of a @test
  TaskFailedException:
  ReadOnlyMemoryError()
  Stacktrace:
   [1] lookup(::Ptr{Nothing}) at ./stacktraces.jl:107
   [2] firstcaller(::Array{Union{Ptr{Nothing}, Base.InterpreterIP},1}, ::Tuple{Symbol}) at ./deprecated.jl:110
   [3] firstcaller at ./deprecated.jl:105 [inlined]
   [4] macro expansion at ./deprecated.jl:90 [inlined]
   [5] macro expansion at ./logging.jl:321 [inlined]
   [6] depwarn(::String, ::Symbol; force::Bool) at ./deprecated.jl:85
   [7] depwarn at ./deprecated.jl:80 [inlined]
   [8] cg!(::Array{Float64,1}, ::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}; abstol::Float64, reltol::Float64, tol::Float64, maxiter::Int64, log::Bool, statevars::IterativeSolvers.CGStateVariables{Float64,Array{Float64,1}}, verbose::Bool, Pl::AlgebraicMultigrid.Preconditioner{AlgebraicMultigrid.MultiLevel{AlgebraicMultigrid.Pinv{Float64},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},SparseArrays.SparseMatrixCSC{Float64,Int64},SparseArrays.SparseMatrixCSC{Float64,Int64},LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int64}},AlgebraicMultigrid.MultiLevelWorkspace{Array{Float64,1},1}}}, kwargs::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:initially_zero,),Tuple{Bool}}}) at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:230
   [9] cg(::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}; kwargs::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:Pl, :tol, :maxiter),Tuple{AlgebraicMultigrid.Preconditioner{AlgebraicMultigrid.MultiLevel{AlgebraicMultigrid.Pinv{Float64},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},SparseArrays.SparseMatrixCSC{Float64,Int64},SparseArrays.SparseMatrixCSC{Float64,Int64},LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int64}},AlgebraicMultigrid.MultiLevelWorkspace{Array{Float64,1},1}}},Float64,Int64}}}) at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:169
   [10] solve_linear_system(::Dict{String,String}, ::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}, ::AlgebraicMultigrid.Preconditioner{AlgebraicMultigrid.MultiLevel{AlgebraicMultigrid.Pinv{Float64},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},SparseArrays.SparseMatrixCSC{Float64,Int64},SparseArrays.SparseMatrixCSC{Float64,Int64},LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int64}},AlgebraicMultigrid.MultiLevelWorkspace{Array{Float64,1},1}}}) at /home/runner/.julia/packages/Circuitscape/9x9VD/src/core.jl:577
   [11] macro expansion at ./timing.jl:233 [inlined]
   [12] multiple_solver(::Dict{String,String}, ::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}) at /home/runner/.julia/packages/Circuitscape/9x9VD/src/raster/advanced.jl:284
   [13] calculate_current(::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Array{Float64,2}, ::Circuitscape.RasterFlags, ::Dict{String,String}, ::DataType) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:410
   [14] solve_target!(::Int64, ::Int64, ::Dict{String,Int64}, ::Array{Float64,2}, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Omniscape.OmniscapeFlags, ::Dict{String,String}, ::Circuitscape.RasterFlags, ::Circuitscape.OutputFlags, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::String, ::String, ::Float64, ::Float64, ::Float64, ::Float64, ::Array{Float64,2}, ::Array{Float64,3}, ::Array{Float64,3}, ::DataType) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:496
   [15] macro expansion at /home/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:273 [inlined]
   [16] (::Omniscape.var"#71#threadsfor_fun#15"{Dict{String,Int64},DataType,Omniscape.OmniscapeFlags,String,String,Float64,Float64,Float64,Float64,Dict{String,String},Int64,Circuitscape.OutputFlags,Circuitscape.RasterFlags,ProgressMeter.Progress,Int64,UnitRange{Int64}})(::Bool) at ./threadingconstructs.jl:81
   [17] (::Omniscape.var"#71#threadsfor_fun#15"{Dict{String,Int64},DataType,Omniscape.OmniscapeFlags,String,String,Float64,Float64,Float64,Float64,Dict{String,String},Int64,Circuitscape.OutputFlags,Circuitscape.RasterFlags,ProgressMeter.Progress,Int64,UnitRange{Int64}})() at ./threadingconstructs.jl:48
  Stacktrace:
   [1] wait at ./task.jl:267 [inlined]
   [2] threading_run(::Function) at ./threadingconstructs.jl:34
   [3] macro expansion at ./threadingconstructs.jl:93 [inlined]
   [4] run_omniscape(::Dict{String,String}, ::Array{Union{Missing, Float64},2}; reclass_table::Array{Union{Missing, Float64},2}, source_strength::Array{Union{Missing, Float64},2}, condition1::Array{Union{Missing, Float64},2}, condition2::Array{Union{Missing, Float64},2}, condition1_future::Array{Union{Missing, Float64},2}, condition2_future::Array{Union{Missing, Float64},2}, wkt::String, geotransform::Array{Float64,1}, write_outputs::Bool) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:268
   [5] run_omniscape(::String) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:561
   [6] top-level scope at /home/runner/work/Omniscape.jl/Omniscape.jl/test/runtests.jl:101
   [7] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
   [8] top-level scope at /home/runner/work/Omniscape.jl/Omniscape.jl/test/runtests.jl:93
   [9] include(::String) at ./client.jl:457
   [10] top-level scope at none:6
   [11] eval(::Module, ::Any) at ./boot.jl:331
   [12] exec_options(::Base.JLOptions) at ./client.jl:272
   [13] _start() at ./client.jl:506

I ran the tests with --bug-report=rr to upload a bug report. Because the error seems non-deterministic and no output was shown in the terminal while bug reporting was enabled, I can't be positive that the error was triggered during that run. It did occur in the test runs just before and just after the --bug-report=rr run, though, so I'm hoping the bug report contains the pertinent information.

Here is a link to the bug report.
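
For reference, the serial-versus-threaded difference described at the top of this report amounts to roughly the following. This is only a minimal sketch: `solve_one`, `run_serial`, and `run_threaded` are hypothetical stand-ins, not Omniscape's actual code, and the sketch is not expected to reproduce the crash.

```julia
using Base.Threads

# Hypothetical stand-in for the per-target work (not Omniscape's actual code).
solve_one(i) = sum(sqrt, 1:i)

# Serial version: a plain `for` loop.
function run_serial(n)
    out = zeros(n)
    for i in 1:n
        out[i] = solve_one(i)
    end
    return out
end

# Threaded version: identical except for the `@threads` macro.
function run_threaded(n)
    out = zeros(n)
    @threads for i in 1:n
        out[i] = solve_one(i)  # each iteration writes only to its own slot
    end
    return out
end
```

The threaded loop only runs on more than one thread if Julia is started with multiple threads, for example by setting the JULIA_NUM_THREADS environment variable before launching.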

JeffBezanson commented 3 years ago

Thanks for the report. Which version of Julia is this? I think there is a chance this is fixed on master or release-1.6.

vlandau commented 3 years ago

This was on Julia 1.5.3 -- no problems on 1.6.0-beta1! I figured there might not be another patch release for 1.5, but I wanted to post the issue here just in case there will be.

JeffBezanson commented 3 years ago

Wonderful, thanks for checking!