JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

`EOFError / ReadOnlyMemoryError` with local workers when importing package #28237

Open jtackm opened 6 years ago

jtackm commented 6 years ago

When starting Julia with a large number of local workers, I occasionally run into errors when importing certain packages. Take, for example, this simplified package:

module TestPackage
using LightGraphs # error also happens with 'using DifferentialEquations'
end

When I start Julia with `julia -p 20`, I get the following roughly every fifth try:

julia> using TestPackage

 signal (11): Segmentation fault
while loading /home/janko/.julia/v0.6/TestPackage/src/TestPackage.jl, in expression starting on line 3
unknown function (ip: 0x7f9fe80f46f0)
unknown function (ip: 0x7fa0e80f4ce2)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
jl_f__apply at /buildworker/worker/package_linux64/build/src/builtins.c:426
jl_f__apply_latest at /buildworker/worker/package_linux64/build/src/builtins.c:464
message_handler_loop at ./distributed/process_messages.jl:161
process_tcp_streams at ./distributed/process_messages.jl:118
#99 at ./event.jl:73
unknown function (ip: 0x7fa0e80e759f)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:267
unknown function (ip: 0xffffffffffffffff)
Allocations: 1672748 (Pool: 1671483; Big: 1265); GC: 1
Worker 17 terminated.
ERROR (unhandled task failure): EOFError: read end of file
ERROR: ProcessExitedException()
try_yieldto(::Base.##296#297{Task}, ::Task) at ./event.jl:189
wait() at ./event.jl:234
wait(::Condition) at ./event.jl:27
wait_impl(::Channel{Any}) at ./channels.jl:364
wait(::Channel{Any}) at ./channels.jl:360
take_buffered at ./channels.jl:319 [inlined]
take!(::Channel{Any}) at ./channels.jl:317
#remotecall_fetch#141(::Array{Any,1}, ::Function, ::Function, ::Base.Distributed.Worker) at ./distributed/remotecall.jl:350
remotecall_fetch(::Function, ::Base.Distributed.Worker) at ./distributed/remotecall.jl:346
#remotecall_fetch#144(::Array{Any,1}, ::Function, ::Function, ::Int64) at ./distributed/remotecall.jl:367
remotecall_fetch(::Function, ::Int64) at ./distributed/remotecall.jl:367
(::Base.##700#703)() at ./task.jl:335
Stacktrace:
 [1] sync_end() at ./task.jl:287
 [2] macro expansion at ./task.jl:303 [inlined]
 [3] _require(::Symbol) at ./loading.jl:482
 [4] require(::Symbol) at ./loading.jl:405

Or sometimes a ReadOnlyMemoryError:

ERROR: On worker 20:
LoadError: On worker 20:
ReadOnlyMemoryError()
#remotecall_fetch#141 at ./distributed/remotecall.jl:354
remotecall_fetch at ./distributed/remotecall.jl:346
#remotecall_fetch#144 at ./distributed/remotecall.jl:367
find_in_node_path at ./loading.jl:127
_require at ./loading.jl:433
require at ./loading.jl:405
_require_from_serialized at ./loading.jl:203
_require_search_from_serialized at ./loading.jl:236
_require at ./loading.jl:441
require at ./loading.jl:405
include_string at ./loading.jl:522
include_from_node1 at ./loading.jl:579
eval at ./boot.jl:235
#701 at ./loading.jl:485
#106 at ./distributed/process_messages.jl:268 [inlined]
run_work_thunk at ./distributed/process_messages.jl:56
macro expansion at ./distributed/process_messages.jl:268 [inlined]
#105 at ./event.jl:73
while loading /home/janko/.julia/v0.6/TestPackage/src/TestPackage.jl, in expression starting on line 3
#remotecall_fetch#141(::Array{Any,1}, ::Function, ::Function, ::Base.Distributed.Worker) at ./distributed/remotecall.jl:354
remotecall_fetch(::Function, ::Base.Distributed.Worker) at ./distributed/remotecall.jl:346
#remotecall_fetch#144(::Array{Any,1}, ::Function, ::Function, ::Int64) at ./distributed/remotecall.jl:367
remotecall_fetch(::Function, ::Int64) at ./distributed/remotecall.jl:367
(::Base.##700#703)() at ./task.jl:335
Stacktrace:
 [1] sync_end() at ./task.jl:287
 [2] macro expansion at ./task.jl:303 [inlined]
 [3] _require(::Symbol) at ./loading.jl:482
 [4] require(::Symbol) at ./loading.jl:405
 [5] macro expansion at ./distributed/macros.jl:99 [inlined]
 [6] anonymous at ./<missing>:?

The same happens at a much lower frequency when adding fewer workers (some occurrences at `-p 10`, none AFAICT at `-p 5`) or when using `@everywhere using TestPackage`. The errors are also reproducible with:

julia> using LightGraphs # or 'using DifferentialEquations'

but don't seem to appear with

julia> @everywhere using LightGraphs # or '@everywhere using DifferentialEquations'
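The observation above suggests loading the package explicitly on every process as a possible workaround. Here is a minimal sketch of that pattern, written in Julia 0.7+/1.x syntax, where `Distributed` is a standard library that scripts load explicitly (on 0.6 the same functions live in `Base`). The stdlib module `Statistics` is only a stand-in so the snippet runs without installing anything, and the worker count of 2 is arbitrary; neither comes from the report.

```julia
using Distributed

# Start a few local workers; the report used `julia -p 20` instead.
addprocs(2)

# Load the package explicitly on the master and on every worker.
# `Statistics` is a stand-in; substitute LightGraphs or DifferentialEquations.
@everywhere using Statistics

# Ask each worker whether the module name is now bound in its `Main`.
loaded = Dict(w => remotecall_fetch(() -> isdefined(Main, :Statistics), w)
              for w in workers())
println(loaded)

# Shut the workers down again.
rmprocs(workers())
```

If any entry in `loaded` is `false`, or the `remotecall_fetch` itself throws, that worker did not load the package cleanly.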

Version details:

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, westmere)

julia> [Pkg.status(mdl) for mdl in ("LightGraphs", "DifferentialEquations")];
 - LightGraphs                   0.12.0
 - DifferentialEquations         4.5.0

sbromberger commented 6 years ago

ref: https://github.com/JuliaGraphs/LightGraphs.jl/issues/937

jtackm commented 6 years ago

It seems the problem is fixed on the latest 0.7.0 beta.
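Since the failure was intermittent (roughly one in five starts with `-p 20`), a single clean run says little, so a fix is easier to judge from repeated starts. The following is a rough sketch in current Julia syntax that restarts Julia several times and counts the runs that fail. `count_failures` is a hypothetical helper, not part of Julia, and the approach assumes that the error makes the Julia process exit with a nonzero status.

```julia
# Hypothetical helper: run `cmd` `attempts` times and count failed runs.
function count_failures(cmd::Cmd, attempts::Integer)
    failures = 0
    for _ in 1:attempts
        # Discard all output; `success` is false on a nonzero exit code or a signal.
        ok = success(pipeline(cmd; stdout=devnull, stderr=devnull))
        ok || (failures += 1)
    end
    return failures
end

# Command line for the Julia binary that is currently running.
julia_exe = Base.julia_cmd()

# Self-check with commands whose outcome is known in advance.
always_ok   = count_failures(`$julia_exe -e "exit(0)"`, 2)
always_fail = count_failures(`$julia_exe -e "exit(1)"`, 2)
println("always_ok=$always_ok always_fail=$always_fail")

# For the case in this report one would run something like (50 is arbitrary):
# count_failures(`$julia_exe -p 20 -e "using TestPackage"`, 50)
```

Running the same count against both Julia versions gives a rough before/after comparison instead of a single data point.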

Version details:


julia> versioninfo()
Julia Version 0.7.0-beta2.0
Commit b145832402* (2018-07-13 19:54 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, westmere)
Environment:
...

(v0.7) pkg> status
    Status `~/.julia/environments/v0.7/Project.toml`
  [0c46a032] DifferentialEquations v5.1.0
  [093fc24a] LightGraphs v0.13.1+ #master (https://github.com/JuliaGraphs/LightGraphs.jl.git)
...