JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

fetch(remotecall(...)) does not throw when nprocs() == 1 #63

Open tkf opened 4 years ago

tkf commented 4 years ago
julia> using Distributed

julia> fetch(remotecall(error, default_worker_pool(), "hello"))
RemoteException(1, CapturedException(ErrorException("hello"), Any[(error(::String) at error.jl:33, 1), ((::Distributed.var"#137#138"{typeof(error),Tuple{String},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}})() at remotecall.jl:350, 1), (run_work_thunk(::Distributed.var"#137#138"{typeof(error),Tuple{String},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}}, ::Bool) at process_messages.jl:79, 1), (run_work_thunk at process_messages.jl:88 [inlined], 1), ((::Distributed.var"#96#98"{Distributed.RemoteValue,Distributed.var"#137#138"{typeof(error),Tuple{String},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}}})() at task.jl:333, 1)]))

julia> addprocs(1)
1-element Array{Int64,1}:
 2

julia> fetch(remotecall(error, default_worker_pool(), "hello"))
ERROR: On worker 2:
hello
error at ./error.jl:33
#103 at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/process_messages.jl:290
run_work_thunk at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/process_messages.jl:79
run_work_thunk at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/process_messages.jl:88
#96 at ./task.jl:333
Stacktrace:
 [1] #remotecall_fetch#143 at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:390 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Distributed.RRID) at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:382
 [3] #remotecall_fetch#146 at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:417 [inlined]
 [4] remotecall_fetch at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:417 [inlined]
 [5] call_on_owner at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:490 [inlined]
 [6] fetch(::Future) at /home/takafumi/repos/watch/julia/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:529
 [7] top-level scope at REPL[4]:1

julia> VERSION
v"1.4.0-DEV.297"

I expect fetch(remotecall(error, default_worker_pool(), "hello")) to behave the same way before and after addprocs(1).

kleinschmidt commented 1 year ago

I think this is actually caused by the fetching process and the erroring process having the same PID: if you explicitly target PID 1, it still does not throw, even after addprocs:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.2 (2022-09-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Distributed

julia> fetch(remotecall(error, 1, "hello"))
RemoteException(1, CapturedException(ErrorException("hello"), Any[(error(s::String) at error.jl:35, 1), (#invokelatest#2 at essentials.jl:729 [inlined], 1), (invokelatest at essentials.jl:726 [inlined], 1), (#153 at remotecall.jl:425 [inlined], 1), (run_work_thunk(thunk::Distributed.var"#153#154"{typeof(error), Tuple{String}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, print_error::Bool) at process_messages.jl:70, 1), (run_work_thunk(rv::Distributed.RemoteValue, thunk::Function) at process_messages.jl:79, 1), ((::Distributed.var"#100#102"{Distributed.RemoteValue, Distributed.var"#153#154"{typeof(error), Tuple{String}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}})() at task.jl:484, 1)]))

julia> addprocs(1)
1-element Vector{Int64}:
 2

julia> fetch(remotecall(error, 1, "hello"))
RemoteException(1, CapturedException(ErrorException("hello"), Any[(error(s::String) at error.jl:35, 1), (#invokelatest#2 at essentials.jl:729 [inlined], 1), (invokelatest at essentials.jl:726 [inlined], 1), (#153 at remotecall.jl:425 [inlined], 1), (run_work_thunk(thunk::Distributed.var"#153#154"{typeof(error), Tuple{String}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, print_error::Bool) at process_messages.jl:70, 1), (run_work_thunk(rv::Distributed.RemoteValue, thunk::Function) at process_messages.jl:79, 1), ((::Distributed.var"#100#102"{Distributed.RemoteValue, Distributed.var"#153#154"{typeof(error), Tuple{String}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}})() at task.jl:484, 1)]))
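
Until the two cases agree, a caller could normalize the result by hand. The following is only a minimal sketch, assuming the behavior shown in the transcripts above (a local fetch returns the RemoteException while a remote fetch throws it). The helper name fetch_or_throw is hypothetical and is not part of Distributed:

```julia
using Distributed

# Hypothetical helper (not part of Distributed): fetch `f` and rethrow
# when the fetched value is a RemoteException, so that local and remote
# failures surface the same way.
function fetch_or_throw(f)
    result = fetch(f)
    # In the transcripts above, a failed call on the local process comes
    # back as a plain RemoteException value instead of being thrown.
    result isa RemoteException && throw(result)
    return result
end

# Usage: should throw whether PID 1 is the caller or a remote worker.
# fetch_or_throw(remotecall(error, 1, "hello"))
```

One caveat with this approach: a remote function that legitimately returns a RemoteException as its value would also be rethrown.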

vtjnash commented 1 year ago

Is https://github.com/JuliaLang/julia/issues/37935 just a duplicate of this then?

kleinschmidt commented 1 year ago

Yes, it looks like it.