JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Distributed docs are a bit contradictory as to whether it loads packages you are `using` on the manager process #64

Open oxinabox opened 4 years ago

oxinabox commented 4 years ago

The answer is: yes, `using` a package on the manager loads it on all existing workers (it `require`s the package there, i.e. it does not bring it into scope). But a worker started afterwards does not start with the packages that are currently loaded on the manager process.

https://docs.julialang.org/en/v1/manual/parallel-computing/#code-availability-1 first says:

Finally, if DummyModule.jl is not a standalone file but a package, then using DummyModule will load DummyModule.jl on all processes, but only bring it into scope on the process where using was called.

Then later kind of contradicts that:

Note that workers do not run a ~/.julia/config/startup.jl startup script, nor do they synchronize their global state (such as global variables, new method definitions, and loaded modules) with any of the other running processes.

The second bit is kind of wrong. It does synchronize loaded modules.

vchuravy commented 4 years ago
➜  ~ julia -p 2
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.2.0 (2019-08-20)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using SIMD

julia> remotecall_fetch(()->Main.SIMD.Vec{2, Float64}((0, 0)), 2)
<2 x Float64>[0.0, 0.0]

julia> addprocs(1)
1-element Array{Int64,1}:
 4

julia> remotecall_fetch(()->Main.SIMD.Vec{2, Float64}((0, 0)), 4)
ERROR: On worker 4:
KeyError: key SIMD [fdea26ae-647d-5447-a871-4b548cad5224] not found
deserialize_global_from_main at /build/julia/src/julia-1.2.0/usr/share/julia/stdlib/v1.2/Serialization/src/Serialization.jl:722
#5 at /build/julia/src/julia-1.2.0/usr/share/julia/stdlib/v1.2/Distributed/src/clusterserialize.jl:72 [inlined]
foreach at ./abstractarray.jl:1920
deserialize at /build/julia/src/julia-1.2.0/usr/share/julia/stdlib/v1.2/Distributed/src/clusterserialize.jl:72
#105 at ./task.jl:268
Stacktrace:
 [1] #remotecall_fetch#149 at /build/julia/src/julia-1.2.0/usr/share/julia/stdlib/v1.2/Distributed/src/remotecall.jl:379 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker) at /build/julia/src/julia-1.2.0/usr/share/julia/stdlib/v1.2/Distributed/src/remotecall.jl:371
 [3] #remotecall_fetch#152(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(remotecall_fetch), ::Function, ::Int64) at /build/julia/src/julia-1.2.0/usr/share/julia/stdlib/v1.2/Distributed/src/remotecall.jl:406
 [4] remotecall_fetch(::Function, ::Int64) at /build/julia/src/julia-1.2.0/usr/share/julia/stdlib/v1.2/Distributed/src/remotecall.jl:406
 [5] top-level scope at REPL[4]:1

No, processes added afterwards do not synchronize loaded modules.

oxinabox commented 4 years ago

No, processes added afterwards do not synchronize loaded modules.

Indeed, but ones added before do.

Maybe we should include that example in the docs (probably with an `addprocs` at the start rather than `-p`, for brevity). That would clear things up.
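
A rough sketch of what such a docs example might look like, written as a script rather than a REPL session. It assumes the SIMD package (the one used in the transcript above) is installed in the active environment and that Julia was started without `-p`. The variable names are made up for illustration. The behavior noted in the comments is what this thread reports for Julia 1.2; other versions may differ.

```julia
using Distributed

# Start two workers up front. The transcript above used `julia -p 2`;
# `addprocs(2)` is the in-script equivalent.
addprocs(2)

# `using` on the manager also loads the package on every worker that
# exists at this point, without bringing it into scope there.
# Assumption: SIMD is installed in the active environment.
using SIMD

# Same closure as in the transcript: build a SIMD vector on a worker.
make_vec = () -> Main.SIMD.Vec{2, Float64}((0, 0))

early_worker = first(workers())
early_result = remotecall_fetch(make_vec, early_worker)
println("worker ", early_worker, " returned ", early_result)

# A worker added later does not inherit the manager's loaded modules.
late_worker = first(addprocs(1))
late_outcome = try
    remotecall_fetch(make_vec, late_worker)
catch err
    err  # the thread reports a KeyError for SIMD here on Julia 1.2
end
println("worker ", late_worker, " gave a ", typeof(late_outcome))
```

Running `@everywhere using SIMD` after the second `addprocs` call is one way to load the package on the late worker as well.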