Open tpapp opened 7 years ago
Issue is with Julia's pmap
, we are waiting on some updates to it: https://github.com/JuliaLang/julia/issues/19000
What exactly is the issue? While the keyword arguments to pmap
have changed, the behavior has not.
I think @brian-j-smith will be better able to answer this question, but if I recall the issue was that not all anonymous functions were being sent out to the workers.
AFAIK the issue is with capturing globals in the anonymous function. Which existed prior to 0.5 too. A workaround for that issue is to wrap the closure creation in a let
block.
julia> a=1
1
julia> remotecall_fetch(()->a, 2)
ERROR: On worker 2:
UndefVarError: a not defined
in #1 at ./REPL[3]:1
.....
julia> let a=a; remotecall_fetch(()->a, 2); end
1
The pmap errors we are getting now in 0.5 do not occur in 0.4. Below is the 0.5 error, for what its worth. When I get some free time, I'll try to come up with a stand-alone test case that reproduces the error.
ERROR: LoadError: On worker 2:
UndefVarError: ##298#299 not defined
in deserialize_datatype at .\serialize.jl:823
in handle_deserialize at .\serialize.jl:571
in deserialize at .\serialize.jl:881
in deserialize_datatype at .\serialize.jl:835
in handle_deserialize at .\serialize.jl:571
in deserialize at .\serialize.jl:894
in deserialize_datatype at .\serialize.jl:835
in handle_deserialize at .\serialize.jl:571
in deserialize at .\serialize.jl:881
in deserialize_datatype at .\serialize.jl:835
in handle_deserialize at .\serialize.jl:571
in deserialize_array at .\serialize.jl:709
in handle_deserialize at .\serialize.jl:569
in deserialize at .\serialize.jl:541
in ntuple at .\tuple.jl:65
in handle_deserialize at .\serialize.jl:562
in deserialize_msg at .\multi.jl:120
in message_handler_loop at .\multi.jl:1317
in process_tcp_streams at .\multi.jl:1276
in #620 at .\event.jl:68
in #remotecall_fetch#608(::Array{Any,1}, ::Function, ::Function, ::Base.Worker,
::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1070
in remotecall_fetch(::Function, ::Base.Worker, ::Array{Any,1}, ::Vararg{Array{A
ny,1},N}) at .\multi.jl:1062
in #remotecall_fetch#611(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Arr
ay{Any,1}, ::Vararg{Array{Any,1},N}) at .\multi.jl:1080
in remotecall_fetch(::Function, ::Int64, ::Array{Any,1}, ::Vararg{Array{Any,1},
N}) at .\multi.jl:1080
in #remotecall_pool#691(::Array{Any,1}, ::Function, ::Function, ::Function, ::W
orkerPool, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:93
in remotecall_pool(::Function, ::Function, ::WorkerPool, ::Array{Any,1}, ::Vara
rg{Array{Any,1},N}) at .\workerpool.jl:91
in #remotecall_fetch#694(::Array{Any,1}, ::Function, ::Function, ::WorkerPool,
::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:124
in remotecall_fetch(::Function, ::WorkerPool, ::Array{Any,1}, ::Vararg{Array{An
y,1},N}) at .\workerpool.jl:124
in (::Base.###699#700#702{WorkerPool,Mamba.#mcmc_worker!})(::Array{Any,1}, ::Fu
nction, ::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool.jl:151
in (::Base.##699#701)(::Array{Any,1}, ::Vararg{Array{Any,1},N}) at .\workerpool
.jl:151
in macro expansion at .\asyncmap.jl:63 [inlined]
in (::Base.##757#759{Base.AsyncCollector,Base.AsyncCollectorState})() at .\task
.jl:360
This is the reduced case:
julia> c = x->x
(anonymous function)
julia> d = x->c(x)
(anonymous function)
On 0.4:
julia> pmap(d, 1:2)
2-element Array{Any,1}:
1
2
On 0.5:
julia> pmap(d, 1:2)
ERROR: On worker 2:
UndefVarError: c not defined
in #3 at ./REPL[2]:1
Great! Thanks for coming up with a reduced case. I'll keep an eye on your other thread and possible developments in julia to address this. A work-around on our package side may be challenging since some anonymous functions are user-supplied.
@amitmurthy still having pmap issues in v0.6
julia> include("../deonovib\\.julia/v0.6\\Mamba\\doc/examples/rats.jl")
MCMC Simulation of 10000 Iterations x 2 Chains...
ERROR: LoadError: On worker 2:
UndefVarError: ##332#333 not defined
deserialize_datatype at .\serialize.jl:969
handle_deserialize at .\serialize.jl:674
deserialize at .\serialize.jl:634
handle_deserialize at .\serialize.jl:681
deserialize at .\serialize.jl:1075
handle_deserialize at .\serialize.jl:687
deserialize at .\serialize.jl:1088
handle_deserialize at .\serialize.jl:687
deserialize at .\serialize.jl:1075
handle_deserialize at .\serialize.jl:687
deserialize_array at .\serialize.jl:871
handle_deserialize at .\serialize.jl:672
deserialize at .\serialize.jl:634
ntuple at .\tuple.jl:108
handle_deserialize at .\serialize.jl:664
deserialize_msg at .\distributed\messages.jl:98
message_handler_loop at .\distributed\process_messages.jl:161
process_tcp_streams at .\distributed\process_messages.jl:118
#99 at .\event.jl:73
Stacktrace:
[1] #570 at .\asyncmap.jl:178 [inlined]
[2] foreach(::Base.##570#572, ::Array{Any,1}) at .\abstractarray.jl:1731
[3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:178
[4] wrap_n_exec_twice(::Channel{Any}, ::Array{Any,1}, ::Base.Distributed.##204#207{WorkerPool}, ::Function, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:154
[5] #async_usemap#555(::Function, ::Void, ::Function, ::Base.Distributed.##188#190, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\asyncmap.jl:103
[6] (::Base.#kw##async_usemap)(::Array{Any,1}, ::Base.#async_usemap, ::Function, ::Array{Array{Any,1},1}, ::Vararg{Array{Array{Any,1},1},N} where N) at .\<missing>:0
[7] (::Base.#kw##asyncmap)(::Array{Any,1}, ::Base.#asyncmap, ::Function, ::Array{Array{Any,1},1}) at .\<missing>:0
[8] #pmap#203(::Bool, ::Int64, ::Void, ::Array{Any,1}, ::Void, ::Function, ::WorkerPool, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:126
[9] pmap(::WorkerPool, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:101
[10] #pmap#213(::Array{Any,1}, ::Function, ::Function, ::Array{Array{Any,1},1}) at .\distributed\pmap.jl:156
[11] pmap2(::Function, ::Array{Array{Any,1},1}) at C:\Users\deonovib\.julia\v0.6\Mamba\src\utils.jl:97
[12] mcmc_master!(::Mamba.Model, ::UnitRange{Int64}, ::Int64, ::Int64, ::UnitRange{Int64}, ::Bool) at C:\Users\deonovib\.julia\v0.6\Mamba\src\model\mcmc.jl:52
[13] #mcmc#29(::Int64, ::Int64, ::Int64, ::Bool, ::Function, ::Mamba.Model, ::Dict{Symbol,Any}, ::Array{Dict{Symbol,Any},1}, ::Int64) at C:\Users\deonovib\.julia\v0.6\Mamba\src\model\mcmc.jl:32
[14] (::Mamba.#kw##mcmc)(::Array{Any,1}, ::Mamba.#mcmc, ::Mamba.Model, ::Dict{Symbol,Any}, ::Array{Dict{Symbol,Any},1}, ::Int64) at .\<missing>:0
[15] include_from_node1(::String) at .\loading.jl:569
[16] include(::String) at .\sysimg.jl:14
while loading C:\Users\deonovib\.julia\v0.6\Mamba\doc\examples\rats.jl, in expression starting on line 121
Does your code serialize the same anonymous function across multiple workers? i.e., it is initially defined on wrkr_1 which sends it to wrkr_2 which sends it to wrkr_3 and so on (see code example https://github.com/JuliaLang/julia/pull/22675#issuecomment-313000815). If so, this should be fixed by https://github.com/JuliaLang/julia/pull/22675 which is backport pending onto 0.6.
I don't think that is the issue. The relevant code seems like a straight forward use of pmap
lsts = [
Any[m, states[k], window, burnin, thin, ChainProgress(frame, k, N)]
for k in chains
]
results = pmap(mcmc_worker!, lsts)
where m
is a Type that contains anonymous functions. I think m
is the real issue. it is:
type Model
nodes::Dict{Symbol, Any}
samplers::Vector{Sampler}
states::Vector{ModelState}
iter::Int
burnin::Int
hasinputs::Bool
hasinits::Bool
end
and Sampler
is
type Sampler{T}
params::Vector{Symbol}
eval::Function
tune::T
targets::Vector{Symbol}
end
where a Sampler might look something like:
function RWM{T<:Real}(params::ElementOrVector{Symbol},
scale::ElementOrVector{T}; args...)
samplerfx = function(model::Model, block::Integer)
block = SamplingBlock(model, block, true)
v = SamplerVariate(block, scale; args...)
sample!(v, x -> logpdf!(block, x))
relist(block, v)
end
Sampler(params, samplerfx, RWMTune())
end
so I think we are having trouble passing m
to workers because it has quite a heavy nesting of types, named functions, and anonymous functions.
Is there a sample code that I can test with locally? I assume all dependent packages are publicly available?
Currently we have it hacked such that only sequential processing is possible. Simply Pkg.add("Mamba")
and then modify the file .julia/v0.6/Mamba/src/utils.jl
by adding comment before Version hack.
...
if (nprocs() > 1) #& (VERSION < v"0.5-")
...
Run any of the examples that utilize multiple chains you will get the error.
e.g.
include(".julia/v0.6/Mamba/doc/examples/rats.jl")
or after doing the include you can also replicate error with
remotecall_fetch(fieldnames, 2, model)
which is a little more direct and shows that the issue is sending model out to the different workers.
On OSX I had trouble installing Mamba
(Julia 0.6) . Gadfly related errors - does not work even with a checkout of Gadfly master. Will try again after a few days.
What was the error? Seems strange since Travis shows it installing and all tests pass on OSX https://travis-ci.org/brian-j-smith/Mamba.jl
Fails with
WARNING: could not import Multimedia.@try_display into Gadfly
ERROR: LoadError: UndefVarError: @try_display not defined
Stacktrace:
[1] include_from_node1(::String) at ./loading.jl:569
[2] include(::String) at ./sysimg.jl:14
[3] anonymous at ./<missing>:2
while loading /Users/amitm/.julia/v0.6/Gadfly/src/Gadfly.jl, in expression starting on line 1052
ERROR: LoadError: Failed to precompile Gadfly to /Users/amitm/.julia/lib/v0.6/Gadfly.ji.
Stacktrace:
[1] compilecache(::String) at ./loading.jl:703
[2] _require(::Symbol) at ./loading.jl:490
I'll delete the v0.6 packages directory and try again.
The immediate culprit appears to be model.nodes[:s2_alpha].eval
with
julia> typeof(model.nodes[:s2_alpha].eval)
Mamba.##338#339
It is not being serialized as it is defined under Mamba
. Where and how is it being initialized?
julia> addprocs(2)
julia> @everywhere module Foo
type Bar
x
Bar() = new(()->1)
end
end
WARNING: deprecated syntax "type" at REPL[2]:2.
Use "mutable struct" instead.
julia> bar = Foo.Bar()
Foo.Bar(Foo.#1)
julia> remotecall_fetch(typeof, 2, bar.x)
Foo.##1#2
However, if the anonymous function is explicitly created under module Foo
julia> bar.x = eval(Foo, :(()->1))
(::#3) (generic function with 1 method)
julia> remotecall_fetch(typeof, 2, bar.x)
ERROR: On worker 2:
UndefVarError: ##3#4 not defined
deserialize_datatype at ./serialize.jl:986
handle_deserialize at ./serialize.jl:687
deserialize at ./serialize.jl:647
julia> addprocs(2);
julia> @everywhere module Foo
mutable struct Bar
x
Bar() = new(()->1)
end
make_bar() = (b=Bar(); b.x=()->1; b)
end;
julia> bar = Foo.Bar();
julia> typeof(bar.x)
Foo.##1#2
julia> remotecall_fetch(typeof, 2, bar.x); # OK
julia> bar = Foo.make_bar();
julia> typeof(bar.x)
Foo.##3#4
julia> remotecall_fetch(typeof, 2, bar.x); # OK
julia> bar.x = eval(Foo, :(()->1));
julia> typeof(bar.x)
Foo.##5#6
julia> remotecall_fetch(typeof, 2, bar.x); # Not OK
ERROR: On worker 2:
UndefVarError: ##5#6 not defined
deserialize_datatype at ./serialize.jl:986
handle_deserialize at ./serialize.jl:687
Only when the anonymous function is explicitly defined under Foo
does there seem to be a problem.
Thanks for the extensive analysis @amitmurthy what is the next step? Is this something that needs to be addressed in our package or is this something on the julialang side that could be addressed?
Both I think.
You can check if you can avoid an eval
under module Mamba
. I'll open an issue on Julialang for addressing this.
You can follow the discussion on the linked issue above.
Meanwhile, I could run rats.jl in parallel on 0.6 by eval'ing into Main.
diff --git a/src/utils.jl b/src/utils.jl
index 438dbd3..aee2a00 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -7,7 +7,7 @@ end
function modelfxsrc(literalargs::Vector{Tuple{Symbol, DataType}}, f::Function)
args = Expr(:tuple, map(arg -> Expr(:(::), arg[1], arg[2]), literalargs)...)
expr, src = modelexprsrc(f, literalargs)
- fx = eval(Expr(:function, args, expr))
+ fx = eval(Main, Expr(:function, args, expr))
(fx, src)
end
@@ -92,7 +92,7 @@ end
## called and will apply its error processing.
function pmap2(f::Function, lsts::AbstractArray)
- if (nprocs() > 1) & (VERSION < v"0.5-")
+ if (nprocs() > 1) #& (VERSION < v"0.5-")
@everywhere importall Mamba
pmap(f, lsts)
else
diff --git a/src/variate.jl b/src/variate.jl
index 8875ad0..1c6b91c 100644
--- a/src/variate.jl
+++ b/src/variate.jl
@@ -112,7 +112,7 @@ const BinaryScalarMethods = [
]
for op in BinaryScalarMethods
- @eval ($op)(x::ScalarVariate, y::ScalarVariate) = ($op)(x.value, y.value)
+ @eval Main ($op)(x::Mamba.ScalarVariate, y::Mamba.ScalarVariate) = ($op)(x.value, y.value)
end
const RoundScalarMethods = [
@@ -123,8 +123,8 @@ const RoundScalarMethods = [
]
for op in RoundScalarMethods
- @eval ($op)(x::ScalarVariate) = ($op)(x.value)
- @eval ($op){T}(::Type{T}, x::ScalarVariate) = ($op)(T, x.value)
+ @eval Main ($op)(x::Mamba.ScalarVariate) = ($op)(x.value)
+ @eval Main ($op){T}(::Type{T}, x::Mamba.ScalarVariate) = ($op)(T, x.value)
end
const UnaryScalarMethods = [
@@ -142,5 +142,5 @@ const UnaryScalarMethods = [
]
for op in UnaryScalarMethods
- @eval ($op)(x::ScalarVariate) = ($op)(x.value)
+ @eval Main ($op)(x::Mamba.ScalarVariate) = ($op)(x.value)
end
Currently pmap seems to be disabled for
v"0.5.0"
. I am wondering if there is a workaround, eg running the samples separately and then somehow merging. Usually run 3–5 chains and the speed gains would be significant.