Open pearcemc opened 8 years ago
Do you see the same error with
S = SharedArray(Int, (3,4), init = S -> S[localindexes(S)] = myid(), pids=remotes)
r = @spawnat remotes[1] S*eye(4)
fetch(r)
Amit, thanks for the suggestion: there is progress.
julia> S = SharedArray(Int, (3,4), init = S -> S[localindexes(S)] = myid(), pids=remotes)
3x4 SharedArray{Int64,2}:
#undef #undef #undef #undef
#undef #undef #undef #undef
#undef #undef #undef #undef
julia> r = @spawnat remotes[1] S*eye(4)
RemoteRef{Channel{Any}}(5,1,115)
julia> fetch(r)
3x4 Array{Float64,2}:
5.0 5.0 6.0 6.0
5.0 5.0 6.0 6.0
5.0 5.0 6.0 6.0
It's still a bit disturbing to see undefs. I guess this is because the master process doesn't have access to that memory and there's no code pulling it from the remote. I'll try to get over it as it looks like some work can be done now!
I take it your suggestion means that the SharedArray code is designed to be invoked from the master process (can't believe I didn't try this permutation).
Much appreciated.
The previous invocation, while a bit inefficient, should have worked too. And we could do a better job of "show" on unmapped workers.
The undefined ref error in your case occurred because the S in the invocation of S*eye(4) on worker 3 was actually the one constructed on pids 1 and 2.
That leaves only the issue of a better show.
Further issues include map(f, S::SharedArray) apparently not working for SharedArrays hosted entirely on remote machines.
The setup:
julia> @everywhere using ClusterManagers
julia> @everywhere blas_set_num_threads(12)
julia> @everywhere topo = describepids(remote=2) #from my ClusterManagers/utils branch
julia> function read2remotes(fpath::AbstractString, dims, elty::DataType, topo)
           remote_shared_arrays = Dict([])
           @sync begin
               @async for pid in keys(topo)
                   remotes_on_same_host = topo[pid]
                   remote_shared_arrays[pid] = SharedArray(fpath, elty, dims, pids=remotes_on_same_host)
               end
           end
           remote_shared_arrays
       end
read2remotes (generic function with 1 method)
julia> dims = (2000,36)
(2000,36)
julia> rsay = read2remotes(fps[1], dims, Float32, topo)
Dict{Any,Any} with 3 entries:
36 => 2000x36 SharedArray{Float32,2}:…
12 => 2000x36 SharedArray{Float32,2}:…
24 => 2000x36 SharedArray{Float32,2}:…
The problem:
julia> map(abs, rsay[12]) #works fine on sharedarray in local memory
2000x36 Array{Float32,2}:
3.14159 1.04518 1.63132 1.94399 1.3988 1.08292 1.06984 0.535975 1.30225 … 0.035658 0.172445 4.89128 3.39776 4.79956 0.562555 0.207192 1.8894
0.336003 0.540138 1.51911 0.325294 1.2718 2.46567 0.956635 1.69363 1.17613 0.369635 0.0238586 0.827886 1.26854 0.79305 0.137688 1.25333 1.08266
julia> map(abs, rsay[36]) #fails on sharedarray on remote machine
ERROR: UndefRefError: access to undefined reference
in similar at sharedarray.jl:351
in map at sharedarray.jl:353
julia> fetch(@spawnat 36 map(abs, rsay[36])) #works remotely executed
2000x36 Array{Float32,2}:
3.14159 1.04518 1.63132 1.94399 1.3988 1.08292 1.06984 0.535975 1.30225 … 0.035658 0.172445 4.89128 3.39776 4.79956 0.562555 0.207192 1.8894
0.336003 0.540138 1.51911 0.325294 1.2718 2.46567 0.956635 1.69363 1.17613 0.369635 0.0238586 0.827886 1.26854 0.79305 0.137688 1.25333 1.08266
This shows an inconsistency between sharedarray creation, which we found had to be executed on the local machine, and sharedarray computation, which it appears we need to execute on the remote host.
This is by design. You need to execute computation on the host where the shmem is mapped else we would just be pulling the entire array over the network.
Shared array creation can happen from any host, as long as all the pids specified are on the same machine.
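A minimal sketch of the pattern described above: create the array from the master, then send the computation to a process that maps the segment. It assumes a current Julia (1.x) with the Distributed and SharedArrays standard libraries, so the spelling differs from the 0.5-era calls in this thread (localindices instead of localindexes, an explicit identity matrix instead of eye(4)). The two local workers are an assumption standing in for the processes on one remote host.

```julia
using Distributed, SharedArrays, LinearAlgebra

# Assumption: two local workers stand in for the processes on one remote host.
addprocs(2)
@everywhere using SharedArrays, LinearAlgebra

remotes = workers()

# Created from the master; each process in `pids` fills its own chunk with its id.
S = SharedArray{Int}((3, 4); init = S -> S[localindices(S)] .= myid(), pids = remotes)

# Send the computation to a process that maps the segment. Only the small
# 3x4 result travels back to the caller, not the shared memory itself.
result = @fetchfrom remotes[1] S * Matrix{Float64}(I, 4, 4)
```

The same call shape applies when the pids live on another machine: the caller only holds a handle, and @fetchfrom (or @spawnat plus fetch) runs the work where the memory is mapped.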
Hi Amit, does that last comment mean the bug with respect to SharedArray creation given above is now gone on master?
It didn't exist. See my comment above - https://github.com/JuliaLang/Distributed.jl/issues/32
I think you are misusing the parallel features here. You have to send the computation to the data and not the other way around.
@jakebolewski thanks for the tip. As you can see from the examples I have tried something that works. It would help somewhat if there was further documentation: otherwise being able to create remotely hosted SharedArrays on the local machine runs somewhat counter to that model. (at least enough to confuse newbies like me).
@amitmurthy, thanks, I get your comment now.
Also, I think I have something that helps with the #undefs when printing remote SharedArrays:
@everywhere function getrepresentation(S)
    buf = IOBuffer()
    td = TextDisplay(buf)
    Base.Multimedia.display(td, S)
    str = takebuf_string(buf)
    return str
end
function Base.display(S::SharedArray{Float32, 2})
    validpid = minimum(S.pids)
    repr = @fetchfrom validpid getrepresentation(S)
    print_with_color(:bold, repr)
end
Clearly if this made it into sharedarray.jl then the @everywhere would be redundant. The eltype part of the function could go, but I haven't tested that.
There is slightly different behaviour depending on whether the host is local or remote. I guess this is to do with the initialisation of julia on the remotes and not printing out so many rows/columns of a matrix. I can't find the setting however.
Would it be worth submitting this, and if so, which git branch etc. should it be done through?
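For readers on a current Julia (1.x), here is a rough sketch of the same idea: render the array on a process that maps it and ship only the string back. takebuf_string and print_with_color from the snippet above no longer exist there, so this uses String(take!(buf)) instead. The helper name remote_repr and its rows/cols parameters are hypothetical, and pinning :displaysize is only one possible way to make the number of printed rows and columns the same regardless of which process does the rendering.

```julia
using Distributed, SharedArrays

# Assumption: one local worker stands in for a process on the remote host.
addprocs(1)
@everywhere using SharedArrays

# Hypothetical helper: render S as text on whichever process runs it.
# Fixing :displaysize keeps the truncation independent of the rendering process.
@everywhere function remote_repr(S; rows = 24, cols = 80)
    buf = IOBuffer()
    io = IOContext(buf, :limit => true, :displaysize => (rows, cols))
    show(io, MIME("text/plain"), S)
    return String(take!(buf))
end

S = SharedArray{Float32}((3, 4); init = S -> S[localindices(S)] .= 1f0, pids = workers())

# Pick a process that maps the array and fetch only its printed form.
validpid = minimum(procs(S))
str = @fetchfrom validpid remote_repr(S)
print(str)
```

Keeping this as a separate helper, rather than overloading Base.display for one element type as in the snippet above, avoids changing how every SharedArray prints.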
Hi @pearcemc, are/were you considering submitting a documentation PR or a change to the way SharedArrays work? Either is fine - you can look at CONTRIBUTING.md in the top level Julia directory for tips (I personally find it easier to read on GitHub). If that's not sufficient, I'd be happy to help you on IRC on #julia at freenode, over Gitter, or over email.
I am trying to set up SharedArrays on remote machines. I.e. shared among processes on the same machine. Unfortunately this doesn't seem to work. I am using version 0.5.0-dev+749 of julia.