JuliaParallel / DistributedArrays.jl

Distributed Arrays in Julia

setindex! not defined? #155

Closed · barche closed this issue 4 years ago

barche commented 6 years ago

Hi,

Is it possible that setindex! is not defined? I tried the following:

@everywhere using DistributedArrays
x = dzeros((20,), procs(), (nprocs(),))
x[20] = 3

But got

ERROR: indexing not defined for DistributedArrays.DArray{Float64,1,Array{Float64,1}}
Stacktrace:
 [1] setindex!(::DistributedArrays.DArray{Float64,1,Array{Float64,1}}, ::Int64, ::Int64) at ./abstractarray.jl:967

I could set a value using this workaround (assuming the above was on 4 processes):

@sync @spawnat 4 localpart(x)[end] = 3.0

So it seems a general setindex! would not be that difficult to implement (presumably using the same mechanism for locating local indices that getindex uses), or am I missing something here? A rough sketch of what I have in mind is below.
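
For illustration only, a minimal sketch of such a scalar assignment; setindex_slow! is a made-up name, nothing here is DistributedArrays API, and localindices may be spelled localindexes on older versions of the package. Each call needs remote round trips, so this only shows the mechanism, not something efficient:

using Distributed
@everywhere using DistributedArrays

# Hypothetical scalar assignment: find the worker whose block owns global
# index i, translate i to a local index, and assign into that localpart.
function setindex_slow!(d::DArray{T,1}, v, i::Int) where {T}
    for p in procs(d)
        # global index range owned by worker p (1-D array, so take the first range)
        owned = remotecall_fetch(() -> localindices(d)[1], p)
        if i in owned
            li = i - first(owned) + 1
            remotecall_wait((dd, lj, vv) -> (localpart(dd)[lj] = vv; nothing), p, d, li, v)
            return v
        end
    end
    throw(BoundsError(d, i))
end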

andreasnoack commented 6 years ago

The setindex! machinery in Base assumes that it is fine to set a single element at a time. That would be very expensive for distributed arrays, which is why we don't define scalar setindex!. You could argue that it would be better to provide the functionality even though it is slow, but in my experience it is better to get an error than to fall back to the many generic AbstractArray methods in Base, which are extremely slow for DistributedArrays. Then there is the question of setindex! for ranges. Feel free to give it a shot, but I fear it might be complicated.
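
For reference, the usual pattern for bulk updates is the same as the localpart workaround above, just applied to every worker: one remote task per worker, each writing only into the block it owns, so there is no per-element communication. A small sketch, assuming the 1-D layout from the original example (again, localindices may be localindexes on older versions):

using Distributed
@everywhere using DistributedArrays

x = dzeros((20,), procs(), (nprocs(),))

# One remote task per worker; each worker fills only its own block.
@sync for p in procs(x)
    @async remotecall_wait(p, x) do d
        lp = localpart(d)
        gi = localindices(d)[1]        # global indices owned by this worker
        for (k, i) in enumerate(gi)
            lp[k] = Float64(i)         # e.g. store each element's global index
        end
    end
end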

barche commented 6 years ago

Aha, that makes sense indeed. I was thinking about implementing a parallel array type over MPI (using locked windows), and that would have the same problem (locking and unlocking being expensive). I wonder if it would be possible to somehow queue all remote updates in a code block and send the data in one go when the block is closed. Possibly some kind of do-block method could be used to avoid forgetting to close it. Some pseudo-code to make it a bit clearer (a rough sketch of one possible implementation follows it):

darr = dzeros(...)
w = darraywriter(darr)
for i in some_global_index_collection
  w[i] = my_complicated_calculation(i, ...)
end
close(w) # this actually does the communication
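
Purely as an illustration, here is one way such a writer could look. DArrayWriter and darraywriter are made-up names and none of this is existing DistributedArrays API: the ownership map is fetched once up front, setindex! only buffers (local index, value) pairs per owning worker, and close pushes one batch per worker in a single remote call each.

using Distributed
@everywhere using DistributedArrays

# Hypothetical buffered writer for a 1-D DArray.
struct DArrayWriter{T}
    d::DArray
    owned::Vector{Tuple{Int,UnitRange{Int}}}     # (worker id, global range it owns)
    pending::Dict{Int,Vector{Tuple{Int,T}}}      # worker id => (local index, value) pairs
end

function darraywriter(d::DArray{T,1}) where {T}
    owned = [(p, remotecall_fetch(() -> localindices(d)[1], p)) for p in procs(d)]
    DArrayWriter{T}(d, owned, Dict{Int,Vector{Tuple{Int,T}}}())
end

# Buffer the assignment locally; no communication happens here.
function Base.setindex!(w::DArrayWriter{T}, v, i::Int) where {T}
    for (p, r) in w.owned
        if i in r
            push!(get!(w.pending, p, Tuple{Int,T}[]), (i - first(r) + 1, T(v)))
            return v
        end
    end
    throw(BoundsError(w.d, i))
end

# Flush all buffered assignments, one remote call per owning worker.
function Base.close(w::DArrayWriter)
    @sync for (p, batch) in w.pending
        @async remotecall_wait(p, w.d, batch) do d, b
            lp = localpart(d)
            for (li, v) in b
                lp[li] = v
            end
        end
    end
    empty!(w.pending)
    return nothing
end

A do-block method along the lines of darraywriter(f, d) that calls close automatically would then avoid forgetting the final flush, as suggested above.
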
andreasnoack commented 6 years ago

We'd probably have to do something like that to make it efficient.

If you are considering an MPI-based array type, then you might want to take a look at https://github.com/kpamnany/Gasp.jl, which we used for the big Celeste runs last year.