JuliaParallel / DistributedArrays.jl

Distributed Arrays in Julia

DArray should be lazy/futures #209

Open vchuravy opened 5 years ago

vchuravy commented 5 years ago

One of the current issues with DArray is that every operation synchronizes immediately: a distributed operation has to finish before we can carry on with scheduling new operations. This simplifies the design, but limits scalability. Ideally, operations on DArray would be async/lazy, similar to how CuArray works, and only synchronize on show and convert.
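To make the idea concrete, here is a minimal sketch of the "lazy until show/convert" behaviour, written in Python with `concurrent.futures` rather than against DistributedArrays.jl. The `LazyArray` class and its `from_list`, `map`, and `to_list` methods are hypothetical names invented for illustration; a local thread pool stands in for remote workers, and nothing here reflects how DArray is actually implemented.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# A local thread pool stands in for remote workers (assumption for this sketch).
_pool = ThreadPoolExecutor(max_workers=4)


class LazyArray:
    """Hypothetical wrapper that holds a Future for the data, not the data."""

    def __init__(self, future):
        self._future = future

    @classmethod
    def from_list(cls, values):
        # Wrap already-available data in a completed Future.
        done = Future()
        done.set_result(list(values))
        return cls(done)

    def map(self, fn):
        # Schedule the operation and return immediately. The worker, not the
        # caller, waits for the parent result. A parent is always submitted
        # before its child, so the FIFO queue cannot deadlock on this chain.
        parent = self._future
        return LazyArray(_pool.submit(lambda: [fn(x) for x in parent.result()]))

    def to_list(self):
        # Synchronization point, analogous to `convert`.
        return self._future.result()

    def __repr__(self):
        # Synchronization point, analogous to `show`.
        return f"LazyArray({self.to_list()!r})"


a = LazyArray.from_list([1, 2, 3])
b = a.map(lambda x: x * 2).map(lambda x: x + 1)  # returns without waiting
result = b.to_list()                             # the caller blocks only here
```

Chaining on futures keeps operations on a single array in program order for free, because each one depends on its predecessor. It does not address in-place updates shared between several arrays, which is the harder consistency problem.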

The major design issue here is guaranteeing consistency. Operations need to appear to have executed in order, even if we execute reads out of order, and we will have to deal with in-place updates to data. One idea might be to use vector clocks, to look into how Fractal handles this, or to run a consensus protocol to establish which operations can commit.
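As a rough illustration of the vector-clock option, the sketch below shows how two workers could tell whether one operation happened before another or whether the two are concurrent (and so safe to reorder). It is plain Python with no distributed runtime; the function names `tick`, `merge`, `happened_before`, and `concurrent` are made up for this example and are not part of any DArray API.

```python
def tick(clock, i):
    """Return a copy of `clock` with worker i's own counter advanced."""
    updated = list(clock)
    updated[i] += 1
    return updated


def merge(local, received, i):
    """Worker i receives a message: take the element-wise max, then tick."""
    joined = [max(x, y) for x, y in zip(local, received)]
    return tick(joined, i)


def happened_before(a, b):
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Two workers, both starting at [0, 0].
write_on_w0 = tick([0, 0], 0)                    # worker 0 updates data in place
read_on_w1 = tick([0, 0], 1)                     # worker 1 reads independently
sync_on_w1 = merge(read_on_w1, write_on_w0, 1)   # worker 1 learns of the write

print(concurrent(write_on_w0, read_on_w1))       # True: no causal order between them
print(happened_before(write_on_w0, sync_on_w1))  # True: the write is visible after sync
```

Operations whose clocks are concurrent could be executed in either order, while any operation that causally follows an in-place write would have to wait for it.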

andreasnoack commented 5 years ago

Maybe it would make more sense to follow that path in https://github.com/JuliaParallel/Dagger.jl? While it can be useful to consider a larger part of the execution graph, it also vastly complicates the implementation and makes it much harder for the user to reason about performance.

vchuravy commented 5 years ago

I consider this issue rather speculative, but we do need somewhere to issue operations without them necessarily blocking on each other.

The CuArray stream interface is particularly interesting, but it relies on a global order.