JuliaParallel / DistributedArrays.jl

Distributed Arrays in Julia

Potential fix when master has compute intensive work and must schedule workers #207

Open raminammour opened 5 years ago

raminammour commented 5 years ago

Fixes issue #206. Please see the issue description for an explanation of the fix.

ViralBShah commented 4 years ago

@andreasnoack Any thoughts on this?

andreasnoack commented 4 years ago

It's a good observation, and a pretty simple, though not super pretty, fix.

I'm wondering if, with the new multithreading, we can now just delegate all the scheduling to a separate task that won't block while the local work is being executed. I'd like to hear @vchuravy's thoughts.
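One way to picture the multithreading idea is the minimal sketch below. It is only an illustration, not the change proposed in this PR: `run_with_local_thread` is a hypothetical helper, `do_work` is assumed to take a chunk index, and Julia is assumed to be at least 1.3 and started with more than one thread. The local chunk is handed to `Threads.@spawn`, so the tasks that issue the remote calls are not stuck behind it.

```julia
using Distributed

# Hypothetical helper, not part of Distributed.jl or DistributedArrays.jl.
# Assumes do_work(i) computes chunk i and that pids[i] owns chunk i.
function run_with_local_thread(do_work, pids)
    results = Vector{Any}(undef, length(pids))
    @sync for (i, p) in enumerate(pids)
        if p == myid()
            # Local chunk: run it as a task that may execute on another thread,
            # so the remote calls below can still be issued.
            Threads.@spawn results[i] = do_work(i)
        else
            # Remote chunk: the usual cooperative pattern.
            @async results[i] = remotecall_fetch(do_work, p, i)
        end
    end
    return results
end
```

Whether this actually keeps the scheduling thread free depends on the thread count and on where the scheduler places the spawned task, and any code written with cooperative tasking in mind would need to be checked for races first.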

raminammour commented 4 years ago

Looking at the code, the pattern

@sync for i in pids
    @async remotecall_fetch(do_work, i, ...)
end

is common (and natural), so this may happen anywhere that do_work is heavy. I guess adding yield() in the correct places would work...

Or, at construction of the DArray, by convention, have the entry with id == myid() come last while preserving the invariant that pid[i] holds chunk i.
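A variation that leaves the pid list untouched, and so keeps that invariant trivially, is to change only the order in which the tasks are created. The sketch below is an assumption-laden illustration rather than the patch itself: `schedule_order` and `run_on_all` are hypothetical names, and `do_work` is assumed to take a chunk index.

```julia
using Distributed

# Hypothetical helper: indices into `pids`, with entries that point at the
# local process moved to the end. `pids` itself is not reordered.
function schedule_order(pids, me = myid())
    remote = [i for i in eachindex(pids) if pids[i] != me]
    locals = [i for i in eachindex(pids) if pids[i] == me]
    return vcat(remote, locals)
end

# Hypothetical driver: issue every remote call first, then start the local work.
function run_on_all(do_work, pids)
    results = Vector{Any}(undef, length(pids))
    @sync for i in schedule_order(pids)
        @async results[i] = remotecall_fetch(do_work, pids[i], i)
    end
    return results
end
```

Because the `@async` tasks start in the order they were created, the remote requests should already be on their way before the heavy local call occupies the master.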

Cheers!

vchuravy commented 4 years ago

I think we need to carefully go through Distributed.jl and look at whether we can start using @spawn instead of @async, and then do the same for DistributedArrays.jl. It won't be easy, since a whole bunch of this code is based on cooperative tasking, and switching to parallelism will expose races.

I might be able to have a UROP look at this transition.