JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

`@distributed` fails silently when lacking `@sync` or reduction #66

Open grahamas opened 4 years ago

grahamas commented 4 years ago

For a simple example:

```julia
@distributed for (i, j) in Base.Iterators.product([1, 2], [3, 4])
    @show i + j; i + j
end
```

This should raise an error, as in JuliaLang/Distributed.jl#57, but none is reported: the body of the distributed loop silently never runs. If you add `@sync`, or use a reduction such as `@distributed (*)`, the error is reported properly.
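A minimal sketch of the two variants that do report the error, assuming a single-process session (no `addprocs` call), in which case process 1 acts as the only worker. It reuses the iteration space from the example above; the names `iter`, `sync_failed` and `reduce_failed` are illustrative only, and the exact exception type printed may differ between Julia versions.

```julia
using Distributed  # stdlib; without addprocs(), process 1 is the only worker

# Same iteration space as in the report: a product iterator, which
# @distributed cannot split into indexable chunks.
iter = Base.Iterators.product([1, 2], [3, 4])

# Variant 1: @sync makes the caller wait on the loop's task, so the
# failure is rethrown here instead of being dropped.
sync_failed = try
    @sync @distributed for (i, j) in iter
        i + j
    end
    false
catch err
    println("@sync variant raised: ", typeof(err))
    true
end

# Variant 2: a reduction blocks until every chunk has been combined,
# so the failure is likewise raised in the caller.
reduce_failed = try
    @distributed (+) for (i, j) in iter
        i + j
    end
    false
catch err
    println("reduction variant raised: ", typeof(err))
    true
end
```

In both variants the caller blocks on the distributed work, which is what gives the error a place to surface.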

This matters for my use case, in which I set up a `RemoteChannel`, start the `@distributed` loop asynchronously, and then listen on the channel.
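A sketch of that pattern under stated assumptions: the producer body (squaring each index), the channel size of 32 and the polling interval are all hypothetical, and the session again has no extra workers. The listener keeps taking items until the loop's task has finished and the channel is empty, so it cannot block forever if the loop dies early; the final `fetch` on the task rethrows any failure instead of letting it pass silently.

```julia
using Distributed

# Hypothetical producer side: each iteration sends its result over a RemoteChannel.
results = RemoteChannel(() -> Channel{Int}(32))

t = @distributed for k in 1:4
    put!(results, k^2)
end

received = Int[]
# Listen while the loop runs; stop once the task is done and the channel is drained.
while !(istaskdone(t) && !isready(results))
    if isready(results)
        push!(received, take!(results))
    else
        sleep(0.01)  # yield so the loop's task can make progress
    end
end

# Without this, a failed loop would just look like "no more results".
fetch(t)
println(sort(received))
```

Polling `istaskdone` rather than calling `take!` unconditionally is a design choice: a blocking `take!` would hang if the loop failed before producing the expected number of items.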

```
julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
```

mbauman commented 4 years ago

When you write `t = @distributed for ...`, you get back a `Task`. Once all the workers have finished, the task's value is always `nothing`. You can explicitly check that nothing errored with `fetch(t)`; this catches both errors in the iteration space and errors inside the distributed work. In this case (#30343), I think we could probably detect the problem ahead of time and give a nicer error. But there may also be interior errors that you'd want to check, and I'm not sure we can do anything about those.
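A minimal sketch of the `fetch(t)` check, assuming a single-process session. `check_loop` is a hypothetical helper, not part of Distributed.jl; it returns `:ok` or `:failed` so the three cases can be compared. The first loop fails in its iteration space, the second fails inside the body, and the third succeeds.

```julia
using Distributed

# Hypothetical helper: wait for a @distributed loop's task and report
# whether it failed, instead of letting the error pass silently.
function check_loop(t::Task)
    try
        fetch(t)  # rethrows if the iteration space or any chunk errored
        return :ok
    catch err
        @warn "distributed loop failed" exception = err
        return :failed
    end
end

# Iteration-space error: the product iterator cannot be split into chunks.
t1 = @distributed for (i, j) in Base.Iterators.product([1, 2], [3, 4])
    i + j
end

# Interior error: the iteration space is fine, but the body throws.
t2 = @distributed for k in 1:4
    k == 3 && error("iteration $k failed")
end

# A loop that succeeds; fetching its task just returns `nothing`.
t3 = @distributed for k in 1:4
    k + 1
end

status = map(check_loop, (t1, t2, t3))
```

Since the same `fetch` covers both failure modes, it works as a general guard wherever the loop is deliberately run without `@sync`.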