JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

`using` loads modules on workers but does not put exported bindings in Main #20

Open simonster opened 9 years ago

simonster commented 9 years ago

This may be intended, but it seems a bit awkward to me:

$ julia -p 1
[...]
  | | |_| | | | (_| |  |  Version 0.4.0-dev+1922 (2014-12-02 23:10 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit e4e1688* (0 days old master)
|__/                   |  x86_64-apple-darwin14.0.0

julia> using StatsBase

julia> @everywhere assert(isa(StatsBase.sample, Function))

julia> @everywhere assert(isa(sample, Function))
exception on 2: ERROR: sample not defined
 in eval at /usr/local/julia/base/sysimg.jl:7
 in anonymous at multi.jl:1395
 in anonymous at multi.jl:820
 in run_work_thunk at multi.jl:593
 in run_work_thunk at multi.jl:602
 in anonymous at task.jl:6

Obviously the workaround is @everywhere using StatsBase, but I think that using X should probably be equivalent to @everywhere using X, or if not, it should act exclusively on the main process. Instead it seems we get using X on the main process and require("X") on the workers.
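
A minimal sketch of that workaround on a current Julia, where the worker machinery lives in the Distributed stdlib. LinearAlgebra stands in for StatsBase only so the snippet needs no package installation. That substitution and the single local worker are assumptions, not part of the original report.

```julia
using Distributed

# Assumption: one local worker, mirroring `julia -p 1` from the report.
addprocs(1)

# Run `using` on every process so the exported names are bound in each Main.
@everywhere using LinearAlgebra

# `norm` is exported by LinearAlgebra; the closure looks it up in the worker's Main.
w = first(workers())
result = remotecall_fetch(() -> norm([3.0, 4.0]), w)
println(result)
```

If the `@everywhere` prefix is dropped, the module may still be loaded on the worker, but the unqualified name is the part this issue reports as missing.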

malmaud commented 8 years ago

I'm going to label this as a bug since it seems clear that this couldn't have been the intended semantics.

tlnagy commented 8 years ago

I've been bitten by this loaded-but-still-out-of-scope bug several times, and no one I've pointed it out to has any idea why it is the way it is.

timholy commented 8 years ago

I don't think it's intentional, I just think that no one has yet decided that they care enough to fix it. You could be the one!

tlnagy commented 8 years ago

@timholy Any idea where the definition of using is located? It's kind of hard to grep for.

yuyichao commented 8 years ago

IIUC require(::Symbol)

amitmurthy commented 8 years ago

I think we need to fix this in the 0.5 timeframe itself. @everywhere using Mod or @everywhere require are problematic in themselves and responsible for https://github.com/JuliaLang/julia/issues/12381 and probably https://github.com/JuliaLang/julia/issues/16788

Which Julia/C function is called with using?

Base.require(:JSON) is the equivalent of import JSON on all nodes.

timholy commented 8 years ago

It would be a bit annoying to go another release without fixing it. I once (long ago) spent a couple tens of minutes on this, but didn't get far enough to figure out how to do it or even to fully trace how using actually works.

amitmurthy commented 8 years ago

@vtjnash Where is using implemented? It does a bit more than Base.require(Mod), just unable to trace it.

simonster commented 8 years ago

https://github.com/JuliaLang/julia/blob/59b253031af87f62e7d70a7d8848cdfd4a84288b/src/toplevel.c#L450-L464

simonster commented 8 years ago

I think I can fix this pretty easily, but there seems to be some debate going on over the appropriate fix to make this consistent. It seems clear that, wherever using/import causes modules to be required, they should also alter the bindings. The question is whether using/import should require the module only on the worker they're being called on (and thus @everywhere using X would always be necessary to load modules if they will be used on workers), or whether using/import should both require the module and import the bindings on all the workers. Opinions?

amitmurthy commented 8 years ago

+1 for requiring the module and importing the bindings on all the workers, in terms of consistency.

However, a different debate is whether to have using / import load on all workers or only on the calling process, specifically w.r.t. plotting / visualization packages, which are irrelevant on the workers. In that case we should require an explicit @everywhere using whenever we want to load a package everywhere.

tlnagy commented 8 years ago

My issue with the current behavior is two-fold:

  1. Requiring using ModuleName prior to @everywhere using ModuleName seems weird and unintuitive (see https://github.com/JuliaLang/julia/issues/16189)
  2. @everywhere using ModuleName wouldn't be a problem if it threw a specific error pointing the user to the proper solution. Something like: "ModuleName is not loaded on worker X. Consider running @everywhere using ModuleName to load it on all workers."
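
A rough sketch of what such a targeted message could look like if done from user code. Both `workers_missing` and `check_loaded` are hypothetical helpers invented for illustration and are not part of Distributed. LinearAlgebra is used only because it ships with Julia.

```julia
using Distributed

# Hypothetical helper: list the workers whose Main has no binding called `name`.
function workers_missing(name::Symbol; pids = workers())
    return [p for p in pids if !remotecall_fetch(isdefined, p, Main, name)]
end

# Hypothetical wrapper: turn the check into the kind of hint suggested above.
function check_loaded(name::Symbol)
    for p in workers_missing(name)
        @warn "$(name) is not loaded on worker $(p). Consider running @everywhere using $(name) to load it on all workers."
    end
end

# Assumption: one local worker for the demonstration.
addprocs(1)
@everywhere using LinearAlgebra
check_loaded(:LinearAlgebra)   # no warning expected once the module is bound everywhere
```

This only inspects bindings in each worker's Main; it does not change what using itself does.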

> Specifically w.r.t. plotting / visualization packages which are irrelevant on the workers.

Gadfly and company are really heavy and would be slow to load on all workers. I agree that an explicit @everywhere using call would be better due to the performance implications.

tkelman commented 8 years ago

We'd have to see how the two versions work in practice with actual parallel usage, but I'd lean towards making using and import local-by-default unless annotated with @everywhere.

simonster commented 8 years ago

It looks like local-by-default is most people's preferred option. My main concern is https://github.com/JuliaLang/julia/issues/3680, which would mean that, in many situations, all your workers would die if you forget the @everywhere. Not a great experience, especially for people trying to do stuff in parallel for the first time.

amitmurthy commented 8 years ago

JuliaLang/julia#3680 can be partially worked around by

The other existing issue is a probable race condition with precompilation happening in parallel with @everywhere using

I'll submit a PR for the JuliaLang/julia#3680 workaround.

amitmurthy commented 7 years ago

Bump. JuliaLang/julia#3680 is closed. Implement local-by-default loading?

JeffBezanson commented 7 years ago

Yes let's try it.

oschulz commented 7 years ago

I'm curious - what's the status on this?

ceesb commented 6 years ago

This bug tripped me up in 0.7.0-beta, and "using X; @everywhere using X" solves it, whereas I didn't need that trick on 0.6.3 (or any of the previous 0.6 releases). Weird.