JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Could `@everywhere` support using macros from packages that are loaded in the same block? #83

Open oxinabox opened 1 year ago

oxinabox commented 1 year ago

Consider the following (with TimeZones.jl installed in the global environment):

using Distributed
addprocs(1)
@everywhere begin
    using TimeZones
    foo() = tz"America/New_York"
end

This should be valid as far as I know.

This fails in Julia 1.6, but I think I first noticed it many, many versions ago.

julia> using Distributed

julia> addprocs(1)
1-element Vector{Int64}:
 2

julia> @everywhere begin
           using TimeZones
           foo() = tz"America/New_York"
       end
ERROR: On worker 2:
LoadError: UndefVarError: @tz_str not defined
Stacktrace:
 [1] top-level scope
   @ :0
 [2] eval
   @ ./boot.jl:360
 [3] #103
   @ /usr/local/src/julia/julia-1.6/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:274
 [4] run_work_thunk
   @ /usr/local/src/julia/julia-1.6/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:63
 [5] run_work_thunk
   @ /usr/local/src/julia/julia-1.6/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:72
 [6] #96
   @ ./task.jl:411
in expression starting at REPL[3]:3
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:369
 [2] macro expansion
   @ ./task.jl:388 [inlined]
 [3] remotecall_eval(m::Module, procs::Vector{Int64}, ex::Expr)
   @ Distributed /usr/local/src/julia/julia-1.6/usr/share/julia/stdlib/v1.6/Distributed/src/macros.jl:223
 [4] top-level scope
   @ /usr/local/src/julia/julia-1.6/usr/share/julia/stdlib/v1.6/Distributed/src/macros.jl:207

In contrast

@everywhere using TimeZones
@everywhere foo() = tz"America/New_York"

works just fine.

A similar failure occurs with ordinary macros, e.g.

@everywhere begin
    using JuMP
    @variable(Model(), x)
end

KristofferC commented 1 year ago

I think this is just how macros work: they get expanded before any code is executed. So if code in the block is what makes the macros available, then expanding those macros in that same block can't be done. Not sure what can be done about this.

fredrikekre commented 1 year ago

This is not related to Distributed either

julia> begin
           using TimeZones
           foo() = tz"America/New_York"
       end
ERROR: LoadError: UndefVarError: @tz_str not defined
in expression starting at REPL[1]:3

oxinabox commented 1 year ago

> Not sure what can be done about this.

It might be that we just need to document this, but maybe we can do better.

--

I would kind of argue that this remains related to Distributed, because people look at the begin block as being passed to the @everywhere macro, which will do macro magic so that it is as if they had loaded it on each process at global scope (not in a begin block), rather than thinking of it as a begin block that happens to run everywhere.

I know @everywhere already pulls using statements out of begin blocks, to ensure the packages are loaded on the main process before anywhere else. So we could also execute just those using statements on the worker processes first. Effectively, we would expand it into two @everywhere statements: one that has only the using lines, and one that is the original?

Alternatively, could we just remove the begin block from what gets run on the workers, so that it truly runs at global scope?
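
The first idea above (run the using lines as their own step, then the original block) can be sketched without any workers. This is only an illustration under stated assumptions: `collect_imports` is a hypothetical helper written for this example, not Distributed's internal function; a fresh `Module()` stands in for a worker's `Main`; and `Printf` replaces TimeZones so the snippet needs only the standard library.

```julia
# Hypothetical helper (not part of Distributed): gather `using`/`import`
# statements from an expression, descending only into block/toplevel nodes.
function collect_imports(ex, found=Expr[])
    if ex isa Expr
        if ex.head in (:using, :import)
            push!(found, ex)
        elseif ex.head in (:block, :toplevel)
            foreach(arg -> collect_imports(arg, found), ex.args)
        end
    end
    return found
end

# The kind of block a user would hand to @everywhere.
block = quote
    using Printf
    label(n) = @sprintf("%03d", n)
end

imports = collect_imports(block)

# Assumption: a fresh module plays the role of a worker's Main.
sandbox = Module()

# Step 1: run only the import statements.
Core.eval(sandbox, Expr(:toplevel, imports...))

# Step 2: run the original block unchanged. `@sprintf` is now visible
# when the whole block is macro-expanded.
Core.eval(sandbox, block)

result = Core.eval(sandbox, :(label(7)))
println(result)
```

The block still contains its own `using Printf`, which is harmless to run a second time; the point is only that the import has already taken effect before the block as a whole is expanded.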

mbauman commented 1 year ago

Yes — obviously there's not much we can do about the general case without a huge reworking. But inside the @everywhere macro there could definitely be special support for this... and it's not terribly unreasonable to think that it would work differently. I think we should refocus the issue back on that particular use-case.

oxinabox commented 1 year ago

So I think we can basically take the imports from https://github.com/JuliaLang/julia/blob/master/stdlib/Distributed/src/macros.jl#L198 and generate the using statements from there to run first. It already does this on the main process; we would just make it do the same on the workers.

gybefloyd commented 1 year ago

Also, we can consider the case of meta-programming, trying to execute

pkglist = ["Statistics","Revise"] 
for pkg in pkglist
    @eval using $(Symbol(pkg))
    @eval println($pkg)
end

inside @everywhere begin ... end does not work, but doing @everywhere include("file.jl"), where file.jl contains that piece of code, works. In all cases, doing @everywhere include("file.jl") simply works. I think these differences could be emphasized in the documentation, and a new macro similar to @everywhere, or an option on the existing macro, could be added so that @everywhereexpand begin ... end or @everywhere option begin ... end, for instance, has the same behavior as @everywhere include("file.jl"). This would avoid a problematic change to @everywhere and allow for less confusing and more convenient code when modifying existing code, and would avoid having to add files. I don't know how hard it is to implement such solutions, though.
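
One way to see why the include route behaves differently: include parses the file and evaluates it one top-level statement at a time, so a using line has already run by the time the next statement is macro-expanded. The sketch below uses `include_string` as a stand-in for a file on disk. The fresh module (instead of a worker) and `Printf` (instead of the packages above) are assumptions made to keep it self-contained.

```julia
# Source text as it might appear in file.jl: no surrounding begin block.
code = """
using Printf
label(n) = @sprintf("%03d", n)
"""

# Assumption: a fresh module plays the role of a worker's Main.
sandbox = Module()

# Each top-level statement is parsed, expanded and evaluated in order.
include_string(sandbox, code)

result = Core.eval(sandbox, :(label(7)))
println(result)
```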

vtjnash commented 1 year ago

We could rewrite the block Expr to have a head of toplevel. That changes it to sequential execution. But note the side effect of sequential execution is that local variables and scope declarations don't flow from one line to the next (which is what allows the delayed macro expanding to work)
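
A rough sketch of that rewrite, evaluated locally rather than on a worker. The helper `try_eval`, the fresh modules and the use of `Printf` are assumptions for illustration only; this is not what @everywhere currently does.

```julia
block = quote
    using Printf
    label(n) = @sprintf("%03d", n)
end

# Hypothetical helper: evaluate `ex` in a fresh module and report the outcome.
function try_eval(ex)
    sandbox = Module()
    try
        Core.eval(sandbox, ex)
        return (ok = true, mod = sandbox)
    catch
        return (ok = false, mod = sandbox)
    end
end

# Head :block: the whole expression is macro-expanded before `using` runs,
# so this is expected to fail like the REPL example earlier in the thread.
as_block = try_eval(block)

# Head :toplevel: the statements are evaluated one after another.
as_toplevel = try_eval(Expr(:toplevel, block.args...))

println("block head succeeded:    ", as_block.ok)
println("toplevel head succeeded: ", as_toplevel.ok)

result = Core.eval(as_toplevel.mod, :(label(7)))
println(result)
```

The caveat from the comment still applies: with a toplevel head, each statement is its own top-level expression, so anything that relied on the statements sharing one block (such as local declarations) would behave differently.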

gybefloyd commented 1 year ago

I quickly tried to write a macro combining @everywhere with a function similar to include_string (which include uses under the hood) to mimic the behavior of @everywhere include(), but passing a block expression, even with its head changed to toplevel, does not work. It seems that passing a code block through a macro does some "meta parsing" that prevents the code from being executed "as is", and converting the quoted expression to a string and passing it along the way include does seems very hard or impossible. Maybe the solution would be a kind of macro that does not "meta parse" the code block but treats it as a raw string instead.