JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.91k stars 5.49k forks source link

A more diligent name resolution? #10403

Closed ninegua closed 9 years ago

ninegua commented 9 years ago

Before I raise the problem, please help clarifying two of my understandings:

  1. Julia is strictly a lexically scoped language. That is, a variable or function name only resolves to a declaration in its enclosing (or parent or ancestor) lexical scope.
  2. Julia doesn't do any late binding. That is, once a name is resolved, it shall always point to the same object (or reference).

If both the above are true, then the following shall be their logical consequences:

  1. It is safe to always resolve a name "statically", i.e., at the compilation time of its enclosing function.
  2. If a name cannot be resolved "statically", then it is an error.

I put "statically" in quotes because Julia is certainly dynamic, but on the other hand, in the current implementation, once a Julia function is compiled, it stays compiled, and a compiled function does not attempt to resolve names at runtime.

With the above understanding, may I request the following feature to be implemented?

  1. All names (of Symbol type) shall be resolved during (or before) type inference, and it is an error to have unresolved names remaining in typed AST.

This will save a lot of trouble of implementing optimizations leveraging type information available in typed AST, because typed AST will no longer contain unresolved names, and all resolved names will only be of 3 forms: TopNode (for system definitions), GetfieldNode (for module-level definitions), SymbolNode (for local definitions). In all cases, it would be straightforward to obtain the type of a name.

JeffBezanson commented 9 years ago

There are non-constant global variables. Storage locations for such variables are resolved early, but their values (including type) can change at run time.

Given a Symbol or SymbolNode in an AST, you need to look at the function's list of local and closed variables. If the symbol does not appear there, static parameters are examined next. These are stored in the sparams field of LambdaStaticData. If the symbol isn't there either, then it is a global variable available in the module referenced by the module field of the function's LambdaStaticData object.

Using that procedure, I believe all variable references can be resolved. In case I haven't understood, it would help to provide an example of an "unresolved" name.

ninegua commented 9 years ago

I'm not saying that names cannot be resolved following the procedure you described, I'm suggesting that all names (of Symbol type) be resolved to either SymbolNode or GetfieldNode when Julia first produce typed ASTs, so that no Symbol remains in the them (except in Expr head positions).

Since type inference already can find out exact location (which module) of a name, it can save future effort of repeating this process, especially when dealing with situations where in a AST rewriting function, we don't know which module a piece of AST comes from. It is also quite troublesome to safely copy ASTs from one place to another.

The current implementation is not very consistent about resolving names. For example:

julia> f(x,y)=sin(x) * sin(y)
f (generic function with 2 methods)
julia> code_typed(f, (Array{Float32,1}, Array{Float32,1}))
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x,:y], Any[Any[:_var0,:_var1,:_var8,:_var9,:_var10,:_var11,:_var12,:_var13,:_var14],Any[Any[:x,Array{Float32,1},0],Any[:y,Array{Float32,1},0],Any[:_var0,(Array{Float32,1},Array{Float32,1}),0],Any[:_var1,(Array{Float32,1},Array{Float32,1}),0],Any[:_var8,(Array{Float32,1},),0],Any[:_var9,(Int64,),18],Any[:_var10,Array{Float32,1},18],Any[:_var11,Array{Float32,1},18],Any[:_var12,Function,18],Any[:_var13,Array{Float32,1},18],Any[:_var14,Array{Float32,1},18]],Any[]], :(begin  # none, line 1:
        _var14 = sin(x::Array{Float32,1})::Array{Float32,1}
        _var13 = sin(y::Array{Float32,1})::Array{Float32,1}
        _var12 = (*)::Any
        _var11 = _var14::Array{Float32,1}
        _var10 = _var13::Array{Float32,1}
        _var9 = (GetfieldNode(Base.Broadcast,:broadcast_shape,Any))(_var11::Array{Float32,1},_var10::Array{Float32,1})::(Int64,)
        return broadcast!(_var12::F,(top(ccall))(:jl_new_array,$(Expr(:call1, :(top(apply_type)), :Array, Float32, 1)),$(Expr(:call1, :(top(tuple)), :Any, :Any)),Array{Float32,1},0,_var9::(Int64,),0)::Array{Float32,1},_var11::Array{Float32,1},_var10::Array{Float32,1})::Array{Float32,1}
    end::Array{Float32,1}))))

We can see a few inconsistencies above:

  1. both sin are not resolved.
  2. (*) is not resolved, and is given Any type, which doesn't match the type of _var12.
  3. broadcast! is not resolved, but broadcast_shape is resolved to GetfieldNode.

Another example would be:

f(x)=copy(x)
g(x)=begin copy!(x)=x; copy(x) end

If we dump the typed AST of both, we can see that f contains an unresolved copy! call, while the AST of g contains a fully resolved copy! as GetfieldNode. So it appears that somehow name is only resolved to avoid conflict? Why not always resolve them?

PS: I don't quite understand the decision of allowing globals to change type, though this is a different topic. In an interactive shell, newer definitions of a non-function variable should shadow previous definitions, which I believe is a different case from running code defined in a fully enclosed module. If a user really want a variable's type to be anything, he/she should explicitly say it is of Any type.

JeffBezanson commented 9 years ago

The reason we do this is efficiency. The number of symbols over all ASTs is very large. If a function contains, say, 100 function calls, we would have to create 100 extra nodes that simply repeat the information that all of these names are found in the same module as the enclosing function. Instead we can just refer to an AST, and a single pointer to a module. It's much more efficient to remember what module the AST came from than to repeat the information in every global symbol reference.

JeffBezanson commented 9 years ago

Also there is a function resolve_globals in inference.jl that adds GetfieldNodes as needed to move an AST from one place to another. With suitable arguments (e.g. an empty destination module) it would qualify everything.

ninegua commented 9 years ago

If GetfieldNode is immutable, then we can use techniques like hashconsing to reduce memory overhead. A similar situation is with type annotation, is it really necessary to introduce SymbolNode when Symbol's type can be looked up in the environment? There are always trade-offs, and more convincing arguments can be made by measuring real overheads.

JeffBezanson commented 9 years ago

SymbolNodes are used because a variable's type can be different at different points.

We keep lots of compiler data structures in memory during execution, and it's kind of a problem already. Better than hash-consing is to eliminate unnecessary objects entirely. Why is it so hard to keep track of what module an AST came from?

If you want I can write a small wrapper around resolve_globals that will add GetfieldNodes everywhere.

ninegua commented 9 years ago

Like I said, sometimes it is very difficult to figure out where a piece of code comes from. Here is a short example using macros:

julia> macro expand(expr)
         @eval function g()
           $(esc(expr))
         end
         return quote g() end
       end

julia> module A
         import ..@expand
         sin(x) = x + 1.0f
         f() = @expand sin(100)
       end

julia> A.f()
ERROR: error compiling f: unsupported or misplaced expression "escape" in function f

julia> code_typed(A.f, ())
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[],Any[],Any[]], :(begin  # none, line 4: # none, line 5:
        return $(Expr(:escape, :((GetfieldNode(Main,:sin,Any))(100))))
    end::Any))))

If I understand it correctly, the purpose of using esc(x) is to allow the names in the AST (value of x) resolve in the call site of applying the macro. It is apparently not working (causing an error), and sin didn't resolve to the intended A.sin function.

I admit the example was abusing the macro facility a bit. But situations like this do come up often in our use cases because names in an AST are not always accompanied by their locations.

JeffBezanson commented 9 years ago

esc only serves a purpose when it appears in the expression returned from a macro. It's used by the macro expander, and not anything else. Macros are not supposed to call eval or have side effects; they're just supposed to return a transformed expression.

Once you have the A.f object, you can find out what module the function is in.

ninegua commented 9 years ago

Actually after thinking about it more, I'd argue that resolved names should stay in AST, and resolved types should stay out of AST. Once you know the location of a variable (local or module), it is easy to look up its type. But the reverse is not as easy, because you have to walk the module hierarchy and perform a lookup in every module you visit.

Also, after inference, type castings become explicit, so it is wasteful to annotate every variable occurrences with a type.

JeffBezanson commented 9 years ago

Also, after inference, type castings become explicit, so it is wasteful to annotate every variable occurrences with a type.

I'm pretty sure this is not true. What do you mean by "type castings"? Probably some SymbolNodes are redundant, but certainly not all. A julia local variable does not always have a fixed type; different occurrences can have different types.

In fact, we already avoid creating SymbolNodes in many cases, for example for constant globals, since their types can easily be looked up just as you say.

I don't believe you ever need to walk the module hierarchy yourself. A module and a symbol are sufficient, e.g. to call eval(module, sym). And if a bare symbol occurs in an AST, that means it will be looked up in the module in that AST's LambdaStaticData. Imports and so on are handled automatically. For example you can do eval(Main, :sin) even though sin is actually defined in Base, not Main.

ninegua commented 9 years ago

Sorry, I wasn't very clear. I was talking about the cost of doing name resolution, not what Julia function one has to call to resolve names.

Like I said, if a local variable can be assigned different types within the same body, it should either be declared as an union type or Any type. Its use can be accompanied by an explicit casting (for now, you can think of a SymbolNode where the type is different than the declared type of this symbol).

If you are concerned about space usage, adding one GetfieldNode per definition (in a module) and just returning the same object in every successful lookup sounds to me like a reasonable solution. I doubt it will blow up space usage compared to the current approach. If you are willing to give it a try, we can find someone who has free cycles to do this kind of experiment, and back it up with real data.

JeffBezanson commented 9 years ago

Removing the SymbolNodes for variables that are always the same type is a good idea. I'll plan to do that.

I still don't see why the GetfieldNodes are needed. Given an AST's module, putting a GetfieldNode with that module on every symbol inside it seems equally silly to me as putting a SymbolNode with the same type on every use of a variable.

ninegua commented 9 years ago

At least in the language implementations that I'm familiar with, both MLton and GHC do this in their ASTs. Local definitions have short names, but otherwise are always qualified with where it comes from. In addition to making things more consistent (name resolution is done once and for all), I think it is also a nice trade off between space (actually quite minimum if same name objects are reused) and efficiency (having to walk module hierarchy for a lookup). Overall it reduces complexity of working with ASTs.

I understand that Julia strives to bring JIT performance, and perhaps it has not required sophisticated transformations over typed ASTs.

JeffBezanson commented 9 years ago

Is the problem that you want the information at macro-expansion time?

ninegua commented 9 years ago

Yes, we are abusing macros to quickly generate functions like the example above, and want to be safe at handling AST pieces going into it.

ninegua commented 9 years ago

Whether we should have all names resolved at macro expansion time is kind of debatable. But I do think it is a bit more hygiene, especially when used with esc.

JeffBezanson commented 9 years ago

I think you should try to find a different approach than using eval in a macro. This gets very tricky. For example the argument to a macro might contain un-expanded macro calls, and you can't be sure what those symbols will ultimately refer to without doing the expansions. Before the lowering pass, it is even difficult to tell what will be a local or global variable. Macro-expand time might simply be too early for the kind of analysis you want to do.

As an example, we might have

function f()
    @expand sin(100)
    @foo sin
end

where @foo sin can expand into sin = x, making sin a local variable. This makes it very hard for @expand to reason about what sin in its argument means.

ninegua commented 9 years ago

Points taken. Name resolution should be after lowering. Actually the perfect place is in type inference, because it has to resolve all names anyway to figure out their definitions and types.

In referring to the specific example of using esc in macros that I gave above, a solution would be to keep track of the location (call site of a macro application) in the :escape Expr, so that at the type inference stage, it knows where to look up the escaping names.

JeffBezanson commented 9 years ago

IIUC, in your example @expand macro you want g() to contain A.sin. However that assumes that sin inside f refers to the global sin, but my example shows this is not necessarily so. I believe you will always have this problem if you try to move macro argument expressions out of their original context, into the separate g() function.

ninegua commented 9 years ago

It indeed depends on what should be the intended semantics of using esc with a macro argument expression out of its original context. Here is my attempt at defining this behavior. When used in a left hand side like you have pointed out, maybe it should not resolve at all and become a local variable, maybe it should just resolve and then become an error. When used in a non-LHS position, it resolve as if it were in the original context and again two situations would follow: (1) if the name resolves to a local definition in the original context, it can either become an error or cause the local definition to become an environment variable (as if captured by a closure); (2) if the name resolves to module level definition, it becomes a GetfieldNode.

ninegua commented 9 years ago

Actually thinking about it again, in the LHS case, it should always resolve to either a local or global definition in the original context, the same as in the non-LHS case. It is just that when it resolves to a local, the original definition should now become something that can be captured by closures. Now, to implement such semantics may not be very straightforward, and it is pretty much a corner case anyway.

ninegua commented 9 years ago

I modified the example to not use @eval, but it is still not supported:

macro expand(expr)
  quote
  function g() 
    $(esc(expr))
  end
  g() 
  end
end

module A
  import ..@expand
  sin(x) = x + 1.0f
  f() = @expand sin(100)
end

julia> A.f()
ERROR: MethodError: `*` has no method matching *(::Float64, ::Function)
Closest candidates are:
  *(::Any, ::Any, ::Any)
  *(::Any, ::Any, ::Any, ::Any...)
  *(::Number, ::Bool)
  ...

 in #3#g at /mnt/home/hliu54/PSE/tmp/julia/test.jl:4
 in f at /mnt/home/hliu54/PSE/tmp/julia/test.jl:6

julia> code_typed(A.f, ())
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[symbol("#3#g")],Any[Any[symbol("#3#g"),Function,2]],Any[]], :(begin  # /mnt/home/hliu54/PSE/tmp/julia/test.jl, line 13: # /mnt/home/hliu54/PSE/tmp/julia/test.jl, line 3:
        $(Expr(:method, symbol("#3#g"), :((top(tuple))((top(tuple))()::Any,(top(tuple))()::Any)::Any), AST(:($(Expr(:lambda, Any[], Any[Any[],Any[],Any[]], :(begin  # /mnt/home/hliu54/PSE/tmp/julia/test.jl, line 4:
        return sin(100)::Any
    end::Any))))), false)) # line 6:
        return (#3#g::F)()::Any
    end::Any))))

If I remove the esc, it becomes more interesting:

julia> code_typed(A.f, ())
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[symbol("#3#g")],Any[Any[symbol("#3#g"),Function,2]],Any[]], :(begin  # /mnt/home/hliu54/PSE/tmp/julia/test.jl, line 13: # /mnt/home/hliu54/PSE/tmp/julia/test.jl, line 3:
        $(Expr(:method, symbol("#3#g"), :((top(tuple))((top(tuple))()::Any,(top(tuple))()::Any)::Any), AST(:($(Expr(:lambda, Any[], Any[Any[],Any[],Any[]], :(begin  # /mnt/home/hliu54/PSE/tmp/julia/test.jl, line 4:
        return ((top(getfield))(Main,:sin)::Any)(100)::Any
    end::Any))))), false)) # line 6:
        return (#3#g::F)()::Any
    end::Any))))

julia> A.f()
-0.5063656411097588

Apparently it didn't resolve sin to A.sin. Is it a bug? I believe macro expansions are allowed to contain local function definitions, but somehow the name resolution is messed up when splicing ASTs into their body. UPDATE: my bad, this is not a bug (the one above it is), because without esc, names should resolve with respect to the macro's environment.

JeffBezanson commented 9 years ago

1.0f is 1.0 * f. Maybe you mean 1.0f0 (Float32 1.0) instead?

ninegua commented 9 years ago

My bad. So it is really only a problem when used out of context, like @eval.

Anyway, I hope Julia developers can consider the option of having name resolved once and for all.

quinnj commented 9 years ago

@ninegua Please be careful when talking about macros on github; writing @eval without the backticks will actually notify the github user "eval" that he/she is being mentioned here. Just always put backticks around macros.

JeffBezanson commented 9 years ago

I think having names fully resolved at macro-expand time is inherently incompatible with the kind of macros we have.

ninegua commented 9 years ago

@quinnj I didn't know about this, thanks for the tip.

@JeffBezanson I wasn't saying at macro expansion time, I meant it to be during type inference, and all typed ASTs will then have either fully resolved names, or locally defined names.

JeffBezanson commented 9 years ago

Ok, how about the utility function I offered (wrapping resolve_globals) to convert an AST to that form?

JeffBezanson commented 9 years ago

Though I still don't really see the problem. I agree with your argument about type annotations, but I think the same argument applies to GetfieldNodes. Somebody else might find it very convenient to have the types marked everywhere, to save a lookup step. Given a symbol, you will have to do a lookup to find its type. By doing that, you also discover whether it is a local variable. If it's not a local variable, you know what module to look in. I just don't think this procedure is difficult or expensive.

JeffBezanson commented 9 years ago

To be clear, when you see sin in the AST, its location is specified by the module field of the LambdaStaticData the AST came from. Any time that doesn't hold, there will be a GetfieldNode already.

ninegua commented 9 years ago

Like I said, when we need to put together pieces of ASTs (as in Expr, not in LambdaStaticData) from multi-sources into a single place, We either have to carry around a module name for each of them (and thus no longer fit into Expr) or have to call resolve_globals before putting them together. If we go through multiple iterations of this, it is hard to keep track which pieces are already resolved, so we will end up calling resolve_globals unnecessarily. Do you see the complication here?

ninegua commented 9 years ago

On whether to use SymbolNode or Symbol for local variables, I actually prefer SymbolNode, if every use of it can pointer to the same (immutable) object. It trades some space in saving the effort to track typing environment. This is why I argued that after type inference and name resolution there should be no Symbol left in AST except at Expr :head position.

JeffBezanson commented 9 years ago

Ok, I understand not wanting to call resolve_globals repeatedly. With your design, it looks like we could delete that function, and maybe some other code too, which is nice. One reason it didn't seem like a problem to me is that you already need to do renaming when inlining, so an extra AST pass can't be avoided.

Also I should rename GetfieldNode to something like ModuleRef. It was originally supposed to be more general, but ended up being used only for qualifying global variables.

Let's try it and see what the cost is. I like the idea of "paying" for it by removing lots of redundant SymbolNodes. Hash-consing is not too appealing here, since there is a lot of complexity in maintaining those global data structures (especially when you add concurrency).

StefanKarpinski commented 9 years ago

Does it make sense to call them something like GlobalRef and LocalRef instead? The old "name it what you call it" rule of thumb.

JeffBezanson commented 9 years ago

I really don't want to wrap every local variable occurrence. For instance we also have the GenSym type for more efficient compiler-added SSA variables, and they are always local.

JeffBezanson commented 9 years ago

However what we could do is rename GenSym to LocalVar, and use it for all locals, replacing all symbols with indexes early on. A few LocalVars would need to be wrapped with a HasType node to handle variables with changing types.

timholy commented 9 years ago

If there's going to be some effort put into this, is there any potential synergy in addressing #1334 simultaneously? At one point there was the suggestion that we should just include complete function/file info for each line number, rather than only for the first line.

JeffBezanson commented 9 years ago

I think it's separate from line numbers, but very much related to #9023 (variable renaming).

ninegua commented 9 years ago

Thanks, Jeff! I'm trying out the latest master, and I'm glad to see the changes in typed ASTs. However, I notice the following expression is still not named resolved:

julia> f(x)= x .* 0.5
f (generic function with 1 method)

julia> code_typed(f, (Array{Int,1},))
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x], Any[Any[Any[:x,Array{Int64,1},0]],Any[],Any[],Any[]], :(begin  # none, line 1:
        return x::Array{Int64,1} .* 0.5::Array{Float64,1}
    end::Array{Float64,1}))))

Would appreciate if . can be resolved to Base module. Actually a related question that I have is, how do you write something like methods(Base..) with correct syntax?

EDIT:

After a closer look, actually the above . is pretty printed as ., but the real AST is Main... Why is it not Base..?

JeffBezanson commented 9 years ago

I decided to print operators as just their names, since otherwise the printed ASTs are hard to read.

I admit it's awkward, but you can write Base.(:(.*)).

I'd like to resolve it to Base..*, but currently in some cases this forces bindings to be resolved too early. I'll keep working on it.

ninegua commented 9 years ago

It looks like resolution for operators is different than normal names. Here is a program that demonstrate this issue. I attempt to re-define sin and - (negation) in a separate module:

julia> module Test
         for f in (:sin, :-)
           g = QuoteNode(f)
           @eval function ($f)(x::Float64)
             getfield(Base, $g)(x)
           end
         end
       end
Warning: Method definition -(Float64) in module Base at float.jl:201 overwritten in module Test at none:5.
Test

julia> -(10.0)
ERROR: StackOverflowError:
 in - at none:5 (repeats 79996 times)

julia> Test.(:(-))(10.0)
ERROR: StackOverflowError:
 in - at none:5 (repeats 79996 times)

julia> sin(10.0)
-0.5440211108893698

julia> Test.sin(10.0)
-0.5440211108893698

julia> code_typed(Test.sin, (Float64,))
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x], Any[Any[Any[:x,Float64,0]],Any[],Any[],Any[]], :(begin  # none, line 5:
        return ((Test.getfield)(Test.Base,:sin))(x::Float64)
    end))))

julia> code_typed(Test.(:(-)), (Float64,))
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x], Any[Any[Any[:x,Float64,0]],Any[],Any[],Any[]], :(begin  # none, line 5:
        return ((Test.getfield)(Test.Base,:-))(x::Float64)
    end))))

As seen above, the definition of - in Test module overrides top-level definition (which then creates an infinite loop when called), but this is not the case for sin (which is what I'd expect).

An additional question is, why isn't getfield resolved to top(getfield)? There is obviously no function named getfield in either Test module, or Base.

JeffBezanson commented 9 years ago

This is because modules do importall Base.Operators by default. This behavior is slightly controversial, but it's there because people basically always want to extend operators rather than define them anew.

The AST contains GlobalRef(Test, :getfield) because at a certain point, all we know is that the variable reference occurs in the Test module. We don't yet always know where this variable will ultimately resolve to. At some point, evaluating that GlobalRef will give you the correct value, however doing so also forces the binding to be resolved. There is a small possibility that the module was going to do e.g. a using statement that would have changed where the variable resolves to.

ninegua commented 9 years ago

Thanks for the explanation. I'd appreciate if there is a way to turn off importall for operators so that we can define operators that reside locally in a module rather than overriding existing definitions in Base.

On the second issue, I believe here is what you meant:

julia> module Test
         module B
           export g
           function g(x)
             println(x, " world!")
           end
         end
         module A
           function f()
             g("hello")
           end
           println(code_typed(A.f, ()))
           using ..B
         end
       end 
Any[:($(Expr(:lambda, Any[], Any[Any[],Any[],Any[],Any[]], :(begin  # none, line 10:
        return (Test.A.g)("hello")
    end))))]
Test

julia> Test.A.f()
hello world!

At printing, the AST of Test.A.f contains a call to Test.A.g, which is yet to be defined. Once the code for module Test.A is completely loaded, we can safely resolve Test.A.g to Test.B.g.

So, am I correct to assume that once a module is completely loaded, evaluating all GlobalRef in the ASTs of this module should give me the final resolution? Do we have a function that does this? I'm thinking something of type GlobalRef -> Union(GlobalRef, TopNode) would do.

quinnj commented 9 years ago

Paul,

Check out baremodules: http://docs.julialang.org/en/latest/manual/modules/#default-top-level-definitions-and-bare-modules For cases where you need just the bare minimum.

-Jacob

On Tue, Jul 14, 2015 at 1:18 PM, Paul Liu notifications@github.com wrote:

Thanks for the explanation. I'd appreciate if there is a way to turn off importall for operators so that we can define operators that reside locally in a module rather than overriding existing definitions in Base.

On the second issue, I believe here is what you meant:

julia> module Test module B export g function g(x) println(x, " world!") end end module A function f() g("hello") end println(code_typed(A.f, ())) using ..B end end Any[:($(Expr(:lambda, Any[], Any[Any[],Any[],Any[],Any[]], :(begin # none, line 10: return (Test.A.g)("hello") end))))] Test

julia> Test.A.f() hello world!

At printing, the AST of Test.A.f contains a call to Test.A.g, which is yet to be defined. Once the code for module Test.A is completely loaded, we can safely resolve Test.A.g to Test.B.g.

So, am I correct to assume that once a module is completely loaded, evaluating all GlobalRef in the ASTs of this module should give me the final resolution? Do we have a function that does this? I'm thinking something of type GlobalRef -> Union(GlobalRef, TopNode) would do.

— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/10403#issuecomment-121346740.

ninegua commented 9 years ago

@quinnj Thanks for the suggestion, Here is what I tried:

julia> baremodule Test
         import Base, Base.warn, Base.call, Base.@eval
         export sin, -
         eval(x) = Core.eval(Test, x)
         for f in (:sin, :-)
           g = QuoteNode(f)
           @eval function ($f)(x::Float64)
             warn("in Test!")
             getfield(Base, $g)(x)
           end
         end
       end
Test

julia> importall Test
Warning: ignoring conflicting import of Test.- into Main

julia> sin(10.0)
WARNING: in Test!
-0.5440211108893698

julia> -(10.0)
-10.0

julia> Test.sin(10.0)
WARNING: in Test!
-0.5440211108893698

julia> Test.(:(-))(10.0)
WARNING: in Test!
-10.0

So I've managed to define a - operator in Test module that doesn't override its default definition in Base, but neither can I import it and run it without specifically writing Test.(:(-)). Any suggestion in getting around this?

ninegua commented 9 years ago

I'm trying eval(GlobalRef(...)), it's close to what I wanted. Let me rephrase what I want to achieve:

  1. Being able to define and use both operators and functions local to a module that override definitions in Base.
  2. Being able to locate where a definition comes from.

Both points can be illustrated by the following program:

julia> baremodule Test
                import Base, Base.warn, Base.call, Base.@eval, Base.@noinline
                export sin, -, abc
                eval(x) = Core.eval(Test, x)
                for f in (:sin, :-)
                  g = QuoteNode(f)
                  @eval @noinline function ($f)(x::Float64)
                    warn("in Test!")
                    getfield(Base, $g)(x)
                  end
                end
                abc = 10
              end
Test

julia> module Prog
                importall ..Test
                f() = sin(0.5) + abc
                g() = -(0.5)
              end
Warning: ignoring conflicting import of Test.- into Prog
Prog

julia> code_typed(Prog.g, ())
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[],Any[],Any[],Any[]], :(begin  # none, line 4:
        return (Base.box)(Base.Float64,(Base.neg_float)(0.5))::Float64
    end::Float64))))

julia> code_typed(Prog.f, ())
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[],Any[],Any[],Any[]], :(begin  # none, line 3:
        return (Prog.sin)(0.5) + Prog.abc
    end))))

We can see that sin is resolved to Prog.sin, but - is somehow resolved to Base.(:(-)) (and inlined). I guess the warning indicates something is preventing this from happening. I guess the solution is to make Prog also a baremodule, but this would discourage people from using operators defined in modules other than Base (but with the same name as those in Base). Is there a possible fix for this?

What I had hoped for was to resolve both to module Test, but as explained by Jeff, I now understand it cannot be done until compile time. That being said, I can always find the real location (of a GlobalRef to a function) like below:

julia> eval(code_typed(Prog.f, ())[1].args[3].args[2].args[1].args[1]).env.defs.func.code.module
Test

julia> eval(code_typed(Prog.f, ())[1].args[3].args[2].args[1].args[1]).env.defs.func.code.name
:sin

For non-functions such as the use of abc in function Prog.f, I'd appreciate a way to find their exact locations.

The fact that Julia's name resolution is only "final" until compile time is also a bit surprising. I don't know if there is a solution to make everybody happy, but the following demonstrates what is going on:

julia> module A
         export abc
         abc = 10
       end
A

julia> 

julia> module B
         using ..A
         f() = abc
         println(f())
         abc = 20
       end
10
Warning: imported binding for abc overwritten in module B
B

julia> 

julia> module C
         using ..A
         f() = abc
         abc = 20
         println(f())
       end
20
C

I find it intriguing that there is a warning for module B, but not for module C.

ninegua commented 9 years ago

Actually here is what happens when I use using instead of importall:

julia> module Prog
         using ..Test
         f() = sin(0.5) + abc
       end    
Warning: using Test.- in module Prog conflicts with an existing identifier.
Prog

julia> Prog.f()
Warning: both Test and Base export "sin"; uses of it in module Prog must be qualified
ERROR: UndefVarError: sin not defined
 in f at none:3

If I understand it correctly, the intended use of import and importall is the ability to extend functions that are being imported. But what happens when the imported names are a conflict with existing names? This wasn't explained in documentation.

The same goes to using. The above gives a warning message and eventually an error when Prog.f is called. However, changing using to importall would make the same call successful. I'd appreciate some clarification on this difference in behavior, when both are dealing with name conflicts between what was already defined in current scope and what is defined in the module to be used or imported.

ninegua commented 9 years ago

Just want to add a comment that I really do think the name resolution should be final at type inference time (instead of compile time in codgen), otherwise we risk the danger of resolving it to a different type. The follow shows this bug:

julia> module A
         export abc
         @noinline abc(x::Any) = "hello"
       end
A

julia> 

julia> module B
         import ..A.abc
         f() = begin x = abc(1) end
         println(code_typed(f,()))
         @noinline abc(x::Int) = x
       end
Any[:($(Expr(:lambda, Any[], Any[Any[Any[:x,ASCIIString,18]],Any[],Any[ASCIIString],Any[]], :(begin  # none, line 3: # none, line 3:
        GenSym(0) = (B.abc)(1)::ASCIIString
        x = GenSym(0)
        return GenSym(0)
    end::ASCIIString))))]
B

julia> 

julia> B.f()

signal (11): Segmentation fault

One may argue that the above is not a valid program because function abc is being extended after I force the type inference on function f, and the inferred String type caused a conflict with the actual method B.abc that is being called when f is run. However, the following works fine:

julia> module A
         export abc
         @noinline abc(x::Any) = "hello"
       end
A

julia> 

julia> module B
         import ..A.abc
         f() = begin x = abc(1) end
         f()
         @noinline abc(x::Int) = x
       end
B

julia> 

julia> B.f()
"hello"

So instead of merely doing type inference, I forced compilation of f, then it stays to permanently resolve function abc to the method defined in module A.

I might be splitting the hair here, but the above illustrates a name resolution gap between inference and codegen. The thing caused the crash is because type inference is making an assumption that the call to abc resolves to module A when it infers its return type, but the generated code still has GlobalRef(B, :abc), which later during codegen locates a method defined in module B. This is a mismatch, and I would argue it really should produce GlobalRef(A, :abc) because that is all the knowledge that the inferencer has at the time when it works on function f. This would also match the behavior in the second example, so all would be fine.

I understand that forcing inference and compilation can sometimes result in surprising behaviors, but at the same time I hope I've just made the case that name resolution should be final at type inference time, so that GlobalRefs would never be resolved to different things when being compiled later. By "final", I mean it resolves to the exact place where the matching method is defined. This is where inferencer finds the type, and where codegen finds the method.