JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

allow overloading of a.b field access syntax #1974

Closed StefanKarpinski closed 6 years ago

StefanKarpinski commented 11 years ago

Brought up here: https://github.com/JuliaLang/julia/issues/1263.

mbauman commented 9 years ago

I'm really looking forward to being able to do this. Is this a big enough change to get it (or the WIP in #5848) flagged as a 0.4-project?

StefanKarpinski commented 9 years ago

Yep, it's definitely a project.

stevengj commented 9 years ago

I think most of us agree that its recommended usages should be limited, at least to start with. My feeling is that it should be initially recommended for only two usages: interoperability (with other languages, as in PyCall, and more generally for external libraries where dot notation is natural), and perhaps for objects with mutable state (since get_foo(x) and set_foo!(x, val) are ugly).

Even if we recommend it only for foreign-call interoperability, that purpose alone is enough to justify this feature in my opinion. For a new language like Julia, talking smoothly with the rest of the software universe is hugely important.

tknopp commented 9 years ago

Steven, I am not 100% sure about the getter/setter use case, because I fear that it will soon lead to inconsistencies, but I agree with the other use case. On top of that, we have dynamic properties in Gtk.jl, which would also benefit from the syntax. My personal favorite, though, is the enum implementation that Stefan outlined in #5842.

ufechner7 commented 9 years ago

Bump. What is blocking the progress on this issue? Is a decision needed, does this issue depend on other internal changes that are not yet done, or is it just coding?

StefanKarpinski commented 9 years ago

> What is blocking the progress on this issue?

Someone doing the work and some hesitation about whether it's the right thing to do.

stevengj commented 9 years ago

Note that @ihnorton already made an early draft implementation at #5848. I think work has stalled primarily because a clear statement from the Julia core team is needed on whether this is a desired feature.

StefanKarpinski commented 9 years ago

I'm on board with this. @JeffBezanson seems to be on the fence.

ufechner7 commented 9 years ago

For me, having this feature would make the transition from our large Python code base to Julia easier. It could become difficult to explain to students that, when they use Python code, they need quite a different syntax than they are used to.

tknopp commented 9 years ago

We had this discussion above in this thread and I still cannot see full agreement. Currently several people think that a public API is made of functions/methods while the private API is the fields of a composite type. I can see very rare exceptions to this scheme. (.U in an LU decomposition?)

This does not mean I am against this, because Python access and enums are cases where this is useful. Still, one can question how urgent the need is and whether it would be wise to push this at the end of a dev cycle.

stevengj commented 9 years ago

@ufechner7, I agree that the main motivation is inter-language interop. @tknopp, we are never going to get unanimous agreement on something like this. Ultimately it comes down to what @JeffBezanson and @StefanKarpinski decide.

mbauman commented 9 years ago

I think a lot of the hesitation stems from what I imagine may be Jeff's worst nightmare:

module DotOrientedProgramming
  Base.getfield(x, ::Field{:size}) = size(x)
  Base.getfield(x, ::Field{:length}) = length(x)
  ⋮
end

I would very much dislike this, too - any package that decides to misuse it like this will impose their misuse on all types in the system, including my own. This feature is very powerful and will change how Julia is written. For better and (perhaps, but hopefully not) worse.

tknopp commented 9 years ago

Yes, sure, Steven, that may not have been properly worded by me. The point I wanted to make is that this change can have a major influence on how the language will evolve. And the "formal interface" idea that we have in another issue is also influenced by making . overloadable. So yes, let's have @JeffBezanson and @StefanKarpinski decide. Still, the question is whether the decision has to be enforced now...

StefanKarpinski commented 9 years ago

For what it's worth, I've come to favor making almost all syntax overloadable and then relying on culture to resist going too hog wild with it.

ihnorton commented 9 years ago

+1. I think there is a strong philosophical (and possibly practical...) analog to call overloading here. The manual needs a section entitled Don't do stupid stuff: we won't optimize that. (of course, call overloading was partly for performance reasons -- but it is rife with potential for abuse)

sglyon commented 9 years ago

JeffBezanson commented 9 years ago

Overall, I am in favor. Potential for abuse is not my biggest worry. For me the big problems are

These problems could be solved in one stroke by using the same syntax for both, but it's nearly impossible to imagine using anything but . for modules at this point. Internally there will definitely be abstract syntax for module references; it would be frustrating if there were no nice way to expose that.

Luthaf commented 9 years ago

My two cents on this question: why not use : for qualified names? It is already in use for something similar:

import Base: call, show, size

This would give something like

module Foo
    module Bar
        f(x) = 3*x
    end
    const a = 42
end

@assert Foo:a == 42

Foo:Bar:f(789)

Or are there already too many uses of the : symbol? The :: symbol (C++ style) seems too verbose to me.

StefanKarpinski commented 9 years ago

The : is already the most overloaded symbol in Julia, so that's not going to help, I'm afraid.

StefanKarpinski commented 9 years ago

Can we simplify the qualified naming issue by making module.name not overloadable? Since module bindings are constant, that would allow us to keep the same semantics but short-circuit all of the normal logic for qualified name lookups just as soon as it's known that the LHS of a.b is a module. I think it's pretty reasonable to not allow people to override what it means to look a name up in a module.

I rather like the a..b syntax for real field access. What's your objection to it?

StefanKarpinski commented 9 years ago

Aside: I kind of wish we had gone with ( ) for import lists like some of the functional languages. I.e.:

import Base (call, show, size)

My reason is that we could make the commas optional and allow trailing commas. It really annoys me that all of the imported names need trailing commas except the last one which cannot have one.

JeffBezanson commented 9 years ago

Yes, I was just about to mention the possibility of making a.b mean "if a is a module then do module lookup first". That might help, and we certainly don't want to override the meaning of module lookup. It does have some complexity cost though, since we then can't represent a.b as the call getfield(a,:b). It needs to be a special AST node with an implicit branch. Of course we could use an explicit branch, but I'd worry about AST bloat from that.
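The "explicit branch" mentioned here can be sketched as an ordinary function. This is only an illustration of the idea, not the lowering that was implemented: `dot_lookup` is a hypothetical name, and `Base.getproperty` (the overloadable hook that exists in Julia 1.x) stands in for the overloadable `getfield` proposed in this thread.

```julia
# Hypothetical helper showing what `a.b` would mean with an explicit branch.
# `dot_lookup` is an illustrative name, not part of Julia.
function dot_lookup(a, name::Symbol)
    if a isa Module
        # Module case: plain binding lookup, never routed through user overloads.
        return getfield(a, name)
    else
        # Everything else: the user-overloadable path.
        return getproperty(a, name)
    end
end

point = (x = 1, y = 2)            # a NamedTuple as a simple non-module value

f = dot_lookup(Base, :length)     # same binding as Base.length
v = dot_lookup(point, :x)         # same value as point.x
```

Emitting this branch at every `a.b` site is the AST bloat being worried about; a dedicated AST node would keep the branch implicit.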

There doesn't seem to be an easy way out of such a huge conflict between the needs of the front end and back end.

If everybody else likes a..b I guess I can learn to live with it. It just looks to me like it means something totally different, an interval perhaps.

tknopp commented 9 years ago

I dislike a..b as well but wonder why it would be required at all. When reading this thread one gets the impression that overloading will only be used in language wrappers and dynamic use cases where the real field access is not required.

JeffBezanson commented 9 years ago

Because at some point you need to access the representation of an object in order to do anything with it. One could argue that this would be relatively rare, and so can be ugly like get_actual_field(a,:x), but this seems like too important an operation not to have syntax.

tknopp commented 9 years ago

I see that, but this sounds like we are looking for a syntax that we want nobody to use, right?

Not providing .. would be a way to say yes to dynamic use cases but no to dot-oriented programming.

JeffBezanson commented 9 years ago

I don't see how that would prevent dot-oriented programming; you could still do @mbauman 's example above.

StefanKarpinski commented 9 years ago

While the a..b syntax does kind of look like an interval (I've used it as such), I just don't think that interval arithmetic needs its own input syntax – writing Interval(a,b) is just fine and there isn't much else anyone wants to use that syntax for, since it's been an operator in Julia for years now and no one is using it for much of anything. It also kind of looks like field access.

JeffBezanson commented 9 years ago

One silver lining to this is we can replace the hideous module_name with m..name. Not being able to access the fields of Module objects has been a wart.

StefanKarpinski commented 9 years ago

> Yes, I was just about to mention the possibility of making a.b mean "if a is a module then do module lookup first". That might help, and we certainly don't want to override the meaning of module lookup. It does have some complexity cost though, since we then can't represent a.b as the call getfield(a,:b). It needs to be a special AST node with an implicit branch. Of course we could use an explicit branch, but I'd worry about AST bloat from that.

Could we handle this by making a.b unconditionally mean getfield(a,:b) and then making it an error to add methods to getfield that intersect the getfield(::Module, ::Field) method? It's sort of an odd way to enforce that behavior, but it would ultimately have the same effect. Then lowering could just use the fact that we know you can't do that to cheat and lower module.name to qualified name lookup.

tknopp commented 9 years ago

OK, let me state it the other way around: would anybody in this thread use .., and if yes, what would be an exemplary use case? (I.e., might entirely shadowing the internal field access be OK?)

JeffBezanson commented 9 years ago

@StefanKarpinski Yes, that might work. Could be another case where we want some kind of "sealed" methods.

JeffBezanson commented 9 years ago

@tknopp Accessing module..name and module..parent :) Also, just to clarify, are you advocating function-call syntax like get(obj,:field) for low-level field access?

tknopp commented 9 years ago

No I am not advocating a certain syntax. I just think it would be good to make sure why this feature is needed and what the use cases are. For the dynamic use cases it would be ok that

My question was if there are use cases where shadowing is not ok.

JeffBezanson commented 9 years ago

Yes; you might want to define pyobject.x so that x is always looked up in the pyobject's dictionary, for all x. Then a separate mechanism is needed to access the pyobject's julia fields.

tknopp commented 9 years ago

Ahhh, so it's all or nothing? I somehow got the impression that one could have

type A
  c
end

Base.getfield(a::A, ::Field{:b}) = 3

a = A(1)

a.c # This still calls the field access
a.b # This calls the function

JeffBezanson commented 9 years ago

Yes, you can do that, but not all objects will. Some will want to define getfield(a::A, ::Field) to intercept all fields.

tknopp commented 9 years ago

OK, thanks, now I get it. All the dynamic use cases would want getfield(a::A, ::Field) and thus need some way to access the internal fields.

Then my take is that Core.getfield is sufficient unless someone finds a practical use case where this is annoying.
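A minimal sketch of that situation, assuming the hooks that exist in Julia 1.x (`Base.getproperty` for the overloadable dot, built-in `getfield` for the raw access) rather than the `getfield`/`Field` overloads proposed in this thread. The `DynObject` type and its contents are made up for illustration.

```julia
# Hypothetical wrapper whose dot syntax is fully dynamic: every `obj.name`
# is looked up in a dictionary, as a foreign-object wrapper might do.
struct DynObject
    props::Dict{Symbol,Any}
end

# Intercept *all* property names. The real field has to be reached with
# getfield, otherwise this method would call itself recursively.
Base.getproperty(o::DynObject, name::Symbol) = getfield(o, :props)[name]

obj = DynObject(Dict{Symbol,Any}(:x => 1, :props => "shadowed"))

dynamic_x     = obj.x                    # dictionary lookup
dynamic_props = obj.props                # also a dictionary lookup, not the field
real_props    = getfield(obj, :props)    # the actual Julia field
```

Here the dynamic entry named `props` completely shadows the field of the same name, so function-call access is the only way back to the representation.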

johnmyleswhite commented 9 years ago

This is probably a given, but we're also going to allow overriding setfield!, right? I'd really like that for exposing mutable views into a database in which rows become types.
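A rough sketch of such a mutable row view, assuming the Julia 1.x hooks `Base.getproperty` and `Base.setproperty!` in place of the overloadable `getfield`/`setfield!` discussed here. The `RowView` type and the in-memory "table" are hypothetical stand-ins for a real database binding.

```julia
# Hypothetical row view over a column-oriented table held in a Dict.
struct RowView
    columns::Dict{Symbol,Vector{Any}}
    row::Int
end

# Reading `r.name` fetches the cell in column `name` at this view's row.
Base.getproperty(r::RowView, name::Symbol) =
    getfield(r, :columns)[name][getfield(r, :row)]

# Writing `r.name = value` stores straight into the underlying table.
function Base.setproperty!(r::RowView, name::Symbol, value)
    getfield(r, :columns)[name][getfield(r, :row)] = value
    return value
end

table = Dict{Symbol,Vector{Any}}(
    :name => Any["ada", "bob"],
    :age  => Any[36, 41],
)

r = RowView(table, 2)
r.age = 42            # writes through to table[:age][2]
```

The view itself is immutable; only the table it points at changes, which is why the setter has to be overloadable for this pattern to work.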

StefanKarpinski commented 9 years ago

Yes, that was my impression.

tknopp commented 9 years ago

Ok, IMHO whether to use .. for real field access or Core.getfield is not such a big deal. One could introduce the general feature as experimental and make this subject to change.

The question is whether this will fit into the time frame of 0.4 or not. So is #5848 close to the final implementation, and is the module thingy solvable?

@johnmyleswhite: I would also vote for making this symmetric and also allowing setfield!. In Gtk.jl we would use both.

nalimilan commented 9 years ago

It doesn't seem very clear what the rule would be for when to use this feature and when not to use it. I see the point for PyCall, where a method/field must be dynamically looked up, and thus cannot be a Julia method/composite type (and the resulting syntax is closer to Python). But then, why use it for Gtk.jl? If it starts doing foo.bar = x instead of setbar!(foo, x), then standard Julia code will inevitably start using this pattern too: is this what we want? Maybe it is, but let's be clear about that.

cdsousa commented 9 years ago

Would it be acceptable/recommended to use this feature to implement property getters and setters defined for abstract (and concrete too) types? I guess that would allow the avoidance of name clash of methods which are used to get properties from different types of different modules.

Ref.: https://github.com/JuliaLang/julia/issues/4345, https://groups.google.com/forum/#!msg/julia-users/p5-lVNlDC8U/6PYcvvsg29UJ

tknopp commented 9 years ago

@nalimilan: Gtk has a dynamic property system; it's not about getters/setters.

nalimilan commented 9 years ago

@tknopp Ah, OK, indeed. But for most common properties you have a (fast) getter/setter function, plus the dynamic property. So would you recommend using the getter/setter function when available, and the field overloading syntax only for properties that don't have one? Sounds fine to me -- but it's good to have a clear policy about this IMHO.

StefanKarpinski commented 9 years ago

In my view at this point (and I think we need to experiment with this a bit to figure out the right rules), f(x) is better when f makes sense as a general standalone concept like "length" while x.f should be used when f is not really independent of x. To try to fit my previous example into that, it's not really useful to have a generic step function since most vectors and collections don't have any kind of notion of step – it only makes sense when you have a range of some kind. Thus, it's ok to have x.step be the way to get the step of a range x. It's a bit of a judgement call, but I guess life is full of those.

GlenHertz commented 9 years ago

I don't like .. as it doesn't convey direct access to me. How about foo.bar. instead? The extra dot at the end marks it as direct access.

simonbyrne commented 9 years ago

Could also pick a unicode symbol: we still have lots of those left...

StefanKarpinski commented 9 years ago

@GlenHertz, that doesn't really work if you have to chain field accesses.

@simonbyrne, I'm generally against having anything in the core language or standard library that requires the use of Unicode. Allowing it is one thing, forcing people to use it is another entirely.

johnmyleswhite commented 9 years ago

> In my view at this point (and I think we need to experiment with this a bit to figure out the right rules), f(x) is better when f makes sense as a general standalone concept like "length" while x.f should be used when f is not really independent of x.

My personal rule for using this feature is going to be: only use this for language interop or for things that are either "almost fields" or "augmented fields". For example, I might use this to update a cache that depends on the value of all of the fields in a type.
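The cache case might look roughly like the following. It is a sketch under assumptions: it uses `Base.getproperty`/`Base.setproperty!` from Julia 1.x instead of the overloads proposed in this thread, and the `Stats` type, its fields, and the derived `total` property are invented for the example.

```julia
# Hypothetical type with two real fields and a cache derived from them.
mutable struct Stats
    a::Float64
    b::Float64
    cached_sum::Union{Nothing,Float64}
end
Stats(a, b) = Stats(a, b, nothing)

# Any assignment to a real field other than the cache invalidates the cache.
function Base.setproperty!(s::Stats, name::Symbol, value)
    setfield!(s, name, convert(fieldtype(Stats, name), value))
    name === :cached_sum || setfield!(s, :cached_sum, nothing)
    return value
end

# `s.total` is an "augmented field": computed on demand, then cached.
function Base.getproperty(s::Stats, name::Symbol)
    if name === :total
        cached = getfield(s, :cached_sum)
        if cached === nothing
            cached = getfield(s, :a) + getfield(s, :b)
            setfield!(s, :cached_sum, cached)
        end
        return cached
    end
    return getfield(s, name)
end

s = Stats(1, 2)
before = s.total      # computed from a and b, then cached
s.a = 10              # invalidates the cache
after = s.total       # recomputed with the new value of a
```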

One big question I have about this feature: how does this interact with type inference? It seems like you're about to define a function getfield(x::T, s::Symbol) that produces differently typed output for different values of s. Does that only work because getfield is magical? Can you redefine the output of getfield(x, s) for fixed x and s at any point in a program? If so, how does that mesh with the inability to redefine a type?

StefanKarpinski commented 9 years ago

> It seems like you're about to define a function getfield(x::T, s::Symbol) that produces differently typed output for different values of s.

That's why the plan is to express this as getfield{s}(x::T, f::Field{s}) where s is a symbol.
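The effect of moving the symbol into a type parameter can be tried out today with `Val`, which serves here as a stand-in for the proposed `Field{s}` type (`Field` itself is not defined in Julia). The `Point` type and the `lookup` function are hypothetical names chosen for this sketch.

```julia
struct Point
    x::Int
    label::String
end

# One method per field name: the name is part of the argument *type*,
# so each method can have its own concrete return type.
lookup(p::Point, ::Val{:x})       = getfield(p, :x)
lookup(p::Point, ::Val{:label})   = getfield(p, :label)
lookup(p::Point, ::Val{:doubled}) = 2.0 * getfield(p, :x)   # computed "field"

p = Point(3, "origin")

x_value = lookup(p, Val(:x))
doubled = lookup(p, Val(:doubled))

# Ask inference what it concludes for one specific name.
inferred = Base.return_types(lookup, (Point, Val{:doubled}))
```

Because dispatch selects the method from the type `Val{:doubled}` alone, inference never has to reason about a runtime symbol value, which is the point of expressing the field name as a type parameter.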