JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

allow overloading of a.b field access syntax #1974

Closed: StefanKarpinski closed this issue 6 years ago

StefanKarpinski commented 11 years ago

Brought up here: https://github.com/JuliaLang/julia/issues/1263.

johnmyleswhite commented 9 years ago

I had missed that. Thanks for setting me straight.

tknopp commented 9 years ago

@nalimilan: Yes, the overloaded fields would only be used for the dynamic properties. This is how Jameson wants to tackle this, and I think it is good. All the real getters and setters are autogenerated, but they are still functions, just without the get/set naming. They live in the GAccessor module (G_ for short).

mikewl commented 9 years ago

On the syntax, why not use <- for real field access? It's similar to -> in C++. In Julia, -> is already used for lambdas, but <- is currently unused. It could be read as "from the type, get the value directly."

It would leave .. free for those who still want to use it for intervals, and it would use up a currently unused token pair that has no other use I can think of so far.

johnmyleswhite commented 9 years ago

Let's not use R's assignment notation for field access.

quinnj commented 9 years ago

We could possibly use -> to mirror C/C++ directly and get new syntax for anonymous functions. I’ve never much cared for the anonymous function syntax since it’s a little terse/unreadable. Maybe we could instead do something like

func (x) x^2 end

or the longer, more consistent

function (x) x^2 end

Perhaps there’d be a way to come up with a good syntax that doesn’t require using an end.

Not to change the discussion too much, but it would definitely make sense to use -> for real field access.

mauro3 commented 9 years ago

@quinnj: func (x) x^2 end already works. But it is nice to have very concise syntax for anonymous functions: map(x->x^2, 1:10)
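As a quick check of the two anonymous-function spellings discussed above, here is a minimal sketch assuming Julia 1.x; the variable names are illustrative only.

```julia
# Long form: `function` without a name creates an anonymous function.
sq_long = function (x)
    x^2
end

# Short form: the arrow syntax that keeps one-liners concise.
sq_short = x -> x^2

# The concise form is what makes calls like this pleasant to write.
squares = map(x -> x^2, 1:10)
```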

tshort commented 9 years ago

I don't think field access needs special syntax (a Unicode character, as @simonbyrne suggested, is okay as an option). I wouldn't want to lose x -> x^2.

elcritch commented 9 years ago

It looks like this issue is still open and pending discussion? It's been interesting reading all of the various comments here about the dot operator.

Have there been any suggestions for adding other new operator tokens? Using something like :> might be a nice alternative. It has parallels to |> and might have a more native Julia feel to it:

c = foo.a + foo.b
pyobj:>obj:>process(c)

While still being far easier to write than:

pyobj[:obj][:process](c)

Or comparing:

someobj :> array |> length 
# vs
length(get_array(someobj)) 

I'm new to Julia, but I've quickly gained a strong appreciation for the multiple dispatch approach. Object oriented programming – particularly for scientific programming – makes a lot of tasks more cumbersome. I'd be worried that the OO paradigm/syntax would negatively affect the development of Julia's style.

elcritch commented 9 years ago

Or alternatively, since I forgot about string interning and/or quoting:

someobj <: field_name |> length

pao commented 9 years ago

@elcritch, <: is currently used as Julia's "is subtype of" operator, and if :> is introduced it is likely that we'll want to use it for something type-related due to that heritage.

dbeach24 commented 9 years ago

If using instance..member is a problem, here are some possibilities. Shield your eyes! It is likely that every one of these is worse:

stevengj commented 9 years ago

I honestly think that (a) .. seems good enough and (b) it doesn't really matter whether it looks nice because this will always be an obscure corner of the language. Most people will use instance.member because they will only have either a field or a getfield method, but not both.

(For that matter, most people who want to use both a member and a method will probably not even bother to learn about the .. syntax. They will just define a method for foo.member and name the "real" field foo._member. Arguably, this is better style anyway: it means that when you read the type definition, it will be immediately obvious that _member is not supposed to be something you can or should access directly. This would argue for making .. something ugly and obscure like :. rather than taking up valuable punctuation real estate.)
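A minimal sketch of the `_member` convention, assuming Julia 0.7 or later, where dot overloading eventually landed as `Base.getproperty`/`Base.setproperty!` rather than as overloaded `getfield`. The `Temperature` type and its `celsius` property are hypothetical.

```julia
mutable struct Temperature
    _kelvin::Float64   # the "real" field; the leading underscore marks it as internal
end

# `t.celsius` is a virtual property computed from the real field.
function Base.getproperty(t::Temperature, name::Symbol)
    name === :celsius && return getfield(t, :_kelvin) - 273.15
    return getfield(t, name)   # fall back to ordinary field access
end

# `t.celsius = v` writes through to the real field.
function Base.setproperty!(t::Temperature, name::Symbol, value)
    name === :celsius && return setfield!(t, :_kelvin, Float64(value) + 273.15)
    return setfield!(t, name, convert(fieldtype(Temperature, name), value))
end

# Advertise the virtual property; list the internal field only when asked for private names.
Base.propertynames(::Temperature, private::Bool=false) =
    private ? (:celsius, :_kelvin) : (:celsius,)
```

Low-level access stays available through `getfield(t, :_kelvin)`, which is never overloaded.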

mbauman commented 9 years ago

I would miss the ability to use .. as an infix interval operator, but overloadable field access is a worthwhile trade-off. While I'm mildly terrified of adding any more meanings to colon, :. doesn't seem that bad.

stevengj commented 9 years ago

Note that :. is actually valid syntax for symbol(".") right now, so it might not be good to use it. The point that .. is potentially useful is well taken; we shouldn't waste it on a syntax hardly anyone will use. I'd be perfectly happy to go with something even uglier like @. (since . is not a valid macro name, nor can it begin an identifier, this doesn't seem likely to conflict with anything). Again, this is going to be such an obscure corner of Julia that it is not worthwhile trying to make it pretty.

johnmyleswhite commented 9 years ago

+1 to just getting this done using .. and ignoring any potential ugliness

nanoant commented 9 years ago

Yeah, let's go for .. anyway. If there were another use for .., I think it would be as a range constructor, but hey, that already exists with colon, e.g. start:stop.

hayd commented 9 years ago

One last punt: what about .: ? Is it too subtle to have a.b, a.(b), a.(:b) and a.:b?

stevengj commented 9 years ago

@hayd, that seems too subtle and easy to use by accident.

stevengj commented 9 years ago

@ihnorton, is there any chance of resurrecting a version of #5848? We could punt on the syntax question and just use Core.getfield(x, Field{y}) to access the "real" field.

stevengj commented 9 years ago

Bikeshed about Core.getfield syntax aside, are there any substantive questions remaining?

In #5848, @tknopp suggested only making "real" field access overloadable, contrary to @JeffBezanson's suggestion that everything should be overloadable. Personally, I would be happy with making it impossible to overload real fields, except that the dynamic nature of the language will probably make that much more complicated to implement. E.g. with the "everything-overloadable" approach, if you have x::Vector{Any}, then x[i].y can be interpreted as getfield(x[i], Field{:y}), and the dispatch system will do the right thing regardless of whether y is a real field, whereas if you only want to call getfield for "virtual" fields, then the codegen will have to implement a miniature subset of the dispatch system for runtime checking of the x[i] type.
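To illustrate why the everything-overloadable approach needs no special codegen, here is a sketch assuming Julia 0.7+ `getproperty` semantics, in which every `a.b` goes through `getproperty`, which falls back to `getfield`. `Plain` and `Virtual` are hypothetical types.

```julia
struct Plain          # ordinary struct: `.y` reads the real field
    y::Int
end

struct Virtual        # has no field `y`; `.y` is computed on demand
    base::Int
end

Base.getproperty(v::Virtual, name::Symbol) =
    name === :y ? 2 * getfield(v, :base) : getfield(v, name)

xs = Any[Plain(1), Virtual(5)]   # element types are only known at run time
ys = [x.y for x in xs]           # each `x.y` is resolved by ordinary dispatch
```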

Another question was whether Module.foo should be overloadable. On the one hand, there is a certain consistency to using getfield for everything, and the above-mentioned Vector{Any} example could have Module array members, so we'd have to handle that case anyway. On the other hand, @JeffBezanson pointed out that this could make compilation harder and make the behavior of declarations like function Base.sum(...) hard to grok. My preference would be to make Module.foo non-overloadable, at least for now, in any case where the compiler knows it is working with a Module (i.e. not a Vector{Any}); the slight inconsistency seems worth it in order to be conservative about what gets changed.

toivoh commented 9 years ago

+1 to not allow overloading of Module.foo.

datnamer commented 9 years ago

To chime in here: one area of scientific computing where OO programming and syntax is actually superior to FP is agent-based modeling. Although I miss concrete and multiple inheritance for setting up agent hierarchies, Julia's lightweight, fast abstractions and quick prototyping are amazing; a couple of ABM frameworks have already popped up.

In ABM, the dot notation is preferable for expressing agent interactions: Agent1.dosomething(Agent2) vs dosomething(Agent1,Agent2).

This obviously isn't the biggest use case, but it would be nice to keep this syntactic sugar for thinking about and coding ABMs.

dbeach24 commented 9 years ago

I would also very much like to have this syntax available in Julia. As much as I appreciate the function-oriented approach from a design perspective, method call syntax is very useful and readable in several domains. It would be great if A.b(C) were equivalent to b(A, C).

hayd commented 9 years ago

In ABM, the dot notation is preferable for expressing agent interactions: Agent1.dosomething(Agent2) vs dosomething(Agent1,Agent2).

Why is that better? Edit: I mean in this ABM-context specifically.

stevengj commented 9 years ago

Please, let's not get sucked into religious wars over spelling. @dbeach24, no one is proposing that a.b(c) be equivalent in Julia to b(a,c); that's not going to happen.

Overloadable dots is crucial for natural interop with other languages. That's reason enough.

dbeach24 commented 9 years ago

Subject.Verb(DirectObject)

is fairly natural in several contexts. A lot of OO programmers are used to it, and while it is a mere reordering of function(A, B), that reordering does a lot for readability, IMO.

dbeach24 commented 9 years ago

I was proposing it. (Sorry, I didn't mean to start a war, nor did I know the suggestion would be unpopular.) I thought I'd seen this come up before in the forums, but did not realize that it had already been dismissed as a bad idea. May I ask why? (Can you point me to a thread?)

Thanks.

toivoh commented 9 years ago

One reason not to do this is that a.b looks up b in the scope of a, while b alone looks up b in the enclosing scope. It would be very confusing if dotted access would sometimes not look up in the left side object.

Btw, it is considered a feature that functions in Julia are looked up in the current scope, not inside objects. I believe the fear that people would start to use functions looked up inside objects is one reason that has been holding dot overloading back.

stevengj commented 9 years ago

@toivoh, any implementation of dot overloading would use the existing method dispatch, so it wouldn't change the scoping behavior. @dbeach24, the basic reason not to encourage indiscriminate a.b(c) usage is that if you have too many syntaxes to do the same thing, the language and libraries turn into a mess. It's better to pick one spelling and stick with it, and Julia's multiple dispatch favors b(a,c) because it is clearer that b is not "owned" by a — the b(a,c) method is determined equally by both a and c.
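A small sketch of the point that a `b(a,c)` method is selected by both arguments, using hypothetical agent types in the spirit of the ABM example above:

```julia
# Hypothetical agent types for an agent-based model.
abstract type Agent end
struct Wolf <: Agent end
struct Sheep <: Agent end

# The method is chosen by the types of *both* arguments,
# so neither agent "owns" the interaction.
interact(::Wolf, ::Sheep) = "wolf chases sheep"
interact(::Sheep, ::Wolf) = "sheep flees wolf"
interact(::Agent, ::Agent) = "agents ignore each other"   # generic fallback
```

With `a.b(c)` syntax, the second and third methods would have no natural "owner".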

The biggest exception, of course, is for calling external libraries in languages like Python or C++, where it is nice to be able to mirror the dot syntax of the library you are calling. (So that e.g. people can translate documentation and examples directly into Julia without changing much.)

ScottPJones commented 9 years ago

Am I all wet, or wouldn't a.b(arg) mean: take the anonymous function stored in field b of a, and then call it with the given argument?

pao commented 9 years ago

@ScottPJones That currently works just fine, but is generally not considered good style.
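For reference, a minimal sketch of the pattern @ScottPJones describes, a function stored in a field and called through dot syntax. `Handler` is a hypothetical name, and the type parameter `F` simply keeps the field concretely typed.

```julia
struct Handler{F<:Function}
    f::F    # a function stored as ordinary data
end

h = Handler(x -> x + 1)
result = h.f(41)   # reads field `f`, then calls it; no method-call semantics involved
```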

ScottPJones commented 9 years ago

I wasn’t concerned about style; I just thought that since a.b(arg) already has a meaning consistent with the way Julia works (i.e., being able to store anonymous functions in fields), that was a good argument not to try to treat a.b(arg) as if it were b(a,arg). I might have a use for a struct (type) with members storing anonymous functions, though: I am loading functions written in Julia from a database, calling parse on them, and storing the functions in an object... What would be a better “Julian” way to do something like that?

Thanks!

hayd commented 9 years ago

@ScottPJones I think there is agreement these shouldn't be equivalent*.

There can be exceptions to the "style" rule, but there has to be a compelling case to put a function in a field, same as for dot overloading. I think the point is that people shouldn't do these things willy-nilly, just for the sake of it or because they can.

That might be an example, but there might also be a better way (certainly it's not the only way)...

99%+ of the time it's better to dispatch on typeof(a); no function-fields, no dot overloading.

*However, I think everyone knows the second this lands there'll be a package that does just that...

tkelman commented 9 years ago

In D they even have a name, "uniform function call syntax", for a.b(arg), and it's quite popular, but I think it's pretty deeply at odds with the generic-function, multiple-dispatch way Julia works. If the functions in question are anonymous or completely duck-typed, then I suppose things will work, but that's pretty restrictive IMO. I don't think there's much reason to store function fields inside a composite type, except out of habit from traditional class-based OO languages. A better place to store generic functions, if you're loading them from somewhere, would be in modules.
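A minimal sketch of the module-based alternative, assuming the function sources arrive as strings. `sources` and the module name `Loaded` are hypothetical; `Meta.parse` and `Core.eval` are standard Julia 1.x functions.

```julia
# Hypothetical: function definitions loaded as source strings (e.g. from a database).
sources = ["square(x) = x^2", "greet(name) = \"hello, \" * name"]

# A dedicated module serves as the namespace for the loaded generic functions.
module Loaded end

for src in sources
    # Meta.parse turns the string into an expression; Core.eval defines it inside `Loaded`.
    Core.eval(Loaded, Meta.parse(src))
end
```

The loaded functions are then ordinary generic functions, e.g. `Loaded.square(3)`, and can gain more methods later.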

But we've gotten pretty far from "substantive questions" now. I'm also in favor of being conservative with how we allow overloading getfield, not allowing getfield overloading on modules, and not worrying too much about special syntax for "true getfield."

toivoh commented 9 years ago

@stevengj: Yes, and as I was trying to say, that is one fundamental reason why it's never going to happen that a.b(c) becomes equal to b(a, c), regardless of what we do with dot overloading otherwise.

mauro3 commented 9 years ago

There was some discussion on the mailing list touching on this. From my perspective the most relevant post (by @nalimilan) to this thread is: https://groups.google.com/d/msg/julia-users/yC-sw9ykZwM/-607E_FPtl0J

ssfrr commented 9 years ago

Adding to @johnmyleswhite's comment regarding personal policy on when to use this feature: it strikes me that some HTTP ideas could be useful here, namely that getfield() should not have side effects and setfield!() should be idempotent (that is, calling it multiple times with the same value should have the same effect as calling it once). These needn't be hard-and-fast rules enforced by the compiler, just usage guidelines to keep things from getting too crazy.
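A sketch of these guidelines, assuming Julia 0.7+ `setproperty!` overloading; `Gauge` and its clamping rule are hypothetical. Reads use the default, side-effect-free `getproperty`, and the setter normalizes its input so that repeating an assignment changes nothing further.

```julia
mutable struct Gauge
    level::Float64
end

# Idempotent setter: assigning the same value twice leaves the same state as assigning it once.
function Base.setproperty!(g::Gauge, name::Symbol, value)
    if name === :level
        return setfield!(g, :level, clamp(Float64(value), 0.0, 1.0))
    end
    return setfield!(g, name, value)
end
```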

barche commented 8 years ago

I have posted a workaround using parametric types with pointer parameters and convert to call a custom setter when setting a field: post: https://groups.google.com/forum/#!topic/julia-users/_I0VosEGa8o code: https://github.com/barche/CppWrapper/blob/master/test/property.jl

I'm wondering if I should use this approach in my package as a workaround until setfield! overloading is available, or is it too much stress on the parametric type system?

afniedermayer commented 8 years ago

I'd like to mention one additional benefit of getfield/setfield! overloading. I hope this is the right place for this; sorry if it isn't. (A related topic came up on https://groups.google.com/forum/#!topic/julia-users/ThQyCUgWb_Q )

Julia with getfield/setfield! overloading would allow for a surprisingly elegant implementation of autoreload functionality in an external package. (See all the hard work that had to be put into the IPython autoreload extension https://ipython.org/ipython-doc/3/config/extensions/autoreload.html to get this functionality.) The idea of autoreload is that you can modify functions and types in external modules while working with the REPL.

TLDR: getfield/setfield! overloading, dictionaries and a package similar to https://github.com/malmaud/Autoreload.jl should do the trick.


To be more specific, imagine a package similar to Autoreload.jl that does the following.

You first create a module M.jl:

module M
mutable struct Foo    # `type` in Julia 0.5-era syntax
  field1::Int64
end
bar(x::Foo) = x.field1 + 1.0
end

In the REPL, you type

julia> using Autoreload2
julia> arequire("M")
julia> foo = Foo(42)

Then you change M.jl to

module M
mutable struct Foo
  field1::Int64
  field2::Float64
end
bar(x::Foo) = x.field1 + x.field2
end

This would get autoreloaded and transformed to

# type redefinition removed as already done by Autoreload.jl
const field2_dict = Dict{UInt64,Float64}()
setfield!(x::Foo, ::Field{:field2}, value) = field2_dict[object_id(x)] = value
getfield(x::Foo, ::Field{:field2}) = field2_dict[object_id(x)]
@do_not_inline bar(x::Foo) = x.field1 + x.field2

and then in the REPL you could do

julia> foo.field2 = 3.14
julia> println(bar(foo)) # prints 45.14

Performance wouldn't be worse than with Python, so people who migrate their workflow from IPython autoreload wouldn't lose anything in terms of performance. And once you restart the REPL, you're back to full performance.

sneusse commented 8 years ago

I got tired of writing a[:field][:field2][:morestuff](b[:random_stuff]) as it is not really readable. So I wrote this little macro which works for my use-cases in 0.4 and 0.5 https://github.com/sneusse/DotOverload.jl

TL;DR A macro which transforms the AST of an expression a.b -> getMember(a, :b)
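A rough sketch of such an AST rewrite, not the actual DotOverload.jl code. The macro name `@dots` and the `getMember` function (here defined only for dictionaries) are hypothetical. It would also rewrite module-qualified names such as `Base.sqrt`, which a real implementation would need to handle.

```julia
# Hypothetical accessor that the rewritten code calls.
getMember(d::AbstractDict, k::Symbol) = d[k]

# Recursively turn every `a.b` (parsed as Expr(:., a, QuoteNode(:b))) into getMember(a, :b).
# Broadcast calls like `f.(x)` have a tuple, not a QuoteNode, as the second argument and are left alone.
function rewrite_dots(ex)
    if ex isa Expr
        if ex.head === :. && length(ex.args) == 2 && ex.args[2] isa QuoteNode
            return Expr(:call, :getMember, rewrite_dots(ex.args[1]), ex.args[2])
        end
        return Expr(ex.head, map(rewrite_dots, ex.args)...)
    end
    return ex
end

macro dots(ex)
    esc(rewrite_dots(ex))
end
```

Usage: with `d = Dict(:field => Dict(:field2 => 42))`, `@dots d.field.field2` expands to `getMember(getMember(d, :field), :field2)`.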

Keno commented 7 years ago

Removing from 0.6, since there's no consensus that this is a good idea and there's a conflicting proposal for what to do with dot syntax.

mauro3 commented 7 years ago

@Keno: Do you have a link to the conflicting proposal?

Keno commented 7 years ago

Don't think @StefanKarpinski has written it up yet, but I'd expect there to be a Julep about it soon.

diegozea commented 7 years ago

I find object.fieldname nicer than getter functions like fieldname(object) or get_fieldname(object). Maybe having object.fieldname (or object$fieldname) be a call to getpublicfield (perhaps with a better name), and object..fieldname be the actual (private) getfield, could be a good option. That way, types would define getpublicfield instead of getters, and trying to do object.fieldname should give an error if the field is private (it would be private if it has no definition for getpublicfield).
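A sketch of the public/private idea on top of Julia 0.7+ `getproperty`, where plain `getfield` plays the role of the proposed `..` private access. `Account` and `PUBLIC_FIELDS` are hypothetical.

```julia
struct Account
    owner::String
    balance::Float64
    pin::Int          # treated as private: not in the whitelist below
end

const PUBLIC_FIELDS = (:owner, :balance)

# Dot access only succeeds for whitelisted names.
function Base.getproperty(a::Account, name::Symbol)
    name in PUBLIC_FIELDS || error("field $name of Account is private")
    return getfield(a, name)
end
```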

tknopp commented 7 years ago

I added the decision label. This issue has been discussed at length, and either it has to be done or not. Reading #5848, it seemed that @JeffBezanson, @StefanKarpinski and @stevengj want this. If so, this issue should get a milestone so that it is not forgotten; otherwise, close it. In any case, I think this is a change that should be made before 1.0.

stevengj commented 7 years ago

@JeffBezanson and I were just discussing this yesterday. Tentative conclusions: (i) yes, we should have this; (ii) don't allow dot overloading for Module (which will be specially handled); (iii) don't supply any special syntax for Core.getfield (since there's no pressing need for an overloaded getfield to have the same name as a "real" field; the latter can just start with an underscore).

tknopp commented 7 years ago

@stevengj: Sounds like a reasonable plan. Could you indicate whether this will be restricted to the single-argument case, or whether the multi-argument version a.fieldname(b) should also be supported? That would settle the discussion above. Furthermore, it would be great to put an appropriate milestone label on this (1.0?). Thanks!

stevengj commented 7 years ago

Jeff and I didn't discuss the multi-arg case. My feeling is that we might as well support it, since you can simulate it anyway by returning a function from the no-arg case (but it isn't critical to do right away for the same reason).
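A sketch of simulating the multi-argument case by returning a closure from the no-argument case, assuming Julia 0.7+ `getproperty`; `Point` and `scale` are hypothetical.

```julia
struct Point
    x::Float64
    y::Float64
end

# `p.scale` returns a closure, so `p.scale(k)` works without special multi-argument syntax.
function Base.getproperty(p::Point, name::Symbol)
    if name === :scale
        return k -> Point(k * getfield(p, :x), k * getfield(p, :y))
    end
    return getfield(p, name)
end
```

An ordinary function `scale(p, k)` is usually the more Julian spelling; the closure form mainly helps when mirroring an external object-oriented API.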

WatanukiRasadar commented 7 years ago

I use a converter to cast values and validate data, like this:

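abstract type AbstractAge{T} end
abstract type AbstractPerson end

# Concrete age type for persons; the converter below builds it from a plain Int.
mutable struct PersonAge <: AbstractAge{AbstractPerson}
    value::Int64
end

# Validate the raw value while converting it to the field's abstract type.
function Base.convert(::Type{AbstractAge{AbstractPerson}}, value::Int64)
    if 0 < value < 140
        return PersonAge(value)
    else
        throw(ArgumentError("age must be between 1 and 139, got $value"))
    end
end

mutable struct Person <: AbstractPerson
    age::AbstractAge{AbstractPerson}
end

a = Person(32)  # the default constructor converts the argument to the field type
a.age = 67      # field assignment also converts, so the validation runs again
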
JeffBezanson commented 7 years ago

Here's a fun 3-line implementation of this:

diff --git a/base/boot.jl b/base/boot.jl
index cd3ae8b..a58bb7e 100644
--- a/base/boot.jl
+++ b/base/boot.jl
@@ -266,6 +266,9 @@ Void() = nothing

 (::Type{Tuple{}})() = ()

+struct Field{name} end
+(::Field{f})(x) where {f} = getfield(x, f)
+
 struct VecElement{T}
     value::T
     VecElement{T}(value::T) where {T} = new(value) # disable converting constructor in Core
diff --git a/src/julia-syntax.scm b/src/julia-syntax.scm
index b4cb4b5..59c9762 100644
--- a/src/julia-syntax.scm
+++ b/src/julia-syntax.scm
@@ -1685,7 +1685,7 @@
     (if (and (pair? e) (eq? (car e) '|.|))
         (let ((f (cadr e)) (x (caddr e)))
           (if (or (eq? (car x) 'quote) (eq? (car x) 'inert) (eq? (car x) '$))
-              `(call (core getfield) ,f ,x)
+              `(call (new (call (core apply_type) (core Field) ,x)) ,f)
               (make-fuse f (cdr x))))
         (if (and (pair? e) (eq? (car e) 'call) (dotop? (cadr e)))
             (make-fuse (undotop (cadr e)) (cddr e))

My thinking is that a.b should actually call a projection function Field{:b}() instead of getfield, so that you get functions like x->x.a already for free. That also allows getfield to always mean low-level field access.
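The projection-function idea can be tried as ordinary user code, without the parser change in the diff. A sketch assuming Julia 1.x, where `Field` is a user-defined type named after the one in the diff and `P` is hypothetical:

```julia
# `Field{:a}()` is a callable singleton that extracts field `a`.
struct Field{name} end
(::Field{f})(x) where {f} = getfield(x, f)

struct P
    a::Int
    b::Int
end

ps = [P(1, 2), P(3, 4)]
avals = map(Field{:a}(), ps)   # same result as map(x -> x.a, ps)
```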

The above implementation completely works but is pretty hard on the compiler (sysimg +5%, which is kind of a pleasant surprise really). So this will need some specialization heuristics and some early optimizations need to be updated, but then hopefully this will be viable.