JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

allow overloading of a.b field access syntax #1974

Closed StefanKarpinski closed 6 years ago

StefanKarpinski commented 11 years ago

Brought up here: https://github.com/JuliaLang/julia/issues/1263.

stevengj commented 11 years ago

The ability to use dots as syntactic sugar for mutator/accessor methods would be nice for lots of things. I've always appreciated this in languages that provide it, so that you can turn structure fields into more complicated abstractions without breaking the API.

toivoh commented 11 years ago

+1

JeffBezanson commented 11 years ago

I have an absolutely awesome way to implement this.

johnmyleswhite commented 11 years ago

Interested in talking about it? I know that Tom Short is really interested in having this for DataFrames, although I've come to be increasingly skeptical about the wisdom of using this feature.

stevengj commented 11 years ago

This would make calling Python code (via PyCall) significantly nicer, since currently I'm forced to do a[:b] instead of a.b.

stevengj commented 10 years ago

@JeffBezanson, any chance of having this for 0.3? Would be great for inter-language interop, both for PyCall and for JavaCall (cc @aviks).

ihnorton commented 10 years ago

> I have an absolutely awesome way to implement this.

@JeffBezanson if not, is there any chance you could give some direction on how you want this implemented?

StefanKarpinski commented 10 years ago

In my experience, there is no faster nor surer way to get Jeff to implement something than to implement a version of it that he doesn't like ;-)

JeffBezanson commented 10 years ago

The basic idea is that you implement

getfield(x::MyType, ::Field{:name}) = ...

so that you can overload it per-field. That allows access to "real" fields to keep working transparently. With suitable fallbacks getfield(::MyType, ::Symbol) also works.

The biggest issue is that modules have special behavior with respect to the . operator. In theory, this would just be another method of getfield, but the problem is that we need to resolve module references earlier since they basically behave like global variables. I think we will have to keep this a special case in the behavior of the . operator. There is also a bit of a compiler efficiency concern, due to analyzing (# types) * (# fields) extra function definitions. But for that we will just see what happens.
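A minimal sketch of this per-field dispatch idea, written against current Julia. Assumptions: `Field` is a hypothetical stand-in defined here (Julia does not ship it), `fieldaccess` and `Circle` are illustrative names, and because Core.getfield is a builtin that cannot take new methods, the sketch hooks in through `Base.getproperty`, which `a.b` syntax lowers to in Julia 0.7 and later.

```julia
# Field{name} is a hypothetical stand-in for the proposed dispatch type;
# it is not defined by Julia itself.
struct Field{name} end

# Fallback: any name without a specialized method reads the real field,
# so ordinary field access keeps working transparently.
fieldaccess(x, ::Field{name}) where {name} = getfield(x, name)

struct Circle
    radius::Float64
end

# Route `c.name` through per-field dispatch (Julia 0.7+ lowers `c.name`
# to Base.getproperty(c, :name), which may be overloaded).
Base.getproperty(c::Circle, s::Symbol) = fieldaccess(c, Field{s}())

# Only this one name gets custom behavior: a computed "field".
fieldaccess(c::Circle, ::Field{:area}) = pi * c.radius^2

c = Circle(2.0)
c.radius   # 2.0, read from the real field via the fallback
c.area     # computed on demand from c.radius
```

Only `:area` has a specialized method; every other name falls through to `getfield`, which is what keeps real fields working.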

bfredl commented 10 years ago

@JeffBezanson Do you also refer to const behavior in modules? It would be useful to have a user type emulating a module and be able to tell the compiler when the result of a dynamic field lookup is in fact constant. (Another approach would be to start with an actual module and be able to "trap" a failed jl_get_global and inject new bindings on demand.)

I would find that to be very useful in combination with #5395. Then one would be able to intercept a call to an undefined function or method MyMod.newfunction(new signature) and generate bindings to a (possibly large) API on demand. These would then be cached as ordinary const bindings, I guess.

cdsousa commented 10 years ago

Let me, a simple Julia newbie, present a little concern: I think the possibility to overload the dot operator might imply that field access "purity" is somehow lost.

The user would generally lose the knowledge of whether a.b is just an access to a reference/value or whether a whole function machinery may be running behind it. I'm not sure how that could be bad, though; it is just a feeling...

On the other hand, I see that this is indeed a big wish for syntactic sugar in many cases (PyCall, DataFrames...), which is perfectly understandable. Maybe it is time for .. (#2614)?

johnmyleswhite commented 10 years ago

I support doing this.

But the purity does have something to say for it, even if one can use names(Foo) to figure out what the real components of Foo are.

The purity argument is closely related to the main practical concern I have, which is how one handles name conflicts when the fields of the type interfere with names you might hope to use. In DataFrames, I think we'd resolve this by banning the use of columns and colindex as column names, but wanted to know what people's plan was for this.

cdsousa commented 10 years ago

I guess getfield(x::MyType, ::Field{:foo}) = ... would have to be forbidden when MyType has a field foo, otherwise the access to the real field would be lost (or a way to force access to the field would have to be available). But then getfield could only be defined for concrete types, since abstract ones know nothing about fields.

(Meanwhile, I stumbled upon this about C++.)

JeffBezanson commented 10 years ago

It's not a major problem. We can provide something like Core.getfield(x, :f) to force access to the real fields.

cdsousa commented 10 years ago

Ok, maybe I'm sold. But then defining a shortcut to Core.getfield(x, :f) (e.g., x..f) would be nice; otherwise the internal code of types that overload . for all symbols (DataFrames, probably dictionaries) will be crowded with Core.getfield calls.
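To make the shadowing concern concrete, here is a sketch of a type that overloads `.` for every name, assuming Julia 0.7+ `Base.getproperty`; `Record` is a hypothetical type. Note how its own implementation has to call `getfield` (the same function as `Core.getfield`) to reach the real field, which is exactly the crowding described above.

```julia
# A record that overloads `.` for every name, loosely in the spirit of
# the DataFrames/PyCall use cases (illustrative only).
struct Record
    data::Dict{Symbol,Any}
end

# Every `r.name` becomes a dictionary lookup. Internally we must call
# getfield (i.e. Core.getfield) to reach the real `data` field; writing
# `r.data` here would go through this very method.
Base.getproperty(r::Record, s::Symbol) = getfield(r, :data)[s]

r = Record(Dict{Symbol,Any}(:data => "user value", :x => 1))
r.x                  # 1
r.data               # "user value": the overload shadows the real field
getfield(r, :data)   # the underlying Dict, bypassing the overload
```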

toivoh commented 10 years ago

I'm not worried about the purity aspect: until we have this, the only code that should be using field access at all is code that belongs to the implementation of a given type. When field access is part of an API, you have to document it, as with any API. I agree, though, that some shortcut syntax for Core.getfield might be handy when writing those implementations.

cdsousa commented 10 years ago

It had already been pointed out in #4935, but let's pull it to here: dot overloading can overlap a little with classical Julian multiple dispatch if not properly used, since we can start doing

getfield(x::MyType, ::Field{:size}) = ...
for i = 1:y.size ... end

instead of

size(x::MyType) = ...
for i = 1:size(y) ... end

While the dot would be great for accessing items in collections (DataFrames, Dicts, PyObjects), it can somehow change the way object properties (not fields) are accessed.

nalimilan commented 10 years ago

I think one thing to consider is that if you can overload accessing field, you should also be able to overload setting a field. Else this will be inconsistent and frustrating. Are you OK to go that far?

stevengj commented 10 years ago

@nalimilan, one absolutely needs a setfield! in addition to getfield. (Similar to setindex! vs. getindex for []). I don't think this is controversial.

johnmyleswhite commented 10 years ago

Agree with @stevengj: DataFrames will definitely be implementing setfield! for columns.
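A sketch of the get/set pair for a toy column store, assuming Julia 0.7+, where `t.name` lowers to `Base.getproperty(t, :name)` and `t.name = v` lowers to `Base.setproperty!(t, :name, v)`; `ColumnTable` is a hypothetical type, not how DataFrames.jl is implemented.

```julia
# Hypothetical column store; this is not how DataFrames.jl is implemented.
struct ColumnTable
    columns::Dict{Symbol,Vector{Any}}
end
ColumnTable() = ColumnTable(Dict{Symbol,Vector{Any}}())

# Reading a column: `t.name` lowers to Base.getproperty(t, :name).
Base.getproperty(t::ColumnTable, s::Symbol) = getfield(t, :columns)[s]

# Writing a column: `t.name = v` lowers to Base.setproperty!(t, :name, v).
function Base.setproperty!(t::ColumnTable, s::Symbol, v)
    getfield(t, :columns)[s] = collect(v)
    return v
end

t = ColumnTable()
t.a = [1, 2, 3]   # stores a new column
t.a               # Any[1, 2, 3]
```

The struct itself is immutable; only the dictionary it wraps is mutated, so the setter never touches a real field.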

lindahua commented 10 years ago

I support this.

Experience with other languages (e.g. C# and Python) does show that the dot syntax does have a lot of practical value. The way that it is implemented through specialized methods largely addresses the concern of performance regression.

It is, however, important to ensure that the inlineability of a method is not seriously affected by this change. For example, something like f(x) = g(x.a) + h(x.b) should not suddenly become un-inlineable after this lands.

If we decide to make this happen, it is useful to also provide macros to make the definition of property easier, which might look like:

# let A be a type, and foo a property name
@property (a::A).foo = begin
    # compute and return the property value
end

# for simpler cases, this can be simplified to
@property (a::A).foo2 = (2 * a.foo)

# set property
@setproperty (a::A).foo v::V begin
    # code for setting value v to a property a.foo
end

Behind the scenes, all of these can be translated into method definitions.
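One way such a translation could look: a minimal sketch of the getter form only, assuming the hypothetical `Field` type and an illustrative `propget` function, hooked up through `Base.getproperty` (Julia 0.7+). The macro rewrites `(a::A).foo = body` into a method on `propget`; a `@setproperty` variant could follow the same pattern with `Base.setproperty!`.

```julia
# Hypothetical dispatch type and fallback, as in the proposal above.
struct Field{name} end
propget(x, ::Field{name}) where {name} = getfield(x, name)

# Minimal sketch of the proposed macro: rewrites `(a::A).foo = body`
# into the method definition `propget(a::A, ::Field{:foo}) = body`.
macro property(ex)
    (ex isa Expr && ex.head === :(=)) || error("expected `(a::A).name = body`")
    lhs, body = ex.args
    (lhs isa Expr && lhs.head === Symbol(".")) || error("expected `(a::A).name` on the left")
    objdecl, name = lhs.args   # e.g. :(a::A) and QuoteNode(:foo)
    return esc(:(propget($objdecl, ::Field{$name}) = $body))
end

struct Point
    x::Float64
    y::Float64
end
Base.getproperty(p::Point, s::Symbol) = propget(p, Field{s}())

@property (p::Point).r = sqrt(p.x^2 + p.y^2)
@property (p::Point).r2 = 2 * p.r

pt = Point(3.0, 4.0)
pt.r    # 5.0
pt.r2   # 10.0
```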

stevengj commented 10 years ago

I'm not convinced that @property (a::A).foo = is all that much easier than getproperty(a::A, ::Field{:foo}) = ...

In any case, better syntactic sugar is something that can be added after the basic functionality lands.

Regarding inlining, as long as the field access is inlined before the decision is made whether to inline the surrounding function, I don't see why it would be impacted. But maybe this is not the order in which inlining is currently done?

lindahua commented 10 years ago

getproperty(a::A, ::Field{:foo}) = ... strikes me as having too many colons :-) I agree that this is a minor thing, and we probably don't need to worry about it right now.

My concern is whether this would cause a performance regression. I am not very clear about the internal code generation mechanism. Perhaps @JeffBezanson can say something about this?

JeffBezanson commented 10 years ago

Field access is very low-level, so I won't do this without making sure performance is preserved.

nalimilan commented 10 years ago

After all I'm not convinced overloading fields is a good idea. With this proposal, there would always be two ways of setting a property: x.property = value and property!(x, value). If field overloading is implemented, we'll need a very strong style guide to avoid ending in a total mess where you never know in advance which solution the author has chosen for a given type.

And then there would be the question of whether fields are public or private. Not allowing field overloading would make the type system clearer: fields would always be private. Methods would be public, and types would be able to declare that they implement interfaces/protocols/traits, i.e. that they provide a given set of methods. This goes against @stevengj's comment about overloading fields with methods to avoid breaking an API (https://github.com/JuliaLang/julia/issues/1974#issuecomment-12083268); the rule would instead be to offer only methods as part of the API, and never fields.

The only place where I would miss field overloading is DataFrames, since df[:a] is not as nice as df.a. But that alone doesn't seem to justify such a major change. The other use case seems to be PyCall, which may indicate that field overloading should be allowed, but only for highly specific, non-Julian use cases. But how do we prevent people from misusing a feature once it's available? Hide it in a special module?

StefanKarpinski commented 10 years ago

@nalimilan, I would say that the preference should be to use x.property syntax as much as possible. The thing is that people really like this syntax – it is very pleasant. Taking such a nice syntax and specifically saying that it should only ever be used for internal access to objects seems downright perverse – "hah, this nice syntax exists; don't use it!" It seems much more reasonable to make the syntax to access private things less convenient and pretty instead of forcing APIs to use the uglier syntax. Perhaps this is a good use case for the .. operator: the private real field access operator.

I actually think that this change can make things clearer and more consistent rather than less so. Consider ranges: there is currently a sort of hideous mix of step(r) versus r.step styles out there. Especially since I introduced FloatRange this is dangerous, because only code that uses step(r) is correct. The reason for the mix is that some properties of ranges are stored and some are computed, but those have changed over time and are in fact different for different types of ranges. It would be better style if every access were of the step(r) style except the definition of step(r) itself. But there are some steep psychological barriers against that. If we make r.step a method call that defaults to r..step, then people can just do what they're naturally inclined to do.
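A sketch of the stored-versus-computed problem, assuming Julia 0.7+ `Base.getproperty`; `StepDemoRange` is a hypothetical type that stores its length but computes its step. Without the overload, `r.step` would throw an error because no such field exists; with it, both spellings agree, and the fallback to `getfield` plays the role of the proposed `r..step`.

```julia
# Hypothetical range type that stores its endpoints and length but
# *computes* its step, so there is no real `step` field.
struct StepDemoRange
    start::Float64
    stop::Float64
    len::Int
end

# Generic-function accessor: the one place that knows how step is derived.
Base.step(r::StepDemoRange) =
    (getfield(r, :stop) - getfield(r, :start)) / (getfield(r, :len) - 1)

# `r.step` becomes a method call; any other name falls back to the
# real field, playing the role of the proposed `r..step`.
function Base.getproperty(r::StepDemoRange, s::Symbol)
    s === :step && return step(r)
    return getfield(r, s)
end

r = StepDemoRange(0.0, 1.0, 5)
step(r)   # 0.25
r.step    # 0.25, even though no field is named `step`
r.len     # 5, a stored field
```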

To play devil's advocate (with myself), should we write r.length or length(r)? Inconsistency between generic functions and methods is a problem that has afflicted Python, while Ruby committed fully to the r.length style.

toivoh commented 10 years ago

+1 for .. as Core.getfield!

nalimilan commented 10 years ago

@StefanKarpinski Makes sense, but then you'll need to add syntax for private fields, and interfaces will have to specify both methods and public fields. And indeed you need a style guide to ensure some consistency; the case of length is a difficult one, but then there is also e.g. size, which is very similar but needs a dimension index. This decision opens a can of worms...

In that case, I also support .. to access actual fields, and . to access fields, be they methods or real values.

StefanKarpinski commented 10 years ago

> To play devil's advocate (with myself), should we write r.length or length(r)? Inconsistency between generic functions and methods is a problem that has afflicted Python, while Ruby committed fully to the r.length style.

The key factor that may be disambiguating for this issue is whether you want to be able to use something as a higher order function or not. I.e. the f in f(x) is something you can map over a collection, whereas the f in x.f is not (short of writing x -> x.f) – which is the same situation for all methods in single-dispatch languages.
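A small illustration of the higher-order point; `Interval` and `width` are hypothetical names. A generic function is a value that `map` can take directly, while dot access needs an anonymous wrapper.

```julia
# Illustrative type with a hypothetical `width` accessor function.
struct Interval
    lo::Float64
    hi::Float64
end
width(iv::Interval) = iv.hi - iv.lo

ivs = [Interval(0, 1), Interval(2, 5)]

# A generic function is a first-class value and can be mapped directly...
map(width, ivs)          # [1.0, 3.0]

# ...whereas dot access has to be wrapped in an anonymous function.
map(iv -> iv.hi, ivs)    # [1.0, 5.0]
```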

mauro3 commented 10 years ago

Why stop at field access? What about having x.foo(args...) equivalent to getfield(x::MyType, ::Field{:foo}, args...) = ... ? Then we could have x.size(1) for size along first dimension. (not sure whether I'm fond of my suggestion, but maybe something to consider. Or probably not, as people will just write OO look-alike code?)

StefanKarpinski commented 10 years ago

That would be possible with this functionality. Which is one of the things that gives me pause. I don't have a problem with o.o. style code like that – as I said, it's fairly pleasant and people really like it – but it does introduce enough choice in ways to write things that we really need a strong policy about what you should do since you'll be very free with what you can do.
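A sketch showing that property overloading is enough to get `x.size(1)`-style calls, assuming Julia 0.7+ `Base.getproperty`; `Mat` is a hypothetical wrapper. Rather than passing `args...` to a field accessor as suggested above, this version returns a closure, so `m.size(1)` parses as `(m.size)(1)`.

```julia
# Illustrative wrapper; `m.size(d)` is not a Julia convention, only a
# demonstration that property overloading makes it possible.
struct Mat
    data::Matrix{Float64}
end

function Base.getproperty(m::Mat, s::Symbol)
    # `m.size` returns a closure, so `m.size(1)` parses as `(m.size)(1)`.
    s === :size && return d -> size(getfield(m, :data), d)
    return getfield(m, s)
end

m = Mat(zeros(2, 3))
m.size(1)      # 2
m.size(2)      # 3
size(m.data)   # (2, 3), the generic-function spelling
```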

mauro3 commented 10 years ago

When I started to learn Julia, the no-dot syntax helped me a lot to mentally let go of OO-programming style. So for that reason alone, I think that my suggestion is bad.

Also, for simple overloading (i.e. just a.b without (args...)), I agree with @nalimilan's comment above. In issue #4935 the consensus seems to be that fields should not be part of the API, only methods; consequently, it seems that issue will be closed. Having .-overloading syntax will make it much less clear that normal fields should not be part of the API, and will probably encourage people to make fields part of the API.

mauro3 commented 10 years ago

But yes, the . syntax is convenient...

How about: the single . should only be syntactic sugar for getfield(x::MyType, ::Field{:name}) = ... and field access is only through .. (i.e. what . is now).

This would allow a clear distinction: . is always a method call, whereas .. is always real field access.

Of course, this would be a breaking change.

StefanKarpinski commented 10 years ago

That's basically what I was proposing, except that . defaults to .. so it's not breaking.

mauro3 commented 10 years ago

Sorry, I should have re-read!

But I think . not defaulting to .. might actually be nice (apart from it being breaking), as it would force the developer to decide what is public API and what is not. Also, if the user uses .., then they can expect that their code may break, whereas with . it should not.

StefanKarpinski commented 10 years ago

That's a good point. We can go that route by having a.b default to a..b with a deprecation warning.

quinnj commented 10 years ago

From a style perspective, I think I'd much prefer to see

a = [1:10]
a.length()
a.size()

than

a.length
a.size

I think it helps preserve the idea that a function is being called, rather than a property that is somehow stored in the type being retrieved (back to the "purity" concern above). I wonder if there's a way to help ensure this kind of style so things don't get as messy as they are in some other languages.

johnmyleswhite commented 10 years ago

I don't really like

a.length()

since then I can't tell if there was a function field in the original type. If . never accesses fields, that's obviously not an issue. Otherwise, it seems confusing to me.

StefanKarpinski commented 10 years ago

A priori, I feel that we shouldn't do either a.length() or a.length. But the question is why? What makes r.step different from r.length? Is it different? If they're not different, should we use step(r) and length(r) or r.step and r.length?

mauro3 commented 10 years ago

With the semantics suggested by Stefan and the addition by me it would be clear that . always is a function call (just like + too), whereas .. is always a field access.

On the issue of whether a.length etc. is a good idea: how about using . only to access actual data in the type, more or less as one would use the entries of a dict, while sticking with functions for non-data properties like size, length, step, etc.? Some of those need extra parameters, and I think the a.size(1) kind of syntax is bad.

tknopp commented 10 years ago

Here is my take on this topic:

StefanKarpinski commented 10 years ago

> Please let . default to the .. operator. Julia is not known to be a boilerplate language. Let's keep it that way.

I do tend to agree with this. The syntax for setting even a synthetic property would just be a.property = b, not a.property() = b.

tknopp commented 10 years ago

Sure, I just wanted to make clear why a.property() as a syntax is, IMHO, not nice.

tknopp commented 10 years ago

Or more clearly: the important thing about the dot syntax is not that one can associate functions with types/classes, but the ability to write getters/setters in a nice way. And getters/setters are important for data encapsulation (keeping the interface stable while changing the implementation).

jakebolewski commented 10 years ago

This change would be great from an API designer's perspective, but I agree that it should come with some sort of style guide to limit any future inconsistency.

This would enable Ruby-like DSLs...

amt = 1.dollar + 2.dollars + 3.dollars.20.cents 

But be prepared for Java-like madness:

object.property1.property2.property3 ...

BobPortmann commented 10 years ago

Just a few thoughts:

jakebolewski commented 10 years ago

@BobPortmann I disagree. A dictionary is a container object, the API for container objects is obj[index] or obj[key]. Right now because we don't have properties in Julia, the container API is overloaded to provide this functionality in libraries like PyCall and in OpenCL. This change helps to strengthen the distinction of the container API as it will not be overloaded to provide additional functionality.

tknopp commented 10 years ago

Using a->property for private fields would be a good way to keep C hackers away from Julia ;-)

I kind of like the .. syntax.

StefanKarpinski commented 10 years ago

The a->property syntax is already spoken for – that's an anonymous function. The a..b operator has been up for grabs for a while, however. There are some cases where you want something that's dict-like but has lots of optional fields. Using getter/setter syntax for that would be nicer than dict indexing syntax.

BobPortmann commented 10 years ago

"The a->property syntax is already spoken for – that's an anonymous function."

Yes, of course. It didn't look like it without spaces around the ->.