Closed StefanKarpinski closed 6 years ago
The ability to use dots as syntactic sugar for mutator/accessor methods would be nice for lots of things. I've always appreciated this in languages that provide it, so that you can turn structure fields into more complicated abstractions without breaking the API.
+1
I have an absolutely awesome way to implement this.
Interested in talking about it? I know that Tom Short is really interested in having this for DataFrames, although I've come to be increasingly skeptical about the wisdom of using this feature.
This would make calling Python code (via PyCall) significantly nicer, since currently I'm forced to do `a[:b]` instead of `a.b`.
@JeffBezanson, any chance of having this for 0.3? Would be great for inter-language interop, both for PyCall and for JavaCall (cc @aviks).
@JeffBezanson if not, is there any chance you could give some direction on how you want this implemented? (I have an absolutely awesome way to implement this.)
In my experience, there is no faster nor surer way to get Jeff to implement something than to implement a version of it that he doesn't like ;-)
The basic idea is that you implement
getfield(x::MyType, ::Field{:name}) = ...
so that you can overload it per-field. That allows access to "real" fields to keep working transparently. With suitable fallbacks, `getfield(::MyType, ::Symbol)` also works.
The biggest issue is that modules have special behavior with respect to `.`. In theory, this would just be another method of `getfield`, but the problem is that we need to resolve module references earlier since they basically behave like global variables. I think we will have to keep this a special case in the behavior of `.`. There is also a bit of a compiler efficiency concern, due to analyzing (# types) * (# fields) extra function definitions. But for that we will just see what happens.
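The per-field dispatch described above can be sketched in present-day Julia as follows. This is only an illustration under stated assumptions: `Field`, `getprop`, and `Circle` are hypothetical names introduced here, and a separate function `getprop` stands in for the proposed `getfield` overload because the builtin `getfield` does not accept new methods.

```julia
# Hypothetical tag type: the field name lives in the type parameter,
# so dispatch can pick a method per field name.
struct Field{name} end

# Hypothetical example type with one stored field.
struct Circle
    radius::Float64
end

# Default: any name falls through to the real field of that name.
getprop(x, ::Field{name}) where {name} = getfield(x, name)

# Per-field overload: `area` is computed, not stored.
getprop(c::Circle, ::Field{:area}) = pi * c.radius^2

# Symbol-based fallback that forwards to the per-field methods.
getprop(x, name::Symbol) = getprop(x, Field{name}())

c = Circle(2.0)
getprop(c, :radius)  # reads the real field
getprop(c, :area)    # calls the computed property
```

Because the default method forwards to `getfield`, types that define no overloads behave exactly as plain field access would.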
@JeffBezanson Do you also refer to `const` behavior in modules? It would be useful to have a user type emulating a module and be able to tell the compiler when the result of a dynamic field lookup is in fact constant. (Another approach would be to start with an actual module and be able to "trap" a failed `jl_get_global` and inject new bindings on demand.)
I would find that to be very useful in combination with #5395. Then one would be able to intercept a call to an undefined function or method `MyMod.newfunction(new signature)` and generate bindings to a (possibly large) API on demand. This would then be cached as usual const bindings I guess.
Let me, a simple Julia newbie, present a little concern: I think the possibility to overload the dot operator might imply that field access "purity" is somehow lost.
The user would generally lose the knowledge if doing a.b is just an access to a reference/value or if there can be a huge function machinery being called behind. I'm not sure how that could be bad though, it is just a feeling...
On the other hand, I see that indeed this is a big wish for syntax sugar for many cases (PyCall, Dataframes...), which is perfectly understandable. Maybe it is time for .. #2614?
I support doing this.
But the purity does have something to say for it, even if one can use `names(Foo)` to figure out what the real components of `Foo` are.
The purity argument is closely related to the main practical concern I have, which is how one handles name conflicts when the fields of the type interfere with names you might hope to use. In DataFrames, I think we'd resolve this by banning the use of `columns` and `colindex` as column names, but wanted to know what people's plan was for this.
I guess `getfield(x::MyType, ::Field{:foo}) = ...` would have to be forbidden when `MyType` has a field `foo`, otherwise the access to the real field would be lost (or a way to force access to the field would have to be available).
But then `getfield` could only be defined for concrete types, since abstract ones know nothing about fields.
(Meanwhile, I stumbled upon this about C++.)
It's not a major problem. We can provide something like `Core.getfield(x, :f)` to force access to the real fields.
Ok, maybe I'm sold. But then defining a shortcut to `Core.getfield(x, :f)` (e.g., `x..f`) will be nice, otherwise internal code of types overloading the `.` for all symbols (dataframes, probably dictionaries) has to be crowded with `Core.getfield`s.
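To make the "crowded with `Core.getfield`" concern concrete, here is a minimal sketch of a type that overloads `.` for all symbols. It assumes Julia 0.7 or later, where the overloading hook is `Base.getproperty` rather than the `getfield(x, ::Field{:name})` form discussed in this thread; the `Record` type is hypothetical.

```julia
# Hypothetical record type that exposes dictionary entries through dot syntax.
struct Record
    data::Dict{Symbol,Any}
end

# Overload `.` for every symbol. Inside this method `r.data` would call
# this same method again, so the real field is reached with getfield.
Base.getproperty(r::Record, name::Symbol) = getfield(r, :data)[name]

r = Record(Dict{Symbol,Any}(:x => 1, :y => "two"))
r.x                  # looks up the :x entry of the dictionary
getfield(r, :data)   # raw access to the stored Dict
```

Every internal use of the stored `data` field has to go through `getfield`, which is the verbosity that a shortcut such as `x..f` was meant to remove.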
I'm not worried about the purity aspect - until we have this, the only code that should be using field access at all is code that belongs to the implementation of a given type. When field access is part of an API, you have to document it, as with any API. I agree that it might be handy to have some shortcut syntax for `Core.getfield` though, when writing those implementations.
It had already been pointed out in #4935, but let's pull it to here: dot overloading can overlap a little with classical Julian multiple dispatch if not properly used, since we can start doing
getfield(x::MyType, ::Field{:size}) = .........
for i=1:y.size .....
instead of
size(x::MyType) = ..........
for i=1:size(y) ....
While the dot would be great to access items in collections (Dataframes, Dicts, PyObjects), it can somehow change the way object properties (not fields) are accessed.
I think one thing to consider is that if you can overload accessing a field, you should also be able to overload setting a field. Otherwise this will be inconsistent and frustrating. Are you OK to go that far?
@nalimilan, one absolutely needs a `setfield!` in addition to `getfield`. (Similar to `setindex!` vs. `getindex` for `[]`.) I don't think this is controversial.
Agree with @stevengj: DataFrames will definitely be implementing `setfield!` for columns.
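A getter/setter pair for column-style access might look like the sketch below. It assumes Julia 0.7 or later, where the hooks are `Base.getproperty` and `Base.setproperty!` (the names `getfield` and `setfield!` used in this thread refer to the proposal at the time). The `Table` type is hypothetical and is not the DataFrames implementation.

```julia
# Hypothetical table type whose columns are reachable as `t.colname`.
struct Table
    columns::Dict{Symbol,Vector{Float64}}
end

# Reads: `t.a` returns the column named :a.
Base.getproperty(t::Table, name::Symbol) = getfield(t, :columns)[name]

# Writes: `t.a = v` stores v as the column named :a.
function Base.setproperty!(t::Table, name::Symbol, v)
    getfield(t, :columns)[name] = v
    return v
end

t = Table(Dict{Symbol,Vector{Float64}}())
t.a = [1.0, 2.0, 3.0]   # lowered to setproperty!(t, :a, ...)
t.a                     # lowered to getproperty(t, :a)
```

This mirrors the `setindex!`/`getindex` pairing for `[]`: the assignment form and the read form are overloaded separately.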
I support this.
Experience with other languages (e.g. C# and Python) does show that the dot syntax does have a lot of practical value. The way that it is implemented through specialized methods largely addresses the concern of performance regression.
It is, however, important to ensure that the inlineability of a method won't be seriously affected by this change. For example, something like `f(x) = g(x.a) + h(x.b)` shouldn't suddenly become un-inlineable after this lands.
If we decide to make this happen, it is useful to also provide macros to make the definition of property easier, which might look like:
# let A be a type, and foo a property name
@property (a::A).foo = begin
# compute and return the property value
end
# for simpler cases, this can be simplified to
@property (a::A).foo2 = (2 * a.foo)
# set property
@setproperty (a::A).foo v::V begin
# codes for setting value v to a property a.foo
end
Behind the scene, all these can be translated to the method definitions.
I'm not convinced that `@property (a::A).foo =` is all that much easier than `getproperty(a::A, ::Field{:foo}) =` ...
In any case, better syntactic sugar is something that can be added after the basic functionality lands.
Regarding inlining, as long as the field access is inlined before the decision is made whether to inline the surrounding function, then I don't see why it would be impacted. But maybe this is not the order in which inlining is currently done?
`getproperty(a::A, ::Field{:foo}) =` strikes me as having too many colons :-) I agree that this is a minor thing, and probably we don't need to worry about that right now.
My concern is whether this would cause performance regression. I am not very clear about the internal code generation mechanism. @JeffBezanson may probably say something about this?
Field access is very low-level, so I won't do this without making sure performance is preserved.
After all I'm not convinced overloading fields is a good idea. With this proposal, there would always be two ways of setting a property: `x.property = value` and `property!(x, value)`. If field overloading is implemented, we'll need a very strong style guide to avoid ending in a total mess where you never know in advance which solution the author has chosen for a given type.
And then there would be the question of whether fields are public or private. Not allowing field overloading would make the type system clearer: fields would always be private. Methods would be public, and types would be able to declare they implement interfaces/protocol/traits, i.e. that they provide a given set of methods. This would go against @stevengj's https://github.com/JuliaLang/julia/issues/1974#issuecomment-12083268 about overloading fields with methods to avoid breaking an API: only offer methods as part of the API, and never fields.
The only place where I would regret field overloading is for `DataFrames`, since `df[:a]` is not as nice as `df.a`. But that alone doesn't sound like it should require such a major change. The other use case seems to be PyCall, which may indicate that field overloading should be allowed, but only for highly specific, non-Julian use cases. But how to prevent people from misusing a feature once it's available? Hide it in a special module?
@nalimilan, I would say that the preference should be to use `x.property` syntax as much as possible. The thing is that people really like this syntax – it is very pleasant. Taking such a nice syntax and specifically saying that it should only ever be used for internal access to objects seems downright perverse – "hah, this nice syntax exists; don't use it!" It seems much more reasonable to make the syntax to access private things less convenient and pretty instead of forcing APIs to use the uglier syntax. Perhaps this is a good use case for the `..` operator: the private real field access operator.
I actually think that this change can make things clearer and more consistent rather than less so. Consider ranges – currently there's a sort of hideous mix of `step(r)` versus `r.step` styles out there. Especially since I introduced `FloatRange` this is dangerous because only code that uses `step(r)` is correct. The reason for the mix is that some properties of ranges are stored and some are computed – but those have changed over time and are in fact different for different types of ranges. It would be better style if every access was of the `step(r)` style except the definition of `step(r)` itself. But there are some steep psychological barriers against that. If we make `r.step` a method call that defaults to `r..step`, then people can just do what they're naturally inclined to do.
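The stored-versus-computed point about ranges can be checked with the range types in current Julia. This sketch uses `UnitRange` and `StepRange` from Base rather than the `FloatRange` mentioned above, so it illustrates the argument without reproducing the situation at the time of the thread.

```julia
r1 = 1:5       # UnitRange: stores only start and stop
r2 = 1:2:9     # StepRange: stores start, step and stop

fieldnames(typeof(r1))   # no :step field on this type
fieldnames(typeof(r2))   # includes :step

# The generic function works for both, whether the step is stored or implied.
step(r1)
step(r2)
```

Code written against the `step` function keeps working across range types, whereas code that reads a `step` field directly depends on how each type happens to store its data.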
To play devil's advocate (with myself), should we write `r.length` or `length(r)`? Inconsistency between generic functions and methods is a problem that has afflicted Python, while Ruby committed fully to the `r.length` style.
+1 for `..` as `Core.getfield`!
@StefanKarpinski Makes sense, but then you'll need to add syntax for private fields, and interfaces will have to specify both methods and public fields. And indeed you need a style guide to ensure some consistency; the case of `length` is a difficult one, but then there is also e.g. `size`, which is very similar but needs a dimension index. This decision opens a can of worms...
In that case, I also support `..` to access actual fields, and `.` to access fields, be they methods or real values.
> To play devil's advocate (with myself), should we write `r.length` or `length(r)`? Inconsistency between generic functions and methods is a problem that has afflicted Python, while Ruby committed fully to the `r.length` style.
The key factor that may be disambiguating for this issue is whether you want to be able to use something as a higher order function or not. I.e. the `f` in `f(x)` is something you can `map` over a collection, whereas the `f` in `x.f` is not (short of writing `x -> x.f`) – which is the same situation for all methods in single-dispatch languages.
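The higher-order-function distinction is easy to see in a small example. The `Point` type and the `xcoord` accessor below are hypothetical names used only for illustration.

```julia
# Hypothetical type with two stored fields.
struct Point
    x::Float64
    y::Float64
end

# Hypothetical accessor function for the x coordinate.
xcoord(p::Point) = p.x

pts = [Point(1.0, 2.0), Point(3.0, 4.0)]

map(xcoord, pts)       # a named function can be passed directly
map(p -> p.x, pts)     # dot access needs a wrapping anonymous function
```

Both calls produce the same values; the difference is only that the function form is a first-class value while the dot form has to be wrapped before it can be passed around.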
Why stop at field access? What about having `x.foo(args...)` equivalent to `getfield(x::MyType, ::Field{:foo}, args...) = ...`? Then we could have `x.size(1)` for size along the first dimension. (Not sure whether I'm fond of my suggestion, but maybe something to consider. Or probably not, as people will just write OO look-alike code?)
That would be possible with this functionality. Which is one of the things that gives me pause. I don't have a problem with o.o. style code like that – as I said, it's fairly pleasant and people really like it – but it does introduce enough choice in ways to write things that we really need a strong policy about what you should do since you'll be very free with what you can do.
When I started to learn Julia, the no-dot syntax helped me a lot to mentally let go of OO-programming style. So for that reason alone, I think that my suggestion is bad.
Also, for simple overloading (i.e. just `a.b` sans `(args...)`), I agree with @nalimilan's comment above. In issue #4935 the consensus seems to be that fields should not be part of the API but only methods; consequently it seems that that issue will be closed. Having the `.`-overloading syntax will make it much less clear that normal fields should not be part of the API and will probably encourage people to make fields part of the API.
But yes, the `.` syntax is convenient...
How about: the single `.` should only be syntactic sugar for `getfield(x::MyType, ::Field{:name}) = ...` and field access is only through `..` (i.e. what `.` is now).
This would allow making a clear distinction:
- `.` is for public API to access value-like things of type-instances
- `..` is for field access and should generally not be used in the public API

Of course, this would be a breaking change.
That's basically what I was proposing, except that `.` defaults to `..` so it's not breaking.
Sorry, I should have re-read!
But I think the `.` not defaulting to `..` might actually be nice (apart from the fact that it is breaking), as it would force a decision on the developer about what is public API and what is not. Also, if the user uses a `..` then he can expect that his code may break, whereas with `.` it should not.
That's a good point. We can go that route by having `a.b` default to `a..b` with a deprecation warning.
From a style perspective, I think I'd much prefer to see
a = [1:10]
a.length()
a.size()
than
a.length
a.size
I think it helps preserve the idea that a function is being called instead of just a property being retrieved that is somehow stored in the type (back to the "purity" concern above). I wonder if there's a way to help ensure this kind of style so things don't get as messy as it is in some other languages.
I don't really like `a.length()` since then I can't tell if there was a function field in the original type. If `.` never accesses fields, that's obviously not an issue. Otherwise, it seems confusing to me.
A priori, I feel that we shouldn't do either `a.length()` or `a.length`. But the question is why? What makes `r.step` different from `r.length`? Is it different? If they're not different, should we use `step(r)` and `length(r)` or `r.step` and `r.length`?
With the semantics suggested by Stefan and the addition by me it would be clear that `.` always is a function call (just like `+` too), whereas `..` is always a field access.
On the issue of whether `a.length`, etc. is a good idea: how about `.` access should only be used to access actual data in the type, more or less as one would use the entries of a dict, whereas we stick with functions for the non-data properties like `size`, `length`, `step` etc. Because some of them will need extra parameters and, I think, the `a.size(1)` type of syntax is bad.
Here is my take on this topic:
- `a.property() = ...` feels completely wrong.
- `a.length` is a good example, `a.size(1)` not, because it requires an additional argument.
- Please let `.` default to `..`. Julia is not known to be a boilerplate language. Let's keep it that way.

> Please let `.` default to `..`. Julia is not known to be a boilerplate language. Let's keep it that way.
I do tend to agree with this. The syntax for setting even a synthetic property would just be `a.property = b`, not `a.property() = b`.
Sure, I just wanted to make clear why `a.property()` as a syntax is IMHO not nice.
Or more clearly: The important thing about the dot syntax is not that one can associate functions with types/classes but the ability to write getters/setters in a nice way. And getters/setters are important for data encapsulation (keep the interface stable but change the implementation).
This change would be great from an API designers perspective but I agree that it should come with some sort of style guide to limit any future inconsistency.
This would enable Ruby like dsl's...
amt = 1.dollar + 2.dollars + 3.dollars.20.cents
But be prepared for java like madness:
object.propert1.property2.property3 ....
Just a few thoughts:
- `.` syntax for Dicts with Symbols as keys. It's just nicer to use `d.key` than `d[:key]`. But in the end it's not critical.
- `a->property` reads better than `a..property`. But again it is not that big a deal and I don't know if it would work with Julia syntax.

@BobPortmann I disagree. A dictionary is a container object, the API for container objects is `obj[index]` or `obj[key]`. Right now because we don't have properties in Julia, the container API is overloaded to provide this functionality in libraries like PyCall and in OpenCL. This change helps to strengthen the distinction of the container API as it will not be overloaded to provide additional functionality.
Using `a->property` for private fields would be a good way to keep C hackers away from Julia ;-)
I kind of like the `..` syntax.
The `a->property` syntax is already spoken for – that's an anonymous function. The `a..b` operator has been up for grabs for a while, however. There are some cases where you want something that's dict-like but has lots of optional fields. Using getter/setter syntax for that would be nicer than dict indexing syntax.
"The a->property syntax is already spoken for – that's an anonymous function."
Yes, of course. It didn't look like it without spaces around the `->`.
Brought up here: https://github.com/JuliaLang/julia/issues/1263.