JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

allow overloading of a.b field access syntax #1974

Closed StefanKarpinski closed 6 years ago

StefanKarpinski commented 11 years ago

Brought up here: https://github.com/JuliaLang/julia/issues/1263.

stevengj commented 10 years ago

As a style guideline, how about recommending that property(x) be used for read-only properties and that x.property be used for read/write properties?

For writable properties, x.foo = bar is really much nicer than set_foo!(x, bar).
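As a rough illustration of the writable-property case, here is a minimal sketch that assumes the `Base.setproperty!` hook available in Julia 0.7 and later (it did not exist when this comment was written). The `Window` type and its fields are hypothetical.

```julia
# Hypothetical mutable type, used only for illustration.
mutable struct Window
    title::String
    width::Int
end

# `w.name = value` lowers to `setproperty!(w, :name, value)`,
# so validation can run on plain assignment syntax.
function Base.setproperty!(w::Window, name::Symbol, value)
    if name === :width
        value > 0 || throw(ArgumentError("width must be positive"))
        return setfield!(w, :width, convert(Int, value))
    end
    # Every other field keeps the default behaviour.
    return setfield!(w, name, convert(fieldtype(Window, name), value))
end

w = Window("demo", 100)
w.width = 200      # same effect a set_width!(w, 200) function would have
w.title = "main"   # falls through to the default branch
```

The assignment form keeps reads and writes symmetric (`w.width` and `w.width = 200`), which is the readability argument made above.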

tknopp commented 10 years ago

Having foo(x) for reading and x.foo for writing is quite confusing. Actually, this is what makes properties so appealing: having the same syntax for read and write access, i.e. the simplest syntax one can get (for getters and setters).

Regarding style, there is the big question of whether we want to have both x.length and length(x) if this feature gets implemented, or whether the latter form should be deprecated and removed.

My opinion is that we should only have one way of doing it and only use x.length in the future. And regarding style, I think it's quite simple. Everything that is a simple property of a type should be implemented using the field syntax; everything else with functions. I have used properties in C# a lot and rarely found a case where I was unsure whether something should be a property or not.

JeffBezanson commented 10 years ago

I'm against changing a randomly-chosen set of 1-argument functions to x.f syntax. I think @mauro3 made a good point that doing this obscures the nature of the language.

a.b is, at least visually, kind of a scoping construct. The b need not be a globally-visible identifier. This is a crucial difference. For example, matrix factorizations with an upper part have a .U property, but this is not really a generic thing --- we don't want a global function U. Of course this is a bit subjective, especially since you can easily define U(x) = x.U. But length is a different kind of thing. It is more useful for it to be first class (e.g. map(length, lst)).

StefanKarpinski commented 10 years ago

Here are the guidelines I would suggest. The foo.bar notation is appropriate when:

  1. foo actually has a field named bar. Example: (1:10).start.
  2. foo is an instance of a group of related types, some of which actually have a field named .bar; even if foo doesn't actually have a bar field, the value of that field is implied by its type. Examples: (1:10).step, (0.1:0.1:0.3).step.
  3. Although foo doesn't explicitly store bar, it stores equivalent information in a more compact or efficient form that is less convenient to use. Example: lufact(rand(5,5)).U.
  4. You are emulating an API from another language like Python or Java.

It may make sense for the bar property to be assignable in cases 1 and 3 but not 2. In case 2, since you cannot change the type of a value, you cannot mutate the bar property that is implied by that type. In such cases, you probably want to disallow mutation of the bar property of the other related types, either by making them immutable or by explicitly making foo.bar = baz an error.
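Case 3 can be sketched as follows. This is only an illustration under stated assumptions: it uses the `Base.getproperty` hook from Julia 0.7 and later rather than the `getfield(A, ::Field{:b})` spelling floated in this thread, and the `PackedLU` type with its `packed` field is hypothetical, not the real factorization type.

```julia
using LinearAlgebra

# Hypothetical type: L and U are stored packed in a single matrix.
struct PackedLU
    packed::Matrix{Float64}
end

# `F.name` lowers to `getproperty(F, :name)`, so U and L can be
# computed on demand even though neither is a stored field.
function Base.getproperty(F::PackedLU, name::Symbol)
    if name === :U
        return triu(getfield(F, :packed))
    elseif name === :L
        # Strictly lower part plus a unit diagonal.
        return tril(getfield(F, :packed), -1) + I
    else
        return getfield(F, name)   # real fields still work
    end
end

F = PackedLU([2.0 1.0; 0.5 3.0])
F.U        # upper-triangular part
F.L        # unit lower-triangular part
F.packed   # the stored field itself
```

Neither `U` nor `L` becomes a globally visible function here, which matches the namespace argument made earlier in the thread.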

stevengj commented 10 years ago

@tknopp, I wasn't suggesting using x.foo for writing and foo(x) for reading. My suggestion was that if a property is both readable and writable, then probably you want to both read and write it with x.foo.

tknopp commented 10 years ago

@StefanKarpinski: But isn't length a case of 3, where the sizes are what's usually stored and length is the product of the sizes?

I see Jeff's point, though, that this change would make these functions not first class anymore.

@stevengj: I see. Sorry for confusing that.

StefanKarpinski commented 10 years ago

@tknopp – the length is derived from the sizes, but not equivalent to them. If you know the sizes you can compute the length but not vice versa. Of course, this is a bit of a blurry line. The main reason this is acceptable for lufact is that we haven't figured out a better API than that. Another approach would be to define upper and lower generic functions that give the upper-triangular and lower-triangular parts of general matrices. However, this approach doesn't generalize to QR factorizations, for example.

JeffBezanson commented 10 years ago

It's telling that there are only a few cases that really seem to ask for this syntax: pycall, factorizations, and maybe dataframes. I'm quite worried about ending up with a random jumble of f(x) vs. x.f; it would make the system much harder to learn.

mauro3 commented 10 years ago

Doesn't point 1 of @StefanKarpinski's list mean that any field of a type automatically belongs to public API?

At the moment I can tell what is the public API of a module: all exported functions and types (but not their fields). After this change, it would not be possible to tell which fields are supposed to belong to the public API and which not. We could start naming private fields a._foo or so, like in python, but that seems not so nice.

johnmyleswhite commented 10 years ago

Personally I think the DataFrames case is a little superfluous. If we do this, I'll add the functionality to DataFrames, but I find the loss of consistency much more troubling than saving a few characters.

tknopp commented 10 years ago

I would also not make the decision dependent on DataFrames, PyCall (and Gtk wants it also). Either we want it because we think that fields should be part of a public interface (because it "looks nice") or we don't want it.

aviks commented 10 years ago

... pycall ...

and JavaCall

johnmyleswhite commented 10 years ago

Since the main use case for this seems to be interactions with non-Julia systems, what about using the proposed .. operator instead of overloading .?

quinnj commented 10 years ago

I wonder if a simpler solution here is a more general hat-tip to OO:

#we already do
A[b] => getindex(A,b)
#we could have
A.b(args...) => b(A, args...)
# while
A..b => getfield(A,::Field{:b})
# with default
getfield(A, ::Field{:b}) = getfield(A, :b)

It seems like this would allow JavaCall/PyCall to do method definitions "in" classes, while also allowing a general style if people want to have some OO type code, though it's very transparent that A.b() is just a rewrite. I think this would be very natural for people coming from OO. Also having the new getfield with A..b to allow overloading there, though overloading here is strongly discouraged and only to be used for field-like things/properties (I suspect it wouldn't be used very widely due to the slight scariness of overloading getfield(A, ::Field{:field})).

StefanKarpinski commented 10 years ago

@mauro3:

Doesn't point 1 of @StefanKarpinski's list mean that any field of a type automatically belongs to public API?

That was a list of when it's ok to use foo.bar notation, not when it's necessary. You can disable the foo.bar notation for "private" fields, which would then only be accessible via foo..bar.

@karbarcca: I'm not super clear on what you're proposing here.

ihnorton commented 10 years ago

fwiw, I'm a fan of taking the consenting-adults-by-convention approach and making . fully overloadable. I think the double-dot proposal would lead to more confusion rather than less.

StefanKarpinski commented 10 years ago

@ihnorton – as in you're against using a..b as the (unoverloadable) core syntax for field access or against using a..b for the overloadable syntax?

nolta commented 10 years ago

One of julia's best features is its simplicity. Overloading x.y feels like the first step on the road to C++.

mauro3 commented 10 years ago

@StefanKarpinski but then this would mean quite a shift in paradigm from default private fields to default public fields.

A realization I just had, probably this was clear to others all along. Full OO-style programming can be done with the basic .-overloading (albeit it's ugly). Defining

getfield(x::MyType, ::Field{:foo}) = (args...) -> foofun(x, args...) # a method, i.e. returns a function
getfield(x::MyType, ::Field{:bar}) = x..bar+2                  # field access, i.e. returns a value

then x.foo(a,b) and x.bar work. So the discussion on whether x.size(1) should be implemented or only x.size is moot.
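The same idea can be sketched in runnable form, assuming the `Base.getproperty` hook of Julia 0.7 and later in place of the `getfield(x, ::Field{:foo})` syntax proposed here. `MyType` and `foofun` are placeholders carried over from the snippet above, with a made-up body for `foofun`.

```julia
struct MyType
    bar::Int
end

# Placeholder behaviour: add the stored field to the arguments.
foofun(x::MyType, a, b) = getfield(x, :bar) + a + b

function Base.getproperty(x::MyType, name::Symbol)
    if name === :foo
        # "Method": returns a closure that captures x.
        return (args...) -> foofun(x, args...)
    elseif name === :bar
        # "Field": returns a value derived from the real field.
        return getfield(x, :bar) + 2
    else
        return getfield(x, name)
    end
end

x = MyType(1)
x.foo(10, 20)   # parsed as (x.foo)(10, 20)
x.bar
```

Because `x.foo` is an ordinary closure, it can also be passed around, e.g. `map(a -> x.foo(a, 0), 1:3)`.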

ihnorton commented 10 years ago

@StefanKarpinski against generally overloadable a..b and lukewarm about a..b -> Core.getfield(a,b).

JeffBezanson commented 10 years ago

I do start to see the need for another operator here, but a..b is not quite convincing. Needing two characters feels very... second class. Maybe a@b, a$b, or a|b (bitwise operators are just not used that often). An outside possibility is also a`b, which the parser could probably distinguish from commands.

I'd be ok with using the "ugly" operator for primitive field access. I think experience has shown that since it is a concrete operation it is rarely used, and indeed somewhat dangerous to use.

quinnj commented 10 years ago

I'm suggesting allowing simulating OO single dispatch by the convention/rewriting:

type Type end
# I can define methods with my Type as 1st argument
method(T, args...) = # method body
t = Type()
# then I can call that method, exactly like Java/Python methods, via:
t.method(args...)
# so
t.method(args...) 
# is just a rewrite to
method(t, args...)

The justification here is we already do similar syntax rewrites for getindex/setindex!, so let's allow full OO syntax with this. That way, PyCall and JavaCall don't have to do

my_dna[:find]("ACT")
# they can do
my_dna.find("ACT")
# by defining the appropriate find( ::PyObject, args...) method when importing modules from Python/Java

I like this because it's a fairly clear transformation, just like getindex/setindex, but allows simulating a single dispatch OO system if desired, particularly for OO language packages.

I was then suggesting the use of the .. operator for field access, with the option to overload. The use here would be allowing PyCall/JavaCall to simulate field access by overloading calls to .., allowing DataFrames to overload .. for column access, etc. This would also be the new default field access in general for any type.

JeffBezanson commented 10 years ago

I do have a soft spot for pure syntax rewrites. It's arguably a bad thing that you can write a.f(x) right now and have it work but mean something confusingly different than most OO languages.

Of course the other side of that coin is horrible style fragmentation, and the fact that a.f has nothing in common with a.f(), causing the illusion to break down quickly.

carlobaldassi commented 10 years ago

One of julia's best features is its simplicity. Overloading x.y feels like the first step on the road to C++.

Same feeling here. I was considering, if the actual need for this is really for a limited number of interop types, what about only making it valid if explicitly asked in the type declaration? E.g. an additional keyword besides type and immutable could be ootype or something.

quinnj commented 10 years ago

and the fact that a.f has nothing in common with a.f(), causing the illusion to break down quickly.

Can you clarify what this means @JeffBezanson?

JeffBezanson commented 10 years ago

I'd expect that a.f is some kind of method object if a.f() works.

quinnj commented 10 years ago

Ah, got it. Yeah, you definitely wouldn't be able to do something like map(t.method,collection).

simonbyrne commented 10 years ago

I'm going to agree with @mauro3 that by allowing obj.method(...), there is a risk that new users may just see julia as another object-oriented language trying to compete with python, ruby etc., and not fully appreciate the awesomeness that is multiple-dispatch. The other risk is that standard oo style then becomes predominant, as this is what users are more familiar with, as opposed to the more julian style developed so far.

Since the use case, other than DataFrames, is restricted to inter-op with oo languages, could this just all be handled by macros? i.e. @oo obj.method(a) becomes method(obj,a)?
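A minimal sketch of such a macro, written against the expression layout of current Julia (1.x), where `obj.method(a)` parses as a `:call` whose callee is a `:.` expression. The name `@oo` comes from the comment above; everything else is an assumption, and keyword arguments passed after `;` are not handled.

```julia
# Rewrite `obj.method(args...)` into `method(obj, args...)` at parse time.
macro oo(ex)
    if !(ex isa Expr && ex.head === :call)
        error("@oo expects a call of the form obj.method(args...)")
    end
    callee = ex.args[1]
    if !(callee isa Expr && callee.head === :. && callee.args[2] isa QuoteNode)
        error("@oo expects a call of the form obj.method(args...)")
    end
    obj   = callee.args[1]         # expression to the left of the dot
    fname = callee.args[2].value   # method name as a Symbol
    # Build method(obj, args...) and resolve names in the caller's scope.
    return esc(Expr(:call, fname, obj, ex.args[2:end]...))
end

v = [3, 1, 2]
n = @oo v.length()   # expands to length(v)
m = @oo n.max(5)     # expands to max(n, 5)
```

The rewrite is purely syntactic: `method` must already be an ordinary Julia function visible at the call site, so nothing is looked up on the object at runtime.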

mauro3 commented 10 years ago

@karbarcca this would mean that automatically everything could be written in two ways:

x = 3
x.sin()
sin(x)
x + 2
x.+(2) # ?!

cdsousa commented 10 years ago

@karbarcca https://github.com/JuliaLang/julia/issues/1974#issuecomment-38830330

t.method(args...)

is just a rewrite to

method(t, args...)

That would not be necessary for PyCall, since the overloadable dot could just be used to call pyobj[:func] by pyobj.func. Then pyobj.func() would in fact be (pyobj.func)().

stevengj commented 10 years ago

Rewriting a.foo(x) as foo(a, x) would not solve the problem for PyCall, because foo isn't and cannot be a Julia method, it is something I need to look up dynamically at runtime. I need to rewrite a.foo(x) as getfield(a, Field{:foo})(x) or similar [or possibly as getfield(a, Field{:foo}, x)] so that my getfield{S}(::PyObject, ::Type{Field{S}}) can do the right thing.

cdsousa commented 10 years ago

@JeffBezanson https://github.com/JuliaLang/julia/issues/1974#issuecomment-38837755

I do start to see the need for another operator here, but a..b is not quite convincing. Needing two characters feels very... second class

I would say that, on the other hand, .. is typed much more quickly than $, @ or | as no shift key needs to be pressed, and while being two characters the finger stays on the same key :smile:

simonbyrne commented 10 years ago

@stevengj Ah, I see. But my point still stands, that the rewriting could be done with a macro.

aviks commented 10 years ago

For JavaCall, I actually only need essentially an unknownProperty handler. I don't actually need to rewrite or intercept existing property reads or writes. So would a rule that "a.x gets re-written to getfield(a, :x) only when x is not an existing property" help keep things sane?
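The "only when x is not an existing property" rule could look roughly like this. It is a sketch that assumes the `Base.getproperty` hook of Julia 0.7 and later; `JavaLikeObject` and its `members` table are hypothetical stand-ins for a wrapped foreign object and its runtime lookup.

```julia
# Hypothetical wrapper: `members` stands in for a runtime lookup
# that a real bridge would perform on the foreign object.
struct JavaLikeObject
    handle::Int
    members::Dict{Symbol,Any}
end

function Base.getproperty(o::JavaLikeObject, name::Symbol)
    # Existing fields are never intercepted.
    if name in fieldnames(JavaLikeObject)
        return getfield(o, name)
    end
    # Unknown names fall through to the dynamic handler.
    members = getfield(o, :members)
    haskey(members, name) || error("no property named $name")
    return members[name]
end

o = JavaLikeObject(7, Dict{Symbol,Any}(:size => 3, :toString => () -> "obj#7"))
o.handle       # real field, returned directly
o.size         # resolved dynamically
o.toString()   # dynamic member that happens to be callable
```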

stevengj commented 10 years ago

@simonbyrne, requiring a macro would defeat the desire for clean and transparent interlanguage calling. Also, it would be hard to make it work reliably. For example, suppose that you have a type Foo; p::PyObject; end, and for an object foo::Foo you want to do foo.p.bar where bar is a Python property lookup. It's hard to imagine a macro that could reliably distinguish the meanings of the two dots in foo.p.bar.
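With dispatch-based overloading, the two dots are distinguished by the type of the left-hand side rather than by syntax. A minimal sketch, assuming the `Base.getproperty` hook of Julia 0.7 and later; `PyLike` is a hypothetical stand-in for PyObject, backed by a plain Dict.

```julia
# Hypothetical stand-in for a Python object: attributes live in a Dict.
struct PyLike
    attrs::Dict{Symbol,Any}
end

# Every dot on a PyLike is a dynamic attribute lookup.
Base.getproperty(p::PyLike, name::Symbol) = getfield(p, :attrs)[name]

# Foo has no overload, so foo.p is ordinary field access.
struct Foo
    p::PyLike
end

foo = Foo(PyLike(Dict{Symbol,Any}(:bar => 42)))
foo.p.bar   # first dot: field of Foo; second dot: lookup in attrs
```

One consequence of overloading every name on `PyLike` is that its own field is reachable only through `getfield(p, :attrs)`.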

Honestly, I don't see the big deal with style. High-quality packages will imitate the style of Base and other packages where possible, and some people will write weird code no matter what we do. If we put dot overloading in a later section of the manual, and recommend its use only in a few carefully selected cases (e.g. inter-language interoperability, read/write properties, maybe for avoiding namespace pollution for things like factor.U, and in general as a cleaner alternative to foo[:bar]), then I don't think we'll be overrun with packages using dot for everything. The main thing is to decide what we will use and recommend this for, and probably we should keep the list of recommended uses very short and only extend it as real-world needs arise.

We're not adding super-easy OO-like syntax like type Foo; bar(...) = ....; end for foo.bar(...), so that will limit temptation for newbies too.

StefanKarpinski commented 10 years ago

I'm basically in full agreement with @stevengj here. I like a..b for real field access because it

  1. looks similar to a.b
  2. is less convenient, as it should be
  3. is only slightly less convenient
  4. has no existing meaning and we haven't found any compelling use for it in over a year
  5. isn't horrifically weird like a`b

jakebolewski commented 10 years ago

With this change and possibly (https://github.com/JuliaLang/julia/issues/2403) will nearly all of Julia's syntax be overloadable? (The ternary operator is the only exception I can think of) That almost all syntax is lowered to overloadable method dispatch seems to be a strongly unifying feature to me.

StefanKarpinski commented 10 years ago

I agree that it's actually kind of a simplification. The ternary operator and && and || are really control flow, so that's kind of different. Of course that kind of argues against making a..b the real field access since then that would be the only non-overloadable syntax. But I still think it's a good idea. Consistency is good but not paramount for its own sake.

StefanKarpinski commented 10 years ago

Oh, there's also function call which is not overloadable. So basic I forgot about it.

jakebolewski commented 10 years ago

That is what issue #2403 addresses.

StefanKarpinski commented 10 years ago

Yep. But this is a lot closer to happening than that is.

JeffBezanson commented 10 years ago

The only fly in the ointment for me here is that it would be really nice to use the real field access operator for modules, but that probably won't happen since nobody wants to write Package..foo.

Tab-completing after dots gets a bit ugly; technically you have to check what method x. might call to see if it's appropriate to list object field names or module names. And I hope nobody tries to define getfield(::Module, ...).

StefanKarpinski commented 10 years ago

I think that tab completing can be done like this: foo.<tab> lists the "public fields" and foo..<tab> lists the "private fields". For modules, would it be ok to just allow the default implementation of Mod.foo be Mod..foo and just tell people not to add getfield methods to Module? I mean, you can already redefine integer addition in the language – all hell breaks loose and you get a segfault but we don't try to prevent it. This can't be worse than that, can it?

JeffBezanson commented 10 years ago

It is in fact slightly worse than that, because a programming language really only cares about naming. Resolving names is much more important than adding integers.

We don't have much choice but to have Mod.foo default to Mod..foo, but we'll probably have to use Mod..foo for bootstrapping in some places. The .. operator is extremely helpful here, since without it you can't even call Core.getfield in order to define the fallback. With it, we'd probably just remove Core.getfield and only have the .. operator.

StefanKarpinski commented 10 years ago

That's a fair point – naming is kind of a big deal in programming :-). Seems like a good way to go – only .. and no Core.getfield.

cdsousa commented 10 years ago

These two ideas,

[...] put dot overloading in a later section of the manual, and recommend its use only in a few carefully selected cases @stevengj https://github.com/JuliaLang/julia/issues/1974#issuecomment-38847340

and

[...] the preference should be to use x.property syntax as much as possible @StefanKarpinski https://github.com/JuliaLang/julia/issues/1974#issuecomment-38694885

are clearly opposed.

I think that if the first idea is to be chosen, then just creating a new .. operator for those "carefully selected cases" makes more sense. As an advantage, using ..name for cases where currently [:name] is used (DataFrames, Dict{Symbol, ...}) would be more typing/syntax friendly while clearly stating that something different from field access was happening. Moreover, the double dot in ..name could be seen as a rotated colon, a hint to the symbol syntax :name, and there would also be no problem with tab completions. As a disadvantage, the uses in PyCall et al. would not be so close to the original syntaxes (and could even be confusing for the cases when the . really must be used). But let's be honest, Julia will never be fully Python syntax compatible, and there will always be cases where one has to type a lot in Julia with PyCall to perform otherwise simple instructions in Python. The .. to emulate . could give a good balance here. (Please don't get me wrong, I really like PyCall and think it is a critical feature which deserves special care.)

The second idea, which I currently prefer, carries the big decision about when property(x) or x.property must be used, which requires an elegant, well-thought-out, and clear definition, if such a thing exists... It seems that if people want an overloadable . that's because they prefer the x.property API style in the first place, though. Anyway, I would prefer to see . not as an overloadable field access operator but as an overloadable "property" access operator (getprop(a, Field{:foo}) maybe?) which defaults to the non-overloadable field operator (..). Other decisions would also have to be taken, e.g., which will be used in concrete implementation code for field access, .. or .? For example, for the Ranges step example, which will be idiomatic: step(r::Range1) = one(r..start) or step(r::Range1) = one(r.start)? (Not to mention the question of whether step must be a method or a property.)

StefanKarpinski commented 10 years ago

That's why I backed off of that angle and proposed these criteria: https://github.com/JuliaLang/julia/issues/1974#issuecomment-38812139.

willowless commented 10 years ago

Just one thought that popped in to my head while reading this interesting thread. Export could be used to declare public fields, while all fields are visible inside the defining module, eg:

module Foo
   type Person
     name
     age
   end
   export Person, Person.name
   @property Person :age(person) = person..age + 1
end

In this situation the exported Person still looks like it has 'name' and 'age' fields, except that in this case age is read-only through a function that adds one. Exporting all of Person might be done as export Person.* or similar.

[pao: quotes]

pao commented 10 years ago

@emeseles Please be careful to use backticks to quote things that are like Julia code--this ensures formatting is maintained, and prevents Julia's macros from creating GitHub notifications for similarly-named users.

diegozea commented 9 years ago

. and .. are confusing: a clear and easy-to-remember syntax is a good thing