carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/

Method syntax #494

Closed josh11b closed 3 years ago

josh11b commented 3 years ago

Assuming a function syntax like:

fn IsEven(Int: n) -> Bool;

(The introducer fn is the subject of #463.) What should the syntax for methods be? Methods use a different calling syntax (x.F(n)) and need to distinguish between taking the receiver (x) by value or by pointer.

There are 5 things to decide:

-1- -2- -3- -4- -2- -5- IsEven() -> Bool;

And here are our options:

-1- the introducer:
  A. The same introducer as functions (fn, func, or function as determined by #463)
  B. A new introducer, like method

-2- brackets around the receiver declaration:
  C. (...) Parentheses, suggesting a parameter list
  D. [...] Square brackets, suggesting an implied or different kind of parameter
  E. Omitted

-3- receiver type:
  F. Self or pointer to Self (however we decide to write that, Self* or Ptr(Self))
  G. Something shorter like Me or pointer to Me (however we decide to write that, Me* or Ptr(Me))
  H. Omitted

-4- receiver name:
  I. An identifier specifying the name to bind the receiver to
  J. Omitted, the receiver will use some reserved word

-5-:
  K. A dot .
  L. A dot . if the receiver is passed by value, an arrow -> if the receiver is passed by pointer
  M. Omitted

Examples:

BCFIM: method (Self* this) Set(Int n);
ADGJK: fn [Me*].Set(Int n);
AEHJL: fn ->Set(Int n);

Note: One of the options F, G, or L is needed to disambiguate calling by value vs. pointer.

Alternative: Following C++'s "Deducing this" proposal, we could also consider marking the first parameter with a keyword to indicate it is the receiver. That proposal uses this, as in:

fn Set(this Self* self, Int n);

(This alternative was suggested by @tkoeppe.)

tkoeppe commented 3 years ago

Re 3, 4: It might also be worth considering whether the receiver can be deduced, like in C++'s "Deducing this" proposal.

josh11b commented 3 years ago

Do you mean this proposal? I will look at it.

tkoeppe commented 3 years ago

Precisely. It's been pretty universally popular. It allows elegant mixins and CRTP, and recursive lambdas.
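
For readers who haven't run into it, here is a minimal C++17 sketch of the CRTP mixin pattern that "Deducing this" is meant to simplify. The names Describable, Widget, Describe, and CheckDescribe are made up for illustration and don't come from the proposal. As I understand it, with a C++23 explicit object parameter the receiver's type is deduced, so the hand-written static_cast would no longer be needed.

```cpp
#include <cassert>
#include <string>

// CRTP mixin: the base class is parameterized on the derived type so that it
// can call into the derived class without virtual dispatch.
template <typename Derived>
struct Describable {
  std::string Describe() const {
    // The receiver has to be cast to the derived type by hand.
    const Derived& self = static_cast<const Derived&>(*this);
    return "<" + self.Name() + ">";
  }
};

// Hypothetical user type that opts into the mixin.
struct Widget : Describable<Widget> {
  std::string Name() const { return "widget"; }
};

// Small check that the mixin reaches Widget::Name through the cast.
void CheckDescribe() {
  assert(Widget{}.Describe() == "<widget>");
}
```

The point for this thread is only that the receiver's type is something a declaration may want to name or deduce, which bears on items 3 and 4 above.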

chandlerc commented 3 years ago

> Precisely. It's been pretty universally popular. It allows elegant mixins and CRTP, and recursive lambdas.

I think this shows some utility to planning for an entire type, and not just part of a type.

But I hesitate to generalize too far here -- IMO a number of this proposal's advantages are somewhat specific to solving problems created by C++, and I'm unsure Carbon will end up benefiting to the same degree.

Still, definitely a good thing to consider ahead of time rather than after-the-fact.

chandlerc commented 3 years ago

I would actually like to suggest a higher level concern I have with the method syntax.

I would find all of these much more appealing if we could move the method name prior to the receiver. Specifically, I think we should strive to have an especially easily skimmable structure for APIs:

struct IntContainer {
  // Non-methods for building instances
  fn MakeFromInts ... -> IntContainer;
  fn MakeRepeating ... -> IntContainer;

  // Methods
  fn Size ... -> Int;
  fn First ... -> Int;

  fn Clear ...;
  fn Append ...;
}

Regardless of which keyword (although I suggest using fn for both), I really like the next thing being the name of the thing to ease scanning. Having to skip over a receiver (using any of the above syntaxes) for me really lessens the readability of the API.

What do others think?

If we want to consider that, what syntactic approaches should we consider? The syntax from "Deducing this" works this way, but I'm interested in whether there are other options. In particular, I continue to find that the receiver being in the "pattern match" part of the signature doesn't fit well, and I somewhat prefer it being written separately if possible. However, I wonder if my opinion on that would change if we wrote it as part of the implicit list? From the original example in the top comment:

fn Set[this Self* self](Int n);

Curious about thoughts on this approach as well.

fowles commented 3 years ago

Chandler~

I really like fn Name as the invariant. Makes scanning much nicer.

Matt

jonmeow commented 3 years ago

How is the distinction between method and non-method functions intended to be indicated? IIUC the C++ equivalent is static: is it worth considering how the syntax for instance and static variables aligns?

chandlerc commented 3 years ago

> How is the distinction between method and non-method functions intended to be indicated?

The thing that makes the most sense to me is to use the syntax that specifies the receiver's type.

Put differently, I'm suggesting that (3) in the original summary should not be omitted (at a minimum), and that whatever syntax we use for this is sufficient to indicate a method vs. a non-method.

We could alternatively rely on a separate bit of syntax like that in (5), but I prefer making the receiver type explicit and simply attempting to reduce the verbosity of that type.

> IIUC the C++ equivalent is static: is it worth considering how the syntax for instance and static variables aligns?

I'm not sure. Maybe at least a little.

My thoughts above about having names immediately follow the introducer come from the fact that types with reasonably large public APIs would have a pretty long list of these, with the reader frequently skimming to find a specific name in the API.

I find that these kinds of public APIs somewhat rarely have instance members as direct parts of the API. Certainly, many style guides advocate against it. Places where instance members are part of the API tend to be fairly small and/or local types that are focused on storage more than a rich API. So I don't see it as critical to align instance variable syntax structures with functions, or to try to get them to have a single name immediately after an introducer.

A slightly more interesting case would be constants, which do form parts of public APIs more often. But at least in my experience, I've not found them to be so numerous that this kind of skimming structure seems essential.

However, this does raise one thing we may want to think about either now or in the future: do we want to have something more like Swift's computed properties, which essentially allow the data member syntax to be used while having an actual API with getter and setter logic? If we want to add such constructs, then those would seem more likely to be valuable to harmonize with the function declaration syntax to ensure we can form easily skimmed and read APIs.

josh11b commented 3 years ago

Interfaces will most commonly have methods, but they also support static functions and associated types. Right now, associated types are spelled: var Interface$ name. Since different associated types will have different bounds, the names will not end up aligned.

geoffromer commented 3 years ago

> In particular, I continue to find that the receiver being in the "pattern match" part of the signature doesn't fit well, and I somewhat prefer it being written separately if possible. However, I wonder if my opinion on that would change if we wrote it as part of the implicit list? From the original example in the top comment:
>
> fn Set[this Self* self](Int n);
>
> Curious about thoughts on this approach as well.

I think that could wind up being confusing, because the square brackets seem very much part of pattern matching, even though they're not part of the pattern syntax per se. I see a "conventional" usage like fn Foo[Type$$ T](T arg1, T arg2) as syntactic sugar for fn Foo((Type$$ T1) arg1, (Type$$ T2) arg2) where T1 == T2. In other words, the square brackets are a way of factoring out a named, reusable subpattern. Your mooted extension doesn't fit into that model.
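
A rough C++17 analogy for this "factored-out subpattern" reading, offered only as a sketch: Larger, LargerDesugared, and CheckLarger are hypothetical names, and C++ templates are not a faithful model of Carbon's deduced parameters. The first function names the type once and reuses it; the second deduces each parameter's type separately and states the equality as a constraint.

```cpp
#include <cassert>
#include <type_traits>

// Deduced once and reused: both arguments must have the same type T.
template <typename T>
T Larger(T a, T b) {
  return a < b ? b : a;
}

// "Desugared" form: each parameter deduces its own type, and the requirement
// T1 == T2 is written as a separate constraint.
template <typename T1, typename T2,
          typename = std::enable_if_t<std::is_same_v<T1, T2>>>
T1 LargerDesugared(T1 a, T2 b) {
  return a < b ? b : a;
}

// Both spellings accept the same calls; a call such as
// LargerDesugared(2, 3.0) would be rejected because T1 != T2.
void CheckLarger() {
  assert(Larger(2, 3) == 3);
  assert(LargerDesugared(2, 3) == 3);
}
```

Under that reading, a receiver placed in the square brackets is a different kind of thing: it is an extra argument, not a name shared between other parameters.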

josh11b commented 3 years ago

Here is a list of options that conform to @chandlerc's constraint that the method names line up, or at least mostly line up:

Proposal "AEHJL" has the property that the names all start in columns 5, 6, or 7, depending on if it is static, accessor, or mutator, which might be close enough since it is still pretty easy to visually read the names off:

struct S {
  fn Make(Int n) -> S;
  fn .Access() -> Int;
  fn ->Set(Int n);
}

Option "B" could be adopted as long as the new introducer was ~2 characters long, with the actual receiver type specified as the first parameter type.

// B1: omitting parameter name
struct S {
  fn Make(Int n) -> S;
  me Access(S) -> Int;
  me Set(S*, Int n);
}

// B2: including parameter name
struct S {
  fn Make(Int n) -> S;
  me Access(S this) -> Int;
  me Set(S* this, Int n);
}

We have a couple variations on marking this first parameter with a keyword as in C++'s "Deducing this" proposal.

// N1: omitting parameter name
struct S {
  fn Make(Int n) -> S;
  fn Access(this S) -> Int;
  fn Set(this S*, Int n);
}

// N2: including parameter name
struct S {
  fn Make(Int n) -> S;
  fn Access(this S self) -> Int;
  fn Set(this S* self, Int n);
}

Then there is @chandlerc's suggestion of moving that into the implicit parameters, which @geoffromer has raised concerns about, but which does make the declaration look more similar to the call.

// O1: omitting parameter name
struct S {
  fn Make(Int n) -> S;
  fn Access[this S]() -> Int;
  fn Set[this S*](Int n);
}

// O2: including parameter name
struct S {
  fn Make(Int n) -> S;
  fn Access[this S self]() -> Int;
  fn Set[this S* self](Int n);
}

We could also use a delimiter in the parameter (or implicit parameter) list, but this might be a bit subtle:

// P1: omitting parameter name
struct S {
  fn Make(Int n) -> S;
  me Access(S;) -> Int;
  me Set(S*; Int n);
}

// P2: including parameter name
struct S {
  fn Make(Int n) -> S;
  me Access(S self;) -> Int;
  me Set(S* self; Int n);
}

Other ideas to consider? Are any of the options more appealing than the others?

geoffromer commented 3 years ago

Let me take a step back for a minute:

> Methods use a different calling syntax (x.F(n)) and need to distinguish between taking the receiver (x) by value or by pointer.

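Do we expect there to be other differences between methods and other functions, besides the difference in declaration and call syntaxes? In particular:

  • Does name lookup work differently for method calls? E.g. do methods have to be declared in the body of the receiver type?
  • Do methods have privileged access to the members of the receiver type?
  • Do generics treat methods differently from other functions?
  • Do non-method functions support dynamic dispatch?
  • Can a callsite be agnostic about whether a method takes its receiver parameter by pointer or by value?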

josh11b commented 3 years ago

> Let me take a step back for a minute:
>
>> Methods use a different calling syntax (x.F(n)) and need to distinguish between taking the receiver (x) by value or by pointer.
>
> Do we expect there to be other differences between methods and other functions, besides the difference in declaration and call syntaxes? In particular:
>
>   • Does name lookup work differently for method calls? E.g. do methods have to be declared in the body of the receiver type?

I would like the answer to be yes.

>   • Do methods have privileged access to the members of the receiver type?

My guess would be yes.

>   • Do generics treat methods differently from other functions?

I currently have interfaces use different declarations for methods vs. other functions, but that is primarily because they are expected to be treated differently as the result of the answers to these other questions.

>   • Do non-method functions support dynamic dispatch?

My guess would be no, except through generics.

>   • Can a callsite be agnostic about whether a method takes its receiver parameter by pointer or by value?

My understanding is that we were planning to make the answer to this question be "yes."

There is another difference: the receiver parameter to a method varies covariantly in inheritance unlike other parameter types. This is arguably part of the fact that methods are doing dynamic dispatch on the receiver's type.
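
To make the covariance point concrete, here is how it looks in C++ today, as a sketch with hypothetical names (Shape, Square, NameOf, CheckNames). The type of the receiver changes in the override while the rest of the signature stays fixed, and the call is dispatched on the receiver's dynamic type.

```cpp
#include <cassert>
#include <string>

struct Shape {
  virtual ~Shape() = default;
  // Inside this body, `this` has type `const Shape*`.
  virtual std::string Name() const { return "shape"; }
};

struct Square : Shape {
  // Inside the override, `this` has type `const Square*`, so the body can
  // read `side`. An ordinary parameter could not change type like this.
  std::string Name() const override {
    return "square(" + std::to_string(side) + ")";
  }
  int side = 2;
};

// Dispatches on the dynamic type of the receiver.
std::string NameOf(const Shape& s) { return s.Name(); }

void CheckNames() {
  Square sq;
  Shape sh;
  assert(NameOf(sq) == "square(2)");
  assert(NameOf(sh) == "shape");
}
```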

josh11b commented 3 years ago

Option Q: other symbols to indicate passing this by pointer vs. by value

struct S {
  fn Make(Int n) -> S;
  fn= Access() -> Int;
  fn& Set(Int n);
}

Option R:

struct S {
  fn Make(Int n) -> S;
  fn .Access(Self s) -> Int;
  fn &Set(Self* s, Int n);
}

josh11b commented 3 years ago

In the open discussion slot, we did talk about how different it was that we might automatically take the address of the receiver when passing it into the method.

One alternative is that you actually pass pointers to mutating methods:

var S x = S.Make(3);
(&x).Set(4);

Question: This business of the type of the receiver affecting whether we take its address sounds a lot like one of the things references do. Do we want to possibly add references? It would also help in other places such as custom match, and the return type of operator [].

If we don't want to go that far, perhaps there is a way, when binding a name in a pattern, to indicate that we implicitly take the address. Possible syntax:

fn F(*(Int* p)) {
  *p = 3;
}
fn G(Int* p) {
  *p = 3;
}
var Int x = 2;
F(x);  // equivalent to G(&x);
Assert(x == 3);

We would then use that syntax on the this parameter. Might need something similar for the return of operator [].

struct S {
  var Int: x;
  method *(Self* this) Mutate() -> *(Int*) {
    return &this->x;
  }
}
var S y = (x: 3);
y.Mutate() = 4;
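
For comparison, this is the C++ behavior the question about references alludes to, as a small sketch with hypothetical helper names (SetViaReference, SetViaPointer, ViaReference, ViaPointer). A reference parameter lets the call site pass x directly while the callee still modifies the caller's object; a pointer parameter makes the caller spell out the address.

```cpp
#include <cassert>

// Reference parameter: the call site passes `x` itself, and the callee still
// gets access to the caller's object.
void SetViaReference(int& n) { n = 3; }

// Pointer parameter: the call site must write the address-of explicitly.
void SetViaPointer(int* p) { *p = 3; }

int ViaReference() {
  int x = 2;
  SetViaReference(x);  // no `&` at the call site
  return x;
}

int ViaPointer() {
  int x = 2;
  SetViaPointer(&x);  // explicit address-of
  return x;
}

void CheckBoth() {
  assert(ViaReference() == 3);
  assert(ViaPointer() == 3);
}
```

The *(Int* p) idea above would sit in between: the parameter is still a pointer inside the function, but the call site looks like the reference version.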

josh11b commented 3 years ago

Option S: Use different introducers for the different cases. For example, ro == read-only, rw == read-write. Shown here with no this parameter declaration, but other variations are of course possible.

struct S {
  fn Make(Int n) -> S;
  ro Access() -> Int;
  rw Set(Int n);
}

zygoloid commented 3 years ago

I think ACFIL with AEHJL as optional syntactic sugar would be an interesting choice. That is, the full method syntax is:

fn (Foo self).Function(Param p) -> Result { ... }
fn (Foo* self)->Function(Param p) -> Result { ... }

... but in contexts where Self is in scope, (Self self). and (Self* self)-> can be written as simply . and ->:

struct Foo {
  fn .Function(Param p) -> Result;
  fn ->Function(Param p) -> Result;
}

Within a class definition, you'd usually be able to use the shorthand, and you get an easily scannable list of methods. Outside a class definition, we can work out which class we're defining a member of based on the receiver type, so the syntax is not much more verbose than a declaration without a receiver type (fn Foo.Function(Param p) -> Result { ... }). I think we would want to permit

fn Foo.(Foo self).Function(Param p) -> Result { ... }

... in case people want to be fully explicit or want to use a receiver type other than Self, but it should be unnecessary in common cases.

My main hesitation here and with AEHJL in general is using the same identifier self as sometimes a value of type Self and sometimes a pointer to Self. I'd be much happier with this approach if it were more reference-like.

geoffromer commented 3 years ago

I find option L to be quite problematic, for two main reasons:

First, the distinction between fn .Foo() and fn ->Foo() in the method declaration is intensely suggestive of the distinction between x.Foo() and x->Foo() at the method's callsites -- and yet the two have nothing whatsoever to do with each other. The use of . or -> in the declaration of Foo has no effect on whether a call to Foo should use . or ->. Instead, it specifies whether the body of Foo should use . or -> when accessing other members.

Second, the primary purpose of option L seems to be to enable callsites to be agnostic about whether a parameter is passed by value or by pointer. I think this is an important problem that is well worth solving, but option L solves it only for the special case of self parameters, and I don't see how we could extend or generalize it to cover other function parameters, either now or in the future.

chandlerc commented 3 years ago

There has been a lot of discussion in open sessions here, and I think there is a certain amount of consensus emerging:

I think the above have growing alignment, but if anyone disagrees, chime in.

Beyond this, I think there are two big questions we keep circling around. First: where does the parameter go after the name? I think there are really three clear options here:

1) Inside the ()s: fn Method[T:$ Container](me: ..., data: T)
2) Inside the []s with deduced parameters (although we might consider either []s or <>s): fn Method[me: ..., T:$ Container](data: T)
3) In their own section to clearly distinguish between implicit parameters vs. deduced parameters. All the ideas I have here end up needing a third entry in the list, either with all three delimiter syntaxes (fn Method<T:$ Container>[me: ...](data: T)) or repeating one of them (fn Method[T:$ Container](me: ...)(data: T)).

I suspect choosing between these three is something the leads will need to do, likely in conjunction with #565.

The second remaining question (with my examples using (2) above just because that's what I've been using most recently): how do we handle different ways in which the object parameter might be passed? We've been currently orienting around immutable values (which can't even have their address taken) and pointers, with potential to expand later. There seem to be two options here:

I lean towards (a), and potentially adding some support for a pattern to bind after a dereference but not removing the fundamental fact that the parameter is a pointer and the . takes the address implicitly.

However, @zygoloid has argued for (b) because it seems awkward for the only thing to select between implicitly taking the address or not be whether the object parameter of the method looked up is a pointer (let me know if I've gotten this wrong). For example, if we want to allow deduction of the object parameter type, this deduction can't be used to select between pointer and not pointer. While I'm somewhat nervous about (b), and I actually rather like (a) in several respects, I do understand the concern here.

Ultimately, I think either (a) or (b) would be fine, and we should really pick one sooner for now. Maybe (b) is the least bad option.

The other options here I think largely go down the path of putting (nearly) the full object category into the type system with something like references, which is complexity that I'd very much like to avoid at this stage and so that's why I'm leaning towards either (a) or (b).

geoffromer commented 3 years ago

The problem that we are trying to solve here is that we want to be able to pass me by reference.

We could quibble about whether me is implicitly dereferenced in the body of the function, but that's a minor matter of syntactic convenience for the function author (and the fact that (b) differs from (a) on this point seems like a distraction). The thing that makes pass-by-reference a meaningful language feature, and not mere syntactic sugar, is the fact that it permits the callee to obtain access to the argument object (i.e. its address), rather than only the argument's value, without affecting the callsite syntax. And that's exactly the goal of both (a) and (b).

That being the case, I think the key distinction between (a) and (b) is that (b) can be plausibly generalized to arbitrary function parameters, whereas generalizing (a) in the same way would be highly problematic (to put it mildly). And it would be very poor ergonomics to have some way of opting into pass-by-reference for other parameters that was different from the way of opting into pass-by-reference for Self, so I think (a) is preferable to (b) only to the extent that we're confident that this is the last time we'll discover we need pass-by-reference in Carbon. Personally, I'm not at all confident of that, and in fact I'm moderately confident of the opposite.

> The other options here I think largely go down the path of putting (nearly) the full object category into the type system with something like references, which is complexity that I'd very much like to avoid at this stage and so that's why I'm leaning towards either (a) or (b).

It seems to me that (a) is very much going down the path of putting the full object category into the type system: it is using a parameter's type to signify whether the function has access to the corresponding argument object, or only the argument's value (and that's precisely why generalizing it would be so problematic). It may only be taking one step down that path, because it's limited to Self parameters, but it seems to me that if we want to avoid that path (and I agree with you that we do), it would be better to avoid even that first step. In other words, given that we want to be able to specify the object category of certain parameters, but we want to keep object category out of the type system, we should probably specify the object category using something other than the parameter type -- such as the & in option (b).

As a syntactic side note, we seem to be moving toward a convention where the nature of a variable binding is determined by an initial keyword, as with var and let, and the template/generic/dynamic keywords proposed in #565. On that basis, I somewhat lean toward ref rather than & as the syntactic marker for by-reference binding in option (b).

zygoloid commented 3 years ago

It seems desirable to me that the object parameter uses the same passing and matching rules as any other parameter, in both the call site and the function declaration, to the extent possible: a call a.F() and function fn F[me: T]() should behave the same as a call F(a) and function fn F(x: T) (where me: T / x: T could be any pattern we allow in that context).

Option (a) addresses this in the function definition: if the way we write functions that mutate caller-owned state is by accessing that state with a pointer in the function definition, then that's how we should mutate a caller-owned object parameter.

Option (b) addresses this in the interaction between the caller and callee: if we want to be able to write a method such that a.F() captures a mutable handle to a, then we should have explicit decoration for that, rather than inferring that from the fact that we're matching a pointer, because we don't want F(a) to implicitly pass &a when F is declared as fn F(p: T*).

A hybrid option (c) could give us both of these at once: we could reflect exactly how the parameter is passed (the me in the method is a pointer if we're passing a pointer and a value otherwise), and carry a separate annotation that indicates that the binding first takes the address of the argument before matching it:

struct Circle {
  var center: Point;
  var radius: Float;

  fn Diameter[me: Self]() -> Float {
    return me.radius * 2;
  }

  // The pattern 'addr patt' means "first take the address of the argument,
  // which must be an lvalue, and then match patt against it".
  fn Expand[addr me: Self*](distance: Float) {
    me->radius += distance;
  }
}

// The same test works as above.
fn Test(original: Circle, mutable: Circle*) {
  var local: Circle;

  // All of these are fine, the last requires an `->`.
  original.Diameter();
  local.Diameter();
  mutable->Diameter();

  // Both of these are fine, only the last requires an `->`.
  local.Expand(2.0);
  mutable->Expand(2.0);

  // This would be a compile error.
  original.Expand(2.0);
}
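
One way to read the addr pattern is as a call-site rewrite, sketched here in C++ with free functions standing in for the methods. This is an assumption about the intended desugaring, not something specified above; the function Demo and the variable mutable_ptr are hypothetical (mutable is a keyword in C++).

```cpp
#include <cassert>

struct Circle {
  double radius = 1.0;
};

// Stands in for `fn Diameter[me: Self]()`: the receiver is a plain value.
double Diameter(Circle me) { return me.radius * 2; }

// Stands in for `fn Expand[addr me: Self*](distance: Float)`: the receiver is
// a pointer, and `addr` asks the call site to take the address first.
void Expand(Circle* me, double distance) { me->radius += distance; }

double Demo() {
  Circle local;
  Circle* mutable_ptr = &local;

  Expand(&local, 2.0);        // local.Expand(2.0)
  Expand(&*mutable_ptr, 2.0); // mutable->Expand(2.0), i.e. (*mutable).Expand(2.0)

  // With `const Circle original;`, Expand(&original, 2.0) would not compile,
  // mirroring the error case for the immutable value parameter.
  return Diameter(*mutable_ptr);
}
```

In this reading the method body never sees anything special: it matches a Self* exactly as an ordinary pointer parameter would, and all of the implicit behavior lives at the call site.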

Having said that, I think it would make sense to add dedicated keywords for the various forms of argument passing semantics that we want (eg, inout, out, by-move, ...), and if we do that well, perhaps we will not actually want to support passing the object parameter as a pointer at all. We might want call-site syntax for indicating pass by reference; if we take the desire from the start of this comment as axiomatic, that means we'll want that syntax for the object parameter too:

fn F(ref x: Int);
struct X {
  fn G[ref me: Self]();
}
fn Test() {
  var a: Int;
  var b: X;
  // If `ref` is needed here...
  F(ref a);
  // ... then presumably `ref` should be needed here.
  ref b.G();
}

chandlerc commented 3 years ago

> A hybrid option (c) could give us both of these at once: we could reflect exactly how the parameter is passed (the me in the method is a pointer if we're passing a pointer and a value otherwise), and carry a separate annotation that indicates that the binding first takes the address of the argument before matching it:

I actually really like this. I had similar thoughts, but kept ending up in a less good place. Your example here:

  // The pattern 'addr patt' means "first take the address of the argument,
  // which must be an lvalue, and then match patt against it".
  fn Expand[addr me: Self*](distance: Float) {
    me->radius += distance;
  }

I'm actually pretty happy with this, especially the simplification that the addr keyword means x.Foo will take the address of x before attempting to pattern match anything, and then you can write the pattern however you want. The pattern is going to see &x in that position and had better be able to match it, which works quite nicely.

And @geoffromer is right -- this really is encoding the value category into the type system really explicitly -- L-values in the type system are pointers and somewhat have to be here.

I still think it's worth seeing how far we can get with the syntactic difference here and just taking the address of the implicit object parameter when the method clearly asks us to.

geoffromer commented 3 years ago

I think the only major reservation I have about this approach is that in order to figure out the argument type that a function expects, it's not sufficient to look at the parameter type -- you also have to look at the parameter's introducer. However, I think that problem won't really come into focus until we allow addr for non-self parameters. On the other hand, one striking corollary of applying this approach to non-self parameters is that I think & wouldn't need to be a primitive operation anymore:

fn operator&[T: Type](addr operand: T*) -> T* { return operand; }

That somehow makes me feel like we're on the right track. So this approach seems fine for present purposes (i.e. the self-only case), and plausible if not outright intriguing as a basis for future generalization.

> And @geoffromer is right -- this really is encoding the value category into the type system really explicitly -- L-values in the type system are pointers and somewhat have to be here.

I don't think we're really "encoding the value category into the type system" here, we're just saying that the address-of operation only works on lvalues, and the dereference operation always produces lvalues, both of which we basically already knew. In any event, the key point is that we're not using the type system to express the requirement that the argument expression is an lvalue (or the fact that its address is taken implicitly).

chandlerc commented 3 years ago

I think with the resolution of #565, this question has also converged on @zygoloid's suggestion:

struct Circle {
  var center: Point;
  var radius: Float;

  fn Diameter[me: Self]() -> Float {
    return me.radius * 2;
  }

  // The pattern 'addr patt' means "first take the address of the argument,
  // which must be an lvalue, and then match patt against it".
  fn Expand[addr me: Self*](distance: Float) {
    me->radius += distance;
  }
}

// The same test works as above.
fn Test(original: Circle, mutable: Circle*) {
  var local: Circle;

  // All of these are fine, the last requires an `->`.
  original.Diameter();
  local.Diameter();
  mutable->Diameter();

  // Both of these are fine, only the last requires an `->`.
  local.Expand(2.0);
  mutable->Expand(2.0);

  // This would be a compile error.
  original.Expand(2.0);
}

We can (and should) still stay open to exploring other argument passing paradigms including things like inout, and if we adopt such a pattern, we should also consider how to use it for the receiver as well. But I don't think we need to wait for that, and I think having some mechanism that works with the low-level pointer model seems important.

I'm going to close this out for now as I think this has pretty clearly settled amongst the leads at this point, but happy to re-open if needed or problems arise. =]

I'm also incorporating this into a proposal already, if the structs proposal doesn't get there first.