chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

tertiary initializers #17225

Open mppf opened 3 years ago

mppf commented 3 years ago

(This issue is a spin-off from issue #16732).

Should Chapel allow one to create an init or init= in a different module from the one defining the type? If so, are there special constraints or features required for such initializers? Additionally, should it be possible to create custom init= functions for non-record non-class types (e.g. int) ?

One issue is that in a tertiary initializer, it will not necessarily be possible to initialize all of the fields because some of them might be private (private fields are discussed in #6067). This issue is also present when creating an init= for a built-in type such as int. How might we write a tertiary initializer without naming the fields?

For example, let's consider how one might be able to create an init= to support things like var x: int = mybigint;

// approach 1
proc int.init=(rhs: bigint) {
  var i = rhs.bigintToInt();
  this = i; // not yet legal under initializer rules
}
// approach 2
proc int.init=(rhs: bigint) {
  var i = rhs.bigintToInt();
  this.init(i); // requires `proc int.init(from: int)` which we wouldn't normally add
}
// approach 3
proc int.init=(rhs: bigint) {
  var i = rhs.bigintToInt();
  this.init=(i); // currently a syntax error
}

mppf commented 3 years ago

Over in https://github.com/chapel-lang/chapel/issues/16732#issuecomment-731310236 @bradcray said this:

Of your three approaches for writing a tertiary initializer on int, I think approaches 1 and 2 seem reasonable. Specifically, given that we currently support this as a way of assigning to the scalar's value for other tertiary methods, such as:

proc ref int.square {
  this *= this;
}

...approach 1 seems symmetric to me (though it also arguably raises questions with the existing approach for scalars, like "could a user write a type in which this similarly referred to the type's value wholesale in some way?" But note that that question exists even if we didn't enable approach 1 given the current tertiary method support on scalars).
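To make the existing behavior concrete, here is a minimal sketch of a tertiary method on int that updates the scalar through `this`. The name `squareInPlace` and the variable `n` are illustrative only, and the method is written with parentheses so the call site reads as an ordinary method call.

```chapel
// Tertiary method on a built-in scalar type. The `ref` this-intent
// lets the method modify the receiver in place.
proc ref int.squareInPlace() {
  this *= this;
}

var n = 7;
n.squareInPlace();
writeln(n);  // 49
```

Approach 1 would extend the same "`this` is the whole value" reading from ordinary methods to initializers.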

Approach 2 also seems reasonable to me given that, over the years, we've discussed wanting to support new int(42) (and similarly for other built-in scalar value types) in order to support generic programming patterns like:

proc foo(type t) {
  return new t(42);
}

where today t must be a user-defined record or class type, but it could be attractive to permit it to be int or uint, say.
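As a point of reference, the pattern already works when `t` is a record. The sketch below assumes a hypothetical record `Wrapper` that relies on its compiler-generated initializer; it does not attempt `foo(int)`, which is the case under discussion.

```chapel
// A user-defined record that relies on its compiler-generated initializer.
record Wrapper {
  var value: int;
}

// Generic factory: today `t` must be a user-defined record or class type.
proc foo(type t) {
  return new t(42);
}

var w = foo(Wrapper);
writeln(w);  // (value = 42)
```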

bradcray commented 3 years ago

Should Chapel allow one to create an init or init= in a different module from the one defining the type?

I generally feel inclined to support them for orthogonality, unless someone comes up with an argument I haven't heard for why it's an inherently bad idea.

Additionally, should it be possible to create custom init= functions for non-record non-class types (e.g. int) ?

I think so, as your excerpt from me in the previous comment indicates.

One issue is that in a tertiary initializer, it will not necessarily be possible to initialize all of the fields because some of them might be private (private fields are discussed in #6067).

While there are some comments on this issue about what the implication of private means on a field, that feels to me like something that hasn't received a lot of discussion or attention (i.e., that's arguably undecided).

That said, if one can forward to another primary/secondary initializer on the type from a tertiary initializer, it may not be necessary to do so, without loss of generality (assuming the type has sufficiently rich initializers to begin with)?

mppf commented 3 years ago

it may not be necessary to do so

I think you are saying - it may not be necessary to settle what private means on a field? I don't think we need to know that in order to make progress on this issue. (But I wouldn't object to figuring out private fields separately first anyway because they have been a long time coming). In particular, I can't think of a private design for fields that is both useful for encapsulation/information hiding and also allows one to initialize all the fields in a tertiary initializer. So I think the issue for tertiary initializers will be present in any private field design.

I'm not sure, but I think we are agreeing that we can have a tertiary initializer story that allows one to write an initializer by using other initializers/= and without naming any fields and that would help with private fields in whatever form they end up.
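A minimal sketch of that story, assuming tertiary initializers are accepted in this form: the module names `Shapes` and `ShapesExt`, the record `Point`, and the tuple-taking overload are all hypothetical. The tertiary initializer never names a field; it only forwards to an initializer the type's author already provides.

```chapel
module Shapes {
  record Point {
    var x, y: int;

    // Primary initializer supplied by the type's author.
    proc init(x: int, y: int) {
      this.x = x;
      this.y = y;
    }
  }
}

module ShapesExt {
  use Shapes;

  // Tertiary initializer: defined outside the module that declares Point.
  // It touches no fields directly, so it would keep working even if the
  // fields were made private.
  proc Point.init(xy: 2*int) {
    this.init(xy(0), xy(1));
  }

  proc main() {
    var p = new Point((3, 4));
    writeln(p);
  }
}
```

Because the body consists of a single `this.init(...)` call, the initializer depends only on the type's public initializer signatures, which is the property the private-field concern needs.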

lydia-duncan commented 3 years ago

The main thing that worries me about tertiary initializers is their interactions with default initializers - if the type had not defined any primary or secondary initializers, it will generate a default initializer and it can be confusing for a default initializer to be generated when there are other initializers available.

I also worry about the interaction with how we declare a type for a particular instance and with new. There may be other cases in the implementation to watch out for that rely on an initializer and assume it is defined in the same scope as the type or where it expects to be generated. (Edit: Not saying this is an insurmountable issue)

mppf commented 3 years ago

The main thing that worries me about tertiary initializers is their interactions with default initializers - if the type had not defined any primary or secondary initializers, it will generate a default initializer and it can be confusing for a default initializer to be generated when there are other initializers available.

IMO if the type author has not created any initializers in their module implementing the type, then their public API includes the compiler-generated one. As a result a tertiary initializer would invoke that compiler-generated initializer. I think we'd have to be careful with visibility (the tertiary initializer shouldn't hide the compiler-generated one in that case) but other than that don't see a particular problem here.

I'm not quite sure what the issue with new you are describing is but it sounds like an implementation issue (rather than a language design level one). If I am wrong about that maybe you could show an example?

mppf commented 3 years ago

Would the strategy I'm describing be a technically breaking change, if today, a tertiary initializer disables the generation of the compiler-generated-default one? (Is that the case today?)

bradcray commented 3 years ago

it may not be necessary to do so

I think you are saying - it may not be necessary to settle what private means on a field?

No, sorry, I was actually trying to say "it may not be necessary to be able to access the private fields if I could call other initializers," similar to what you said here:

think we are agreeing that we can have a tertiary initializer story that allows one to write an initializer by using other initializers/= and without naming any fields and that would help with private fields in whatever form they end up.

bradcray commented 3 years ago

I think Lydia's point is a good one:

The main thing that worries me about tertiary initializers is their interactions with default initializers

In that, today, the presence of a user-defined initializer shadows the compiler-generated initializer. So I think Michael's point here:

think we'd have to be careful with visibility (the tertiary initializer shouldn't hide the compiler-generated one in that case)

Is arguably problematic in that it makes the rules different for tertiary initializers vs. others.

One potential solution is not to permit tertiary initializers in the event that the type's module doesn't define any primary/secondary user-defined ones (i.e., if the type relies on the compiler-generated initializer, so must other modules).

lydia-duncan commented 3 years ago

Would the strategy I'm describing be a technically breaking change, if today, a tertiary initializer disables the generation of the compiler-generated-default one? (Is that the case today?)

Yes, it would break any code that relied on the default one that happened to also be included in the program, including code from other modules. We could maybe do something where modules that use the module with the tertiary method can't call it but that seems real scary to me

mppf commented 3 years ago

I wanted to understand what the current situation is so I created this test case. The result wasn't quite what I expected.

module DefinesType {
  record MyInteger {
    var x: int;
    proc init() {
      writeln("In MyInteger.init (called only from compiler-generated default init for R)");
      this.x = 1;
      this.complete();
    }
    proc init(arg: int) {
      this.x = arg;
      this.complete();
    }
  }
  operator ==(lhs: MyInteger, rhs: MyInteger) {
    writeln("In MyInteger.== (called only from compiler-generated default == for R)");
    return lhs.x == rhs.x;
  }

  record R {
    var x: MyInteger;
  }
  // expecting compiler-generated:
  //   R.init calling MyInteger.init
  //   == on R calling == on MyInteger
  proc test() {
    writeln("DefinesType.test");
    var a = new R();
    writeln(a);
    writeln(a == a);
  }
}

module DefinesInit {
  use DefinesType;
  proc R.init() {
    writeln("In DefinesInit.R.init");
    this.x = new MyInteger(2);
    this.complete();
  }
  // uncommenting this causes it to no longer compile
  // (fails to resolve the == call in DefinesType.test)
  /*
  operator ==(lhs: R, rhs: R) {
    writeln("In DefinesInit.R.==");
    return lhs.x.x == rhs.x.x;
  }
  */

  proc test() {
    writeln("DefinesInit.test");
    var a = new R();
    writeln(a);
    writeln(a == a);
  }
}

module Main {
  use DefinesType;
  use DefinesInit;
  proc main() {
    writeln("main");
    var a = new R();
    writeln(a);
    writeln(a == a);

    DefinesType.test();
    DefinesInit.test();
  }
}

The output is

main
In DefinesInit.R.init
(x = (x = 2))
true
DefinesType.test
In MyInteger.init (called only from compiler-generated default init for R)
(x = (x = 1))
true
DefinesInit.test
In DefinesInit.R.init
(x = (x = 2))
true

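Meaning:

  • the tertiary init didn't prevent the generation of the compiler-generated init; the tertiary init and the compiler generated one are applying according to visibility
  • the tertiary == did prevent the generation of the compiler-generated ==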

I don't think it's reasonable for the decision about creating a compiler-generated default function (of any sort) to depend on a whole-program-wide property. I think it is confusing to do so - adding a module to your program can break code that was working in other modules. Besides that it would present challenges for separate compilation.

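So, my suggested remedy is, for all compiler generated functions:

  • decide whether or not to generate it based upon only information in the module defining the type
  • consider disallowing tertiary overrides of these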

Note 1: We can allow tertiary overloads of init/=/== etc if the compiler can figure out that the function applies only in a different situation from the compiler generated one. In particular we can use this rule to allow a 3rd module defining neither type A nor type B to create init=/=/== between these two types.

bradcray commented 3 years ago

I don't think it's reasonable for the decision about creating a compiler-generated default function (of any sort) to depend on a whole-program-wide property.

I agree with this.

  • the tertiary init didn't prevent the generation of the compiler-generated init; the tertiary init and the compiler generated one are applying according to visibility
  • the tertiary == did prevent the generation of the compiler-generated ==

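What I'd like to see here is:

  • for the lack of a primary/secondary init (causing the compiler-generated init) to prevent the definition of a tertiary init
  • for the lack of a primary/secondary == to cause the compiler to generate a == operator which would then be ambiguous w.r.t. the tertiary one.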

Here's my quick take on Michael's stricter proposal:

  • decide whether or not to generate it based upon only information in the module defining the type

I like this theme, though it seems as though this rule could potentially be relaxed to "based on information visible to the module defining the type." In part because it seems like a similarly simple definition to me ("Can I see it from here? No? Then I'm going to assume it doesn't exist.") while also potentially adding flexibility (e.g., "For some reason in my code organization, I want to define these core routines in a sub-module that symmetrically uses/is used by this module"). That said, I could see taking the stricter approach and keeping this in our back pocket until we needed it (and I think we'd need to as long as compiler-generated functions are built before resolution, since the "can resolve?" style question I've proposed above couldn't easily be answered today... or maybe it could be, given the restricted type signatures we're talking about?).

  • consider disallowing tertiary overrides of these

I don't feel obviously on-board with this, though I understand it would be the conservative thing to do. Iterating through Michael's arguments:

because it should be up to the type author to define the basic functionality of their type (e.g. default init

For default init, if:

assignment

If the user is creating a t = t assignment overload, it seems like that would necessarily be an ambiguity due to conflicting with either the type author's overload (if they provided one) or the compiler's overload (if they didn't), so wouldn't be useful, but also not problematic. (unless the compiler-generated version was marked as "last resort", which might be reasonable, though I don't think we do currently(?), and I wouldn't be the one to propose it).

Meanwhile creating tertiary t = t2 overloads seems obviously useful / powerful, but since the compiler doesn't generate t = t2 overloads, shouldn't be a problem.
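The cross-type case can be sketched as follows. The modules `Metric`, `Imperial`, and `Convert` and the two records are hypothetical; the point is only that `Convert` defines neither type, and that the overload's signature differs from anything the compiler would generate.

```chapel
module Metric {
  record Celsius { var deg: real; }
}

module Imperial {
  record Fahrenheit { var deg: real; }
}

// A third module that defines neither type.
module Convert {
  use Metric, Imperial;

  // Cross-type assignment. The compiler only generates same-type
  // assignment, so this overload does not compete with a generated one.
  operator =(ref lhs: Celsius, rhs: Fahrenheit) {
    lhs.deg = (rhs.deg - 32.0) * 5.0 / 9.0;
  }

  proc main() {
    var c = new Celsius(0.0);
    var f = new Fahrenheit(212.0);
    c = f;  // resolves to the tertiary overload above
    writeln(c.deg);
  }
}
```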

==, etc.

These seem similar to the previous case to me.

because we don't want other modules to create code that relies on the fact that the type uses a default initializer - since in the future the type author might write an initializer - and it would be surprising if "type uses default initializer" forms part of the API for a type

I think by "default initializer" here, you mean "the compiler-generated initializer" (rather than "the 0-argument initializer")? If so, I'm not sure I'm following this argument. If the type's author relies on the compiler-generated initializer at first, I've proposed that no other module be able to either, so no tertiary initializers would exist at first; then, if a user-level initializer was added later, this lack of tertiary initializers wouldn't cause problems.

If the type's authors did define initializers and overrode the compiler-generated one, tertiary initializers could be defined, and if the type author later removed the explicit initializers and relied on the compiler-generated one, it would cause those tertiary initializers to result in errors.

But I also don't see how this case is particularly different than the type author changing the signatures of their initializers (e.g., to require a new argument), which could also break existing code.

because we are worried about the visibility implications (e.g. in a generic function) being confusing

I don't currently feel terribly worried about this, particularly as operators become methods and constrained generics/interfaces come online. It feels like the number of cases that ran into traditional POI surprises will decrease significantly.

because to describe the behavior we would have to make "last resort" a user-facing concept

I'd actually like to do that anyway. I think the ability to define clear user-facing error messages via overloads without precluding someone else from creating their own overload to flesh out the capability has been really valuable to us, and as a library author think I'd appreciate having the same capability.
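A sketch of how that might look to a library author, assuming `pragma "last resort"` keeps the behavior described in this thread (a last-resort candidate is considered only when no other candidate applies). The pragma is an internal mechanism today rather than a supported user-facing feature, and the modules `Lib`, `Ext`, and `App` and the function `describe` are hypothetical.

```chapel
module Lib {
  // Placeholder overload whose only job is to give a readable error.
  // Marked last resort so it does not block a real implementation.
  pragma "last resort"
  proc describe(x: int) {
    compilerError("describe(int) is not implemented by Lib");
  }
}

module Ext {
  // Another module fleshes out the capability with the same signature.
  proc describe(x: int) {
    writeln("int value: ", x);
  }
}

module App {
  use Lib, Ext;

  proc main() {
    // Both overloads are equally visible here. Without the pragma this
    // call would be ambiguous; with it, Ext.describe is expected to win.
    describe(42);
  }
}
```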

mppf commented 3 years ago

@bradcray - I think we're agreeing more than disagreeing here.

I don't think it's reasonable for the decision about creating a compiler-generated default function (of any sort) to depend on a whole-program-wide property.

I agree with this.

Maybe that is settled, then. We will see if anybody else objects.

decide whether or not to generate it based upon only information in the module defining the type

I like this theme, though it seems as though this rule could potentially be relaxed to "based on information visible to the module defining the type."

That would be fine with me but as you said we might not be able to do that easily in the current compiler.

Regarding

One potential solution is not to permit tertiary initializers in the event that the type's module doesn't define any primary/secondary user-defined ones (i.e., if the type relies on the compiler-generated initializer, so must other modules).

vs

consider disallowing tertiary overrides of these

I wasn't really trying to disagree with your proposal so much as think about extending it. I think that with my "Note 1" these ideas are not so different; we just need something that applies both to initializers and to things like ==.

What I'd like to see here is:

  • for the lack of a primary/secondary init (causing the compiler-generated init) to prevent the definition of a tertiary init
  • for the lack of a primary/secondary == to cause the compiler to generate a == operator which would then be ambiguous w.r.t. the tertiary one.

This would be fine with me, although it is not so obvious to me why init and things like == need to be different in this regard (prevent definition of tertiary one if the compiler generated it vs. expecting ambiguity between the compiler generated one and the tertiary one).

I think the main reason I can think of in favor of preventing the tertiary one if the compiler-generated one is used is this. Suppose we used the ambiguity strategy - if both a tertiary one and a compiler-generated one are present they can be ambiguous but otherwise they function. It might be surprising if the tertiary one is written in a module and that works for a while, but then the type author adds something that makes it ambiguous. I'm not sure how common this would be. Anyway it's not really specific to these compiler-generated cases; it's just that it might be less obvious to the type author that they are making a change that could cause it since the compiler-generated ones are not immediately in front of them. (I suppose we could have chpldoc start to generate docs for them to make their presence more obvious).

(Only tangentially related - but I have been wondering if it would make sense to be able to write things like x.SomeModule.someMethod() to indicate which module in the case of ambiguity between modules. This seems even harder though for things with syntactic support like x == y and new SomeType()).

AFAIK we mark all of the functions generated in buildDefaultFunctions with pragma "last resort". If we stopped doing that we should be able to get the ambiguity errors in these cases.

lydia-duncan commented 3 years ago

I suppose we could have chpldoc start to generate docs for them to make their presence more obvious

That's definitely something I'd like to do.

But I also think the danger of a type author defining one of these methods that conflicts with a tertiary one comes with the territory - tertiary initializers feel supplemental to me, so if their functionality (or completely different functionality with the same arguments) is added later, that's something that will need to be dealt with by the author of the tertiary initializer but not a major problem or something they would likely feel disappointed by (and may even be a sign that their definition was accepted by the type author into the library).