chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 420 forks source link

Should not-fully-defaulted generic record types require `(?)`? #21455

Open bradcray opened 1 year ago

bradcray commented 1 year ago

In discussing generic types recently and how to make them more visible within user code, a case that stuck out for me as being arguably a bad design choice was this case from the OP of Michael's PR #21410, which I'm going to expand upon a bit:

record GenericRecord {
 type indexType;
 type elementType;

  proc init(type indexType, type elementType) {
    this.indexType = indexType;
    this.elementType = elementType;
  }
}

var r: GenericRecord;            // OK, it is a generic type & not a partial instantiation

This issue asks: In the vein of making generic things more visible to the user, should the language require such cases to be written GenericRecord(?) rather than permitting them to be written as GenericRecord. This is arguably a re-statement of https://github.com/chapel-lang/chapel/issues/18214, which didn't have many supporters at the time, but maybe would now?

[edit: In proposing this, I was thinking about specifying types of variable declarations (standalone or fields) and potentially formal arguments]

mppf commented 1 year ago

This is the opposite of what I am used to, but I appreciate that it makes it more clear when a field is generic -- and when a class is generic -- and I am very interested in finding a way to solve that problem. (To get all of the way there from this proposal, we would need to find a way to explicitly mark generic management for a class & figure out what to do with things like integral & also handle something like var x: fnReturningGenericType()).

bradcray commented 1 year ago

This is the opposite of what I am used to

Can you say more about that? I'm trying to guess but failing.

mppf commented 1 year ago

I was just saying that I am used to the current behavior in the language, where I can write things like var r: GenericRecord = something() even though GenericRecord is a generic type. Another way to interpret that is "This seems like a change in the language that goes beyond corner cases / warnings".

mppf commented 1 year ago

I was thinking about this a little bit more and I want to bring up a few cases.

When would the proposal apply?

My understanding of this proposal is that it would apply when working with the name of a generic class/record (and not a type variable meaning that type) in the following contexts:

I think applying it in other cases has dubious value. As a result, I don't think that this proposal can be as completely uniform as you might be hoping; however it could be applied consistently to declared types of fields and variables.

Additionally, I would expect the proposal to apply, if we have a similar class, MyGenericClass, to declared field/variable/formal/return types indicating memory management style:

var x: MyGenericClass; // error/warning: need MyGenericClass(?)
var x: owned MyGenericClass; // error/warning: need owned MyGenericClass(?)

When do I think the proposal should not apply?

Type Variables, Formals, and Calls Returning a Generic Type

I am understanding this proposal to be specifically about writing GenericRecord(?) in cases where previously we would write GenericRecord. I do not think it is reasonable to apply this proposal to all type fields/variables/formals that contain a generic type.

In particular, doing so would prevent writing generic code that queries if a type is generic. For example:

proc computeTypeAttributes(type t) {
  if isGeneric(t) {
    // do something
  } else {
    // do something else
  }
}

In particular, I don't think it's reasonable to require one write isGeneric(t(?)) if t is generic, because, the point of writing this code is to query that.

Now, you might say, "but that's not a practical example". We have a related feature that is practical and occasionally used. An init= can respond to the declared type of the LHS being initialized. That works with this.type which can be generic or partially or fully instantiated. The init= needs to be able to query if individual fields are instantiated or not. I think it would be unreasonable to expect such an init= to write this.type(?) because it the same init= should work with var x: MyGenericRecord(?) = 5; and with something like var y: MyGenericRecord(int, real) = 6;.

Similarly to this.type(?), I don't think that you should have to write fnReturningGenericType()(?), because I don't think it's reasonable to expect generic code working with such functions/expressions to know that the result will or won't be generic.

new

I do not intepreret this proposal as affecting new statements. In particular, if we write

var x = new MyGenericRecord(1.0, "hello");

(supposing a suitable proc init exists) , I don't see any benefit to writing

var x = new MyGenericRecord(?)(1.0, "hello");

Are we on the same page about this?

Type Method Receiver

What if we have a type method available on the generic type? Do we require that MyGenericRecord(?) be written to call it, as in MyGenericRecord(?).getNumericType() ? This does not seem appealing to me.

proc type MyGenericRecord.getNumericType() type { return int; }
MyGenericRecord.getNumericType(); // is this still OK?
MyGenericRecord(?).getNumericType(); // this is not appealing to me

Secondary Methods

If we have record GenericRecord defined as in the original post (really, any record that is not fully generic-with-defaults would do for the discussion below), then what if we are defining a secondary method?

proc GenericRecord.method() { } // Is this OK, even though GenericRecord is generic?

// Or do you have to write
proc GenericRecord(?).method() { }
// (which would require parser adjustments)

// Or do you have write
proc (GenericRecord(?)).method() { }
// (which should work today but seems verbose)

It seems that requiring it would be more consistent with other formals, but it feels very cumbersome to me.

Concern about it being clear if a type is generic

I am concerned that the current language design does not make it clear enough when a record/class is generic (#19120). The problem comes up in particular for field declarations like var field: <expr> where <expr> evaluates to a generic type. The Draft PR / Proposal for warnings in #21404 is focused on the field case for this reason.

Now, I interpret part of the motivation for this proposal that, when combined with #21410, it covers most of the cases in an alternative way to #21404, while doing so in a more consistent manner across fields, variables, and formals.

However, it does not cover all of the cases that the warning approach #21404 handles. I'd like to bring up the cases it does not address.

Type Variable

type t = MyGenericRecord(?);
record R {
  var field: t;
}

Does the proposal here require var field: t(?); be written here? As I have explained above, if it does, that would necessarily be inconsistent with type formals. If it does not, it will not address the concern about it being clear if a type is generic.

Note that this could also apply to a type field instantiated with a generic type and later used as the declared type for a field.

Computed Type

record R {
  var field: fnReturningGenericType();
}
proc fnReturningGenericType() type { return MyGenericRecord; }

I don't think this proposal would affect this code. As a result, it does not address the concern about it being clear if a type is generic. (Looking at var field: SomeFnCall(), I would expect field to be concrete. I don't want to have to rely on function names / understanding the contents of the function).

My Conclusion

  1. I think this proposal could make some cases more consistent between different syntactical forms as compared to #21404. However, I don't think it's completely consistent, so I view it more as drawing the line in a different place. It is not so clear to me that drawing the line in this different place is justified by user experience (fields making a type unexpectedly generic is a pain point we know we have today; in contrast I do not know of such a case of "is-it-generic" confusion for a variable or formal -- but they could be cases I am simply unaware of).
  2. I don't think this proposal completely addresses the concern that the current language design does not make it clear enough when a record/class is generic. It would need to be combined with other warnings/errors in order to do so.
bradcray commented 1 year ago

A quick late-Friday comment:

Are we on the same page about this?

I think so.

Do your "would need (?)" and "shouldn't need (or be able to use) (?)" cases break down into:

If so, that seems like a pretty natural place to draw a line.

I do not know of such a case of "is-it-generic" confusion for a variable or formal

To me, it seems similarly attractive to be able to look at this routine:

proc foo(r: R) { ... }

and determine whether it's taking a generic argument ("can I pass in any instance of R I have?") vs. a concrete type vs. a fully-defaulted version of a generic record R. Under this proposal, the latter two are the same, which I think of a benefit (fully-defaulted generics are essentially concrete), and the first would have to be rewritten as:

proc foo(r: R(?)) { ... }

The variable case is slightly less compelling, but arguably in a split-initialization context, it could be useful to distinguish between:

var x: R;
var y: R(?);

to know that x is amenable to being default initialized, but y cannot be and will necessarily need to rely on split initialization in order to determine its full type (of course, x may also be relying on split initialization and we wouldn't know, but that's not its fault; and at least its type can be known at the declaration point without looking forward for its initializing expression...

cassella commented 1 year ago

When would the proposal apply?

I can think of two more cases:

  1. A class specified in a class-inherit clause, e.g. class D : class C(?) { ...}. If C is generic it should be written with a (?). This seems pretty analogous to the declared field type case.

  2. Maybe in the class/record/union definition itself? [ edit: oh, I see #19120 now. ] If you intend to write a generic class, write

class C(?) { ... }

That gives the compiler (and readers of the code / documentation) a concise, unambiguous indicator of your intent. If you intend to write a concrete class but the compiler finds it's generic, it can give an error about that instead of down the road when you're trying to use the class as concrete. (Or vice versa, though that seems less common.)

(Alternately, generic class C {}, but the (?) notation is an existing practice.)

mppf commented 1 year ago

I do not know of such a case of "is-it-generic" confusion for a variable or formal

To me, it seems similarly attractive to be able to look at this routine:

proc foo(r: R) { ... }

and determine whether it's taking a generic argument ("can I pass in any instance of R I have?") vs. a concrete type vs. a fully-defaulted version of a generic record R. Under this proposal, the latter two are the same, which I think of a benefit (fully-defaulted generics are essentially concrete), and the first would have to be rewritten as:

proc foo(r: R(?)) { ... }

Sounds like you are trying to improve the situation here for:

That makes me think maybe we want it to apply to secondary methods?

proc GenericRecord.method() { }

If that is generic, it doesn't appear so, and so it would not be providing a "it is clearer, syntactically" property.

Vs.

proc GenericRecord(?).method() { }

is very clear that it's making a generic method.

But if we require proc GenericRecord(?).method() { }, would we also need to adjust primary methods?

Traditionally, these two declarations have been the same (which is hinted by the syntax):

record GenericRecord {
  type t;
  proc method() { }
}
proc GenericRecord.method() { }

However, if you have to write proc GenericRecord(?).method() maybe this is not so obvious anymore? Would it be more obvious if the situation were:

record GenericRecord(?) {
  type t;
  proc method() { }
}
proc GenericRecord(?).method() { }

?

I don't really like the record GenericRecord(?) proposal, because I think it will lead people to believe that they can indicate the type constructor signature there, which AFAIK isn't something we are proposing. But it would make this specific case arguably more consistent.

Summing up a few thoughts, I could be OK with this proposal, but it would bug me if it were the only thing we did, since as it stands now it does not actually solve #19120 / #19121. I absolutely think it could be part of a solution to those, though.

bradcray commented 1 year ago

Considering Paul's two cases in reverse order (and editing slightly):

class C(?) { ... }

I'm open to this but not particularly excited about it. I didn't have a good reason other than "I've never felt I wanted/needed to distinguish between generic and concrete types at this point in the code", but I think Michael's point about type constructors/initializers is a good one. I'd much rather make it clear that a declared type is generic by fleshing out our type initializer story (which has felt lacking to me) than taking this approach.

A class specified in a class-inherit clause, e.g. class D : C(?) { ...}.

I could imagine this context being less like a variable declaration context than a special-case. Specifically, the only thing you can put after that colon currently (I believe) is the name of a class, not a general expression that might happen to evaluate as the name of a class. So I think of class D: C { as being "Create a class D that is a subclass of C" rather than "create a class C whose type is constrained by C, say.

That makes me think maybe we want it to apply to secondary methods?

I don't feel like I'd want to here either, though again, maybe without much of a good reason other than that I think of the class name as specifying the scoping of the method / what type it's attached to, and not an argument type. But then I think of cases like proc (<more complex type expression>).foo() and wonder if it should be permitted there, or potentially even required.

is very clear that it's making a generic method.

Should this read "a method on a generic type"?

I generally continue to feel positively about requiring : C(?) in type specification contexts for generic types that are not fully-defaulted (though without much of a sense of what the impact would be on real code). I also continue to worry that the lack of a good type initializer story is the more major problem w.r.t. clarity for generic types.

cassella commented 1 year ago

A class specified in a class-inherit clause, e.g. class D : C(?) { ...}.

I could imagine this context being less like a variable declaration context than a special-case.

I can't tell from your response whether you think this case should require the(?)?

My thought was just at the basic level of since a declared field type is one way that a generic C would make D perhaps unexpectedly generic,

class D { var c: C; }

That so is

class D : class C { ... }

and was particularly aimed at the "When would the proposal apply" section of @mppf's comment (which I see has now been updated).

cassella commented 1 year ago

class C(?) { ... }

I'm open to this but not particularly excited about it. I didn't have a good reason other than "I've never felt I wanted/needed to distinguish between generic and concrete types at this point in the code", but I think Michael's point about type constructors/initializers is a good one. I'd much rather make it clear that a declared type is generic by fleshing out our type initializer story (which has felt lacking to me) than taking this approach.

I would find the (?) right after the name to be a lot more obvious reading someone else's code. Though I presume chpldoc could detect either and write this class is generic in letters of fire 100 points tall across the top of the page. (Though that helps only when reading the documentation.)

mppf commented 1 year ago

I feel I keep losing track of the rationales for this proposal, so I'm noting some here:

bradcray commented 1 year ago

Catching up with a question Paul asked quite awhile ago:

I can't tell from your response whether you think this case should require the(?)?

I was arguing that we should neither require nor permit the (?), but preserve the current subclass declaration syntax in which the parent class is simply named. Specifically, I liked this because (a) it's simpler, (b) it doesn't change something in the language today that nobody I'm aware of has been bitten by, and (c) it doesn't result in the potential for believing you could write more general expressions in that position like:

class D: C(int, ?) { ... }
bradcray commented 1 year ago

@mppf wrote:

I feel I keep losing track of the rationales for this proposal

Here's an attempt to capture cases on main today that seem inconsistent to me, where my sense is that this proposal would improve consistency:

case 1: inconsistency with requiring ? for partial instantiations

Given:

record R4 {
  param p: int = 33;
  type t;
  var x = p: t;
}

today on main, we require a ? to be added to this declaration:

  var r4a: R4(101) = new R4(101, real); 

in order to make it warning-free / match the preferred style:

  var r4a: R4(101, ?) = new R4(101, real); 

Yet we don't for this one:

  var r4c: R4 = new R4(303, real);

Q1: What makes the partial instantiations in r4a vs. r4c worthy of differing amounts of syntactic indicators that they are incomplete types?

case 2: Inconsistency with interpretation of defaulted generics

It also seems like it could help with inconsistency with cases like:

record R3 {
  param p: int = 45;
  var x = p;
}

where

var r3: R3 = new R3(55);

is an error on main today because R3 has a fully-defaulted generic that doesn't match the RHS expression. Yet,

  var r4: R4 = new R4(22, real);

is legal even though R4's param p value doesn't match the RHS expression's.

Q2: Why should the degree to which the default values of R3.p and R4.p are binding differ in these declarations?

case 3: Syntactic equivalence of concrete / fully-defaulted / fully-instantiated types

I think the final benefit is that when we see a declaration like:

var r: R;

we know that r has a fully-defined type, either because R is concrete, because it is fully defaulted, or because it is fully instantiated.

My answers to the questions posed above

Probably obvious from context, but in case now, I'll give my answers here:

A1: I don't see a reason why these cases should be treated differently, so would either suggest requiring

  var r4c: R4(?) = new R4(303, real);

or removing the warning for:

  var r4a: R4(101) = new R4(101, real); 

(where my preference is the first because it helps identify incomplete types)

A2: I don't think they should differ, and if we required:

  var r4: R4(?) = new R4(22, real);

I think the difference would be reduced.

bradcray commented 1 year ago

Talking to @mppf today, he verbally gave an answer to Q1, which I'd paraphrase as: "Given a type specifier like R(a, b, c), it looks a lot like a procedure call, and so one might expect it to be calling the type constructor and to be fully specifying the type. Yet if the type constructor had additional arguments, the compiler traditionally did not complain about this and simply considered it a partial instantiation. By warning about the lack of a ? / requiring a ? for partial instantiations, we better clarify the distinction between an intentional and mistaken partially instantiated type.

[edit: @mppf also points out that "in the process of programming / code evolving, it is common to forget an argument (and this has happened)".]

By contrast, for cases like generic record R whose type constructor takes one argument, saying R currently is a reference to the type symbol R, not a procedure call, so it's reasonable to expect that it is not saying everything about R, merely constraining the thing to be of type R. Meanwhile, the user could also write R(arg) in which case it's fully-specified or R(?) in which case it's clearly unspecified. If R's type constructor took two arguments, then saying R(arg) would put us back into the partially specified case above.

bradcray commented 1 year ago

Though we did not discuss Q2, I could imagine one answer to it being similar: That the type declarations R3 and R4 are merely designed to constrain the variable to be of type R3 or R4 and not to be interpreted as type constructor calls. If so, that suggests to me that var r3: R3 = new R3(55); should not be an error today, though var r3: R3() = new R3(55); should be. I've opened https://github.com/chapel-lang/chapel/issues/21988 to capture this notion.

bradcray commented 1 year ago

I'll mention that, after preparing for today's meeting about generics, I've cooled a bit on the proposal in the OP. Specifically, what I liked about it in proposing it was that it seemed like a natural follow-on to warning about the lack of ? in partial instantiations that we've implemented ("Help the user determine when more things are generic!"). But I worry that the logical extension of it is unwieldy. Specifically:

We could obviously invent syntax to attach ? or some other marker to all of these contexts, but this would be a very different language than today's, and I don't think I'd find it more attractive or easier to understand.

I also cooled on it because I buy Michael's rationale for why the partial instantiation case could justify using a ? from just above such that it doesn't feel as inconsistent if we were to stop here as it did to me before.

Meanwhile, for generic types—as the OP here focuses on—nothing prevents a user from writing their generic constraints as R(?) if they prefer that style, and we could add a compiler flag to enforce it if it became popular.

bradcray commented 1 year ago

I think I am feeling sufficiently not a champion of this issue, preferring others that are now open, that it could be closed now. But if others wanted me to keep it open on their behalf, I'd be happy to do so.