Should types be generic due to their initializers' argument lists rather than their fields?

chapel-lang / chapel

a Productive Parallel Programming Language

https://chapel-lang.org


Should types be generic due to their initializers' argument lists rather than their fields? #21456

Closed bradcray closed 1 year ago

bradcray commented 1 year ago

Capturing some recent incomplete thoughts related to generic types here for posterity or to see what it shakes loose for others:

Traditionally, in Chapel, I believe we've considered a type to be generic based on its fields—that is, whether it has param fields, type fields, untyped fields, or fields whose declared types are generic. As we've been wrestling with generic types recently, I've been wondering whether this is completely wrong. For example, consider the case:

record R {
  param p: int;
  type t;
  var x;
}

Traditionally, we'd say this was generic because of t, p, and x. But now imagine its only initializer was:

  proc init() {
    this.p = 2;
    this.t = p*int;
    this.x = 3.1415;
  }

Because of this initializer's definition, R has only one possible instantiation, so it is arguably concrete, not generic. We could view p and t as ways of creating symbolic names for the type's usage (note the relation to the conversation on https://github.com/chapel-lang/chapel/issues/12613, which I'd summarize as "sometimes I want type t to give me a shorthand and not to make my type generic"), and x as a case of laziness / leveraging Chapel's type inference.
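For reference, here is a minimal runnable sketch of this example under today's rules, assuming a recent Chapel compiler and with the initializer placed inside the record. The compiler still treats `R` as generic over `p`, `t`, and `x`, yet because the only initializer fixes all three, every `new R()` produces the same instantiation. The exact text printed for the type may differ between compiler versions.

```chapel
record R {
  param p: int;
  type t;
  var x;

  // The only initializer: it fixes p, t, and the type of x.
  proc init() {
    this.p = 2;
    this.t = this.p*int;  // 2*int, i.e., the tuple type (int, int)
    this.x = 3.1415;      // x is inferred to be real
  }
}

var r1 = new R();
var r2 = new R();

// Both values share one concrete instantiation of the "generic" record.
writeln(r1.type:string);
writeln(r1.type == r2.type);
```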

For me, there's a strong analogy to split initialization, where this code is similarly not generic; it just defers some bindings until after the declaration point:

param p: int;
type t;
var x;
p = 2;
t = p*int;
x = 3.1415;

This has led me to (re-?)wonder whether:

the generic-ness of a type should be based on how generic its initializers and initializer arguments are rather than what fields it contains
the phase 1 bodies of initializer routines should be unified by the compiler/required to be unifiable by the language similar to how the branches of a conditional statement are in the presence of split initialization, since they are reasonably analogous
whether, when combined with changes like those proposed in #21410 and/or #21455, this would make it easier than it is today to determine whether a class/record is generic (by looking at how generic its initializer argument lists are)

Beyond those musings, the main challenge question for me currently is: If the arguments of an initializer were generic, what would the implications on the type's type signature be? For cases like param and type arguments to the initializer, I think it's straightforward. For example, if I replaced the 0-argument initializer above with:

proc init(param p: int, type t, type xtype) {
  this.p = p;
  this.t = t;
  this.x = new xtype();
}

It seems logical that R's type signature would be something like R(param p, type t, type xtype) so R(2, int, C) might be a concrete instantiation of the type. But if the initializer were:

proc init(var x: int(?w)) {
  this.p = w;
  this.t = uint(w);
  this.x = x;
}

then it's less clear what it should be. E.g., maybe R(param w: int)? Or, what if it were:

proc init(var c: C(?)) { ... }

where C was a generic class (by this same definition)—what would it be then? And how much work would it be for a user or the compiler to determine this? Or how would they get the documentation for it?
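To make the `int(?w)` case concrete, the sketch below approximates that initializer under today's rules. It assumes a recent Chapel compiler and uses the default argument intent in place of `var`. Today each instantiation is named by the field values (`p`, `t`, and the type of `x`), even though all of them follow from the single width `w`.

```chapel
record R {
  param p: int;
  type t;
  var x;

  // Everything about the instantiation follows from the queried width w.
  proc init(x: int(?w)) {
    this.p = w;
    this.t = uint(w);
    this.x = x;
  }
}

var r8 = new R(5: int(8));   // w is 8
var r64 = new R(5);          // w is 64 (default int)

writeln(r8.type:string);
writeln(r64.type:string);
```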

Or, should it be the case that when an initializer does rely on generic arguments, the type author should have to write an explicit type initializer as well, for example, perhaps:

proc init(param w: int) type {
  this.w = w;
}

and the compiler will complain at them if they do not?

Anyway, despite this big lingering question, what I've liked about this thought process is that it seems to make field initialization and split variable initialization more similar to one another rather than less; and it seems to make types only as generic as they need to be (potentially not at all) rather than as generic as inspection of their fields might suggest—i.e., a naive reading of R's fields suggests it's generic in three ways while the initializers above show that it might be generic in no ways or just 1 way.

bradcray commented 1 year ago

@mppf: I think you mentioned this morning that there'd been a previous thread about potentially taking an approach like this (using initializer signatures to imply generic-ness rather than field types). If that is archived somewhere and you remember where, would you point me to it? If not, no problem.

lydia-duncan commented 1 year ago

My initial impression of this idea is that it will make reasoning about the type more difficult for someone reading the code, rather than less. Now instead of only looking at the fields of the type you have to look at all the initializer bodies and compare them to each other to calculate the union of what is supported. And since initializers allow the elision of initialization expressions when you want it to take the value provided by the field's default, you'll still have to look at the field declaration to understand what every field would be set to in each case.

It'll also make it harder to implement general support in a type. If you've written several initializers and the compiler decides that they were incompatible for a rule it's defined, how are you supposed to resolve that in a way that gives you the control you want? What sort of error messages would you provide to help a user figure that out?

This will also basically require the documentation of the type to be more extensive, since we can't rely on the field declarations to tell the whole story of what can and can't be supported (and we don't include the bodies of functions in documentation, nor should we).

I'm worried that by following this path, we'll make a less powerful and useful language.

mppf commented 1 year ago

It would be ideal if somebody reading code doesn't have to look at initializer signatures to determine if a field (or a type) is generic. I think combining #21455 and #21410 gets us most of the way towards having it syntactically obvious when a field makes a type generic (with generic-management classes and things like integral being the cases that would need a solution still (Edit: and things like var x: fnReturningGenericType())). Supposing we solve those parts that need a solution, I think the benefit of this idea is mainly that it might allow a solution to #12613 without writing something like type proc t type return uint; which is just weird looking code. But, for that issue specifically, I am comfortable with the direction of offering some new syntax, rather than reworking generics more broadly. Perhaps such syntax could apply beyond just the type t case and, like in this issue, allow things like var x: SomeGenericType to be inferred from initializers. But, I'm not sure that's something that Chapel programmers are clamoring for.

I think it's interesting to consider making fields like var x: SomeGenericType not be generic at all, but rather an inferred type from the initializers. But, that is arguably worse than the direction here (since one might need to look at initializer bodies to see what is going on) -- although if such fields are always concrete, you wouldn't need to look at the initializer body for the task of understanding if the type is generic and if so what the type constructor signature is.

vasslitvinov commented 1 year ago

My concern is that analyzing the initializers in this regard is complicated and probably Turing-complete. At the same time the benefit of determining that the type is not generic is small -- how often do we expect this to happen in user codes?

I prefer the situation where we treat a class/record as generic even though all instances of it have the same instantiated type over the complexity of doing otherwise.

BTW here is another situation where analyzing the initializers is challenging:

proc R.init(type tArg) { this.p=1; this.t = convertType(tArg); this.x = 2; }
proc convertType(type tArg) type where tArg == int return string;
proc convertType(type tArg) type where tArg == bool return string;

The compiler will need to detect or verify that all overloads of convertType return the same type.
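A runnable approximation of this example, assuming a recent Chapel compiler and with the initializer moved inside the record, shows what happens today: each `new R(...)` call is resolved on its own, and because both overloads of `convertType` return `string`, two different initializer arguments produce the same instantiation. That sameness is the property the compiler would need to detect or verify across all overloads under the proposal.

```chapel
// Helper from the comment above: two overloads that both map their
// argument type to string.
proc convertType(type tArg) type where tArg == int { return string; }
proc convertType(type tArg) type where tArg == bool { return string; }

record R {
  param p: int;
  type t;
  var x;

  proc init(type tArg) {
    this.p = 1;
    this.t = convertType(tArg);
    this.x = 2;
  }
}

var fromInt = new R(int);
var fromBool = new R(bool);

// Different initializer arguments, yet a single instantiation.
writeln(fromInt.type == fromBool.type);
```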

vasslitvinov commented 1 year ago

it's interesting to consider making fields like var x: SomeGenericType not be generic at all, but rather an inferred type from the initializers

I agree it is interesting, and I'd prefer this over the direction in the OP. I have a few considerations here:

then, var x; should probably be not generic either
if there is only the compiler-generated initializer, then it should be an error "need to give this a concrete type"

OTOH --- I suspect we all have written the code with a field like var x; or var d: domain; where the field's type comes from an initializer formal. Not only written, but also found it convenient. Furthermore, writing a concrete type for a field may be impractical, or "why do I need to write it out when the compiler can infer it?". In the var d: domain; case, it may even be impossible because of the runtime component that needs to come from an initializer argument.

Given these "otoh" considerations, I propose to stick with the status quo in this regard.

bradcray commented 1 year ago

Lydia wrote:

My initial impression of this idea is that it will make reasoning about the type more difficult for someone reading the code, rather than less. Now instead of only looking at the fields of the type you have to look at all the initializer bodies and compare them to each other to calculate the union of what is supported.

I'm not convinced that this will be true in common/typical cases, but I'm also not sure what you're anticipating being difficult. Can you say more about what you're trying to reason about / calculate when reading the code? And are you speaking from the perspective of an end-user of the type or another persona? Potentially related to your reaction is the following...

Michael wrote:

It would be ideal if somebody reading code doesn't have to look at initializer signatures to determine if... a type ... is generic.

This is where I was going with the thought about "should it be the case that when an initializer's argument list does include generic arguments (as determined by the compiler), the type author should be forced (by the compiler) to write an explicit type initializer as well". Specifically, if we were to do so, the only thing a user of the type would need to check in order to know whether a type was generic or not would be the presence or absence of a type initializer. If there is one, the type is generic, if there isn't, it's concrete.

You may also be saying "the user shouldn't even have to look for a type initializer to determine if the type is generic". But it seems to me that looking for the presence/absence of a type initializer is a much faster way to determine whether a type is generic than by looking through all its fields—particularly if we encourage or require the type initializer to be the first thing declared in the type. Moreover, the type initializer makes good sense as part of a type's public, documented interface, whereas the names and types of the fields should arguably ideally be hidden in a well-written class (in which case an end-user would not even have the opportunity to browse the fields and their types unless they were to go beyond the docs and into the source code). By having user initializers with generic argument lists require a user type initializer to be supplied, we move the information about whether the type is generic out of the implementation and into the public interface.

As a specific example of how looking for a type initializer would be an improvement over browsing the fields, I'm thinking of my comment on test/library/draft/DistributedMap/v2/DistributedMap.chpl in https://github.com/chapel-lang/chapel/pull/21410/files where, once I hit variable fields like targetLocales, locDom, tables, and locks, I stopped looking for fields that might make the type more generic, so missed some. Arguably this was just my fault for being a lazy reader, but if the type's first declaration (in the docs, if not the code) had been:

proc init(type keyType, type valType, type funcType=nothing) type {
  this.keyType = keyType;
  this.valType = valType;
  this.funcType = funcType;
}

then I'd immediately have known that it was generic and precisely what it was generic over. Under this part of the proposal, the type would have been required to have a type initializer like this since the user-defined initializers were generic (since they took type arguments).

[In quoting you, I elided the part where you said "a field", in part because we're already discussing that on other issues... But also because we have the same problem in the split initialization case (i.e., not knowing whether a given variable's type is generic or not). The traditional argument for why it's worse in the case of fields (as I understand it) is that it infects the type itself. But if the fields' types don't automatically infect the enclosing type's genericity, as I'm proposing here, then whether or not the fields are generic seems much less important for a reader of the code to know—similar to variables (where I think requiring ? on partially instantiated types will also go a long way toward improving the situation in both cases). As in the variable declaration/split initialization case, the important thing is that they are concrete by the time they are initialized. They could even all be private and hidden from the user in the docs, which would be that much better].

Vass said:

The compiler will need to detect or verify that all overloads of convertType return the same type.

Your example makes me think that I don't have something quite right in my explanation of my current thought process. Which would not be surprising, as I'm just barely pulling the thoughts together myself.

Specifically, the reason I don't think the compiler would need to do that is as follows: If, instead of saying this.t = convertType(tArg);, your initializer had simply said this.t = tArg; my intention wasn't that tArg would always need to be the same across all initializers or initializer calls, since that would mean there'd be no way to write a generic type. Rather, it was that the class would be generic with respect to tArg. Same thing with convertType(tArg)—the type would still be generic w.r.t. tArg; and convertType() could return different types for different argument types without any problem. The key is that the type would be generic w.r.t. tArg, not w.r.t. the field t that was used to store the result of tArg. To me, this seems similar to a split initialization case like the following (which works today):

type t;
config type tArg;

if tArg == int then
  t = real;
else
  t = complex;
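A self-contained version of this snippet is sketched below. Giving `tArg` a default of `int` is an assumption added so the program builds without extra flags; as a config type it could presumably be overridden at compile time (for example with `chpl`'s `-s` option). The point is that `t` starts out without a type but is concrete by the time it is initialized, and which concrete type it gets depends on `tArg`.

```chapel
// Default added for illustration; overridable at compile time.
config type tArg = int;

type t;           // no type given yet: split initialization

if tArg == int then
  t = real;
else
  t = complex;

writeln(t:string);
```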

So I may have drawn a tighter correlation to conditionals in split initialization than I should've... Or maybe there's an exception for code that relies on type/param properties since they're similar to the folded conditional above? Or it could be that once I got to the "maybe the user would have to write a type initializer" part of the proposal, the need to have the compiler unify initializers was lessened somehow?

I'm honestly not sure which it is, and am mostly hoping that the analogy between "fields::bodies of initializers" vs. "variables::initializers (whether split or in-line)" holds some water for others. To me, they seem very analogous, which is why I'm very reluctant to establish different rules for one vs. the other. (And meanwhile, the lack of user-defined type initializers in the language has been a continual concern for me since we moved from constructors to initializers, so if they're part of what we're missing in creating types that are obviously generic—which seems like the case to me—then all the better).

Vass asked:

how often do we expect this to happen in user codes?

Most of the time that I've used generic class management with fields, I've been using it as a shorthand for whatever class management the initializer sets the field up with—typically a single, specific management kind embedded into the initializer body that I was too lazy to type out; not because I want the class to be generic across management styles or to have different instances of the class have different management styles. You also list a number of cases where I think we'd want it to happen, and have relied on it in the past, here:

OTOH --- I suspect we all have written the code with a field like var x; or var d: domain; where the field's type comes from an initializer formal. Not only written, but also found it convenient. Furthermore, writing a concrete type for a field may be impractical, or "why do I need to write it out when the compiler can infer it?". In the var d: domain; case, it may even be impossible because of the runtime component that needs to come from an initializer argument.

Given these "otoh" considerations, I propose to stick with the status quo in this regard.

So again, I probably have something wrong in my OP's explanation, because the way this proposal looks in my mind, it wouldn't prohibit these cases. So I probably just haven't got the explanation straight yet.

Maybe my next step should be to look at some motivating challenges and see how this proposal would play out in those cases.

vasslitvinov commented 1 year ago

Maybe my next step should be to look at some motivating challenges and see how this proposal would play out in those cases.

This sounds great.

It would be ideal if somebody reading code doesn't have to look at initializer signatures to determine if... a type ... is generic.

I propose to annotate generic records/classes with (?) or another syntax AT DEFINITION, ex. record R(?) {.....} This is Approach 4 in #19120. Rationale: if we require users to write (?) whenever USING a generic type, as in #21455, annotation at the def is the least we can do to make it easy to use the type.

The key is that the type would be generic w.r.t. tArg, not w.r.t. the field t that was used to store the result of tArg. To me, this seems similar to a split initialization case

I get the analogy between field decls/initializers and split initialization. What I do not see is a good rule to infer genericity from initializers. For example, if my initializer is proc init(type tArg), the underlying type still may or may not be generic / be instantiated in multiple ways or always in the same way.

Perhaps if the user starts out by declaring that the type is generic (or not), the compiler can then go in and verify that the initializers conform to that. The latter sounds impossible in the general case. Perhaps there are some restrictions we can throw at it? That is, IF the record/class does not have the "I am generic" annotation like (?) AND it has type/param/generic-var fields, THEN all its initializers need to satisfy certain conditions; otherwise it is an error.

bradcray commented 1 year ago

For example, if my initializer is proc init(type tArg), the underlying type still may or may not be generic / be instantiated in multiple ways or always in the same way.

That's a good point. As this proposal stands so far, the compiler would require you to create a type initializer (since the arguments to your initializer are generic, suggesting that the type now may be generic). But perhaps you could still write a type initializer that would assert that the type was concrete, such as:

record R {
  proc init type { }   // type initializer takes no arguments, so `R(...)` is not a legal type, so `R` is the only way to refer to this.
}

A productive variation on the proposal might be that the compiler would only require you to create this type initializer if both (a) the initializers took generic arguments and (b) any of the fields had generic types.

the compiler can then go in and verify that the initializers conform to that.

What does this mean? What is the compiler checking conformance on?

I propose to annotate generic records/classes with (?) or another syntax AT DEFINITION, ex. record R(?) {.....} If we require users to write (?) whenever USING a generic type, as in https://github.com/chapel-lang/chapel/issues/21455, annotation at the def is the least we can do to make it easy to use the type.

That's an intriguing suggestion. I'd suggest spawning it off into a new issue since this one has already bogged down fairly quickly and it feels independent (if compatible).

vasslitvinov commented 1 year ago

As this proposal stands so far, the compiler would require you to create a type initializer (since the arguments to your initializer are generic, suggesting that the type now may be generic). But perhaps you could still write a type initializer that would assert that the type was concrete

I missed that. In that case, I question the benefit of this from a productivity standpoint.

If I want my var x; field not to introduce genericity of the enclosing class/record, I need to write in the type initializer what I want its type to be. Is this the proposal? Then, why not just write that type directly in the field decl?

I propose to annotate generic records/classes with (?) or another syntax AT DEFINITION, ex. record R(?) {.....}

This is Approach 4 in #19120. I updated that comment with this reference.

vasslitvinov commented 1 year ago

If I want my var x; field not to introduce genericity of the enclosing class/record, I need to write in the type initializer what I want its type to be.

If this is not the proposal, then how does the compiler know what x's type should be? If the compiler infers it from the initializer(s), how does it do so in the general case?

mppf commented 1 year ago

Michael wrote:

It would be ideal if somebody reading code doesn't have to look at initializer signatures to determine if... a type ... is generic.

This is where I was going with the thought about "should it be the case that when an initializer's argument list does include generic arguments (as determined by the compiler), the type author should be forced (by the compiler) to write an explicit type initializer as well". Specifically, if we were to do so, the only thing a user of the type would need to check in order to know whether a type was generic or not would be the presence or absence of a type initializer. If there is one, the type is generic, if there isn't, it's concrete.

I did not interpret the original proposal as suggesting that people should be manually writing type constructors. (Is that what you mean by a type initializer?). I think it would be workable for generic-ness of a type to be determined by the presence of a custom type constructor. And, this approach could completely address my concerns around it being too hard to know if a type is generic (#19120).

I am really not following the discussion of initializers that aren't type initializers / type constructors. I'd like to suggest that we adjust the proposal write-up (at the top) to start with the "authors of generic types have to write custom type constructors / type initializers" idea, which I see as the main part here that could work. Then, the proposal can talk about inference/compiler-generated stuff & heuristics that indicate "maybe you meant to write a type initializer". (and, of course, that could happen in a different issue if it is diverging too much from what you were originally thinking).

mppf commented 1 year ago

BTW, strictly from a terminology point of view, I think the term "type constructor" is better than "type initializer" because "type initializer" can be interpreted as "the initializer for the type" (i.e., a regular initializer). We don't have to converge on that to discuss your proposal, but if you at least use both terms (supposing they mean the same to you), I think that would help its readability.

lydia-duncan commented 1 year ago

We discussed and discarded the idea of adding support for type initializers after initially being intrigued by them. We discarded the concept due to their potential to add confusion about what combinations are actually supported and how easy it would be to write one that was incompatible with the initializers that were defined. The approach we have today is simpler, more understandable, and has fewer potential pitfalls.

I'm not convinced that this will be true in common/typical cases, but I'm also not sure what you're anticipating being difficult. Can you say more about what you're trying to reason about / calculate when reading the code? And are you speaking from the perspective of an end-user of the type or another persona?

I think it gets more difficult the more initializers are defined on a type. Simple types may have one or two, but we have several types that have 4 or 5 initializers defined on them. BlockDim has 5 initializers, bigint has 6, datetime has 4, range has 6, to name a few that were obvious from doing a grep. With that many initializers, how would you personally keep track of the type combinations enabled by them? As the person writing the type to ensure it is useful more broadly? As someone trying to use the type? As someone trying to modify a type someone else has already written? The work to do so is the number of fields * the number of initializers, right? If the compiler is going to make assertions about what is supported for that type based on those initializers and tell you something is wrong about a combination, you'll need to figure out what combinations it thinks are available and which ones it thinks aren't, so you'll need to do that analysis yourself.

mstrout commented 1 year ago

from the OP: whether the generic-ness of a type should be based on how generic its initializers and initializer arguments are rather than what fields it contains

I would recommend that the generic-ness be clear from the user needing to define a type constructor (this is based on a suggestion @bradcray is making above). If the user creates a type constructor for the record, then it is clear the user plans on it being generic, and the compiler doesn't need to warn that the record is generic.

from @lydia-duncan, We discussed and discarded the idea of adding support for type initializers after initially being intrigued by them. We discarded the concept due to their potential to add confusion about what combinations are actually supported and how easy it would be to write one that was incompatible with the initializers that were defined. The approach we have today is simpler, more understandable, and has fewer potential pitfalls.

I would like to hear more about this, because it seems that requiring a type constructor would help make the genericness of records much clearer and potentially have people be more comfortable with compiler warnings when a record is generic and doesn't have a type constructor. Could the compiler just throw errors when the user-defined initializers are incompatible with the type initializer? Can you show some examples where incompatibility happens?

From the OP, I do not think examples like the one below should be allowed. This example has field declarations indicating the record is generic, but then the initializer definition makes the type concrete. I do see the analogy with split initialization; however, user-defined initializers can be much more numerous, making it difficult to analyze what is going on.

record R {
  param p: int;
  type t;
  var x;

  proc init() {
    this.p = 2;
    this.t = p*int;
    this.x = 3.1415;
  }
}

The compiler could give an error for the above, and the programmer could instead write

record R {
  param p: int = 2;
  type t = p*int;
  var x = 3.1415;
}

lydia-duncan commented 1 year ago

from @lydia-duncan, We discussed and discarded the idea of adding support for type initializers after initially being intrigued by them. We discarded the concept due to their potential to add confusion about what combinations are actually supported and how easy it would be to write one that was incompatible with the initializers that were defined. The approach we have today is simpler, more understandable, and has fewer potential pitfalls.

I would like to hear more about this, because it seems that requiring a type constructor would help make the genericness of records much clearer and potentially have people be more comfortable with compiler warnings when a record is generic and doesn't have a type constructor. Could the compiler just throw errors when the user-defined initializers are incompatible with the type initializer? Can you show some examples where incompatibility happens?

Sure! I believe this conversation happened in person with a whiteboard, but I'll try to recreate it as best I can. Basically, by making the type constructor user-definable, we open up a can of worms in terms of what they can write. We have to think about what a user could do with this capability rather than just proper uses of the feature. Here are some cases that worried me:

Case 1. User argument list is in a different order than the fields

Developers aren't necessarily consistent. If we let them, they might put the generic arguments in a different order than the field order. Why is that a problem? Say you have a type defined like this:

record A {
  type first;
  type second;
}

If the type constructor reverses their order, that means that var myA: A(int, real); will actually be of type A(real, int). Printing the type of myA will be confusing - "why does it say the type is A(real, int), I sent in A(int, real)?" Maybe that's a source of momentary confusion if you have an instance and you're trying to write a function that will take that instance. In that case, you can write the same order and you'll accept compatible types. But what if you have a generic function and you're trying to figure out the type that got sent in?

proc genericFunc(x) {
  writeln(x.type: string);
}

You'd be pretty tempted to use the type that got printed as is, but using it in your code wouldn't allow you to match against the argument:

proc genericFunc(x)
  where x.type == A(real, int) { ... } // `A(real, int)` calls the type constructor, so results in `A(int, real)`!

How would you know what to write to get the type you printed?
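For contrast, the sketch below shows today's behavior with the compiler-generated type constructor, assuming a recent Chapel compiler. Arguments map to fields in declaration order, so the type that `genericFunc` prints can be written back out and compared against directly; a user-defined constructor that reversed the order would break that round trip.

```chapel
record A {
  type first;
  type second;
}

// Prints the type of whatever it receives.
proc genericFunc(x) {
  writeln(x.type:string);
}

var myA: A(int, real);
genericFunc(myA);

// With the compiler-generated type constructor, arguments map to fields
// in declaration order, so the printed type matches A(int, real).
writeln(myA.type == A(int, real));
writeln(myA.type == A(real, int));
```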

"It's on the person writing the type to not write something that confusing" - maybe, but this is an example where giving the user control is strictly worse than not allowing them to define type constructors. Someone could do it intentionally, or they could do it accidentally because they had a lot of fields or fields that were interspersed with other code and mixed up the order the fields were in when writing up the type constructor. If they did it accidentally, it could result in subtle bugs for the functions they write that use the type. We would be allowing this by supporting type constructors. We’re trying to reduce confusion with generics but by doing this we’re adding a new way to get confused.

Case 2. Type constructor takes fewer arguments than there are generic fields in its type

At first, this seems like something we want to support. After all, the person writing the type could have fields that are generic but are implementation details, and they may want to have explicit control over them. The trouble is, that opens the question of what the type should display as.

record A {
  type first;
  type second;
  type third; // Say this one is an implementation detail

  // define a type constructor that only accepts arguments to set the first and second.  Maybe the third depends on their combination or something
}

That's all well and good. You define an instance using the type constructor call:

var myA: A(int, real);

When you print the type of myA, what should it say? Should it say what the user has written? Or should it say what the type is under the covers? It seems like there's value to both. If the user wants to write a function that they can send myA, they will want that output to be exactly what they can write to get it. If the person writing the type wants to check things have been set correctly for their implementation details, they will want the output to tell them everything. Printing the full type is what we do today; printing what the user wrote so that they can use it to create other type declarations means maintaining information about the original call used, which is not something we do today in other circumstances.
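A rough approximation of this situation is already possible with an ordinary type-returning procedure standing in for a custom type constructor. In the sketch below, `makeA` is a hypothetical helper, and computing `third` as the tuple type `(first, second)` is an arbitrary choice for illustration. The printed type reports all three fields of `A` rather than the two-argument call that produced it, which is the display question raised above.

```chapel
record A {
  type first;
  type second;
  type third;   // treated as an implementation detail
}

// Hypothetical stand-in for a custom type constructor: callers pass two
// types and the third is computed from them.
proc makeA(type first, type second) type {
  return A(first, second, (first, second));
}

var myA: makeA(int, real);

// The printed type reports the full instantiation of A, not makeA(...).
writeln(myA.type:string);
writeln(myA.type == A(int, real, (int, real)));
```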

Case 3: What if the user writes a type constructor that computes the values for generic fields based on the arguments provided?

This case is very similar to Case 2. There will be situations where you want to know what the type actually is and what to write to get the same type as another instance or to limit instantiations that are available.

lydia-duncan commented 1 year ago

When should R(a, b) be interpreted as the type and when should it be interpreted as a type constructor call? We don't have to make that distinction if we don't support type constructors.

mppf commented 1 year ago

In this comment, I'd like to sketch some details for how custom type constructors could work with some of the tricky cases (things brought up by @lydia-duncan above and also partial instantiation). I know that @bradcray is proposing that regular initializers might impact the genericity of a type / what the type constructor is; but, so far that doesn't make sense to me, so in this comment I will focus only on enabling users to write a custom type constructor and discussing how things will behave when there is a custom type constructor.

How to actually write the type constructor?

Here I'm supposing that we write it like this:

record R {
  type intType;
  proc type init(param w: int) {
    this.intType = int(w);
  }
}
var myR: R(32); // generates an R with intType=int(32)

(I think it is debatable whether the type here goes in the this-intent spot, the return-intent spot, or both. I think the this-intent spot makes more sense because initializers don't really "return" in the normal sense.)

(Earlier, I proposed that this could be written with proc type this(param w: int) type. What I like about Brad's direction here is that it avoids the need to have different meaning for e.g. R(int(32)) within the custom type constructor vs outside of it. Brad's direction avoids that problem by expressing the type construction process as setting individual fields. The main drawback I see is that I think the term "type constructor" is clearer than "type initializer"; but the syntax clearly conveys that it is some sort of "type initializer". However I think that is a relatively minor issue, in the scheme of things.)

Why do we need custom type constructors?

This gets a bit to responding to @lydia-duncan's comment. Certainly, the language design is simpler if we do not add support for custom type constructors. But, the same can be said for most features.

To me, the most compelling argument for why we need custom type constructors is that they allow hiding implementation details. Relatedly, they allow a type to change its implementation over time, without changing its API.

For a concrete example, we might imagine that we started with record R { param intWidth; } but then, for one reason or another, wanted to change the implementation to record R { type intType; }. Today, we don't have a way to make that change while arranging for R(32) to continue to work.
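To make the compatibility problem concrete, here is a sketch in today's Chapel (not the proposed syntax). As far as I understand the current rules, the compiler-generated type constructor takes one argument per generic field, in declaration order, so the argument to R(...) changes from a param to a type when the implementation changes. R1 and R2 are hypothetical names standing in for the "before" and "after" versions of R.

```chapel
// "Before": the record is generic over a param width.
record R1 {
  param intWidth: int;
  var x: int(intWidth);
}

// "After": the same record, reimplemented to be generic over a type.
record R2 {
  type intType;
  var x: intType;
}

// With compiler-generated type constructors, each generic field becomes
// a type-constructor argument, in declaration order.
var a: R1(32);       // the argument is a param value
var b: R2(int(32));  // the argument must now be a type
// var c: R2(32);    // no longer works: 32 is not a type

writeln(a.x.type:string);
writeln(b.x.type:string);
```

A custom type constructor would let R2 keep accepting R2(32) while storing a type field internally.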

I do think that, in many cases, the ways in which a type can be generic do form part of its API. But, I don't think that's necessarily the case. So, I think it should be possible to hide some of the details from the API.

Finally, it's possible to use a type constructor to simply constrain the instantiations to certain cases. Today, if you want to add compilerError calls if your type is instantiated with int(8) (as an arbitrary example), you'd have to do that somewhere connected to actually creating values (initializers, a function called from these, postinit, etc). But, that might lead to confusing errors. For example:

record R { param intWidth; }
type t = R(100); // 100 is not a valid integer width, but it's not an error yet
... // imagine lots of code in between here
var x: t; // most likely, we will get the error at this point
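For comparison, here is roughly what the workaround looks like today (a sketch in current Chapel; the set of accepted widths is only an illustration). The check has to live in the value initializer, so the type alias is accepted and the error only appears once a value of that type is created.

```chapel
record R {
  param intWidth: int;

  // Today the only place to validate 'intWidth' is an initializer
  // (or something it calls), so the check runs per value, not per type.
  proc init(param intWidth: int) {
    if intWidth != 8 && intWidth != 16 && intWidth != 32 && intWidth != 64 then
      compilerError("R: invalid integer width");
    this.intWidth = intWidth;
  }
}

type ok = R(32);
var y: ok;          // default-initializing calls init(intWidth=32): fine

type t = R(100);    // accepted: no initializer has run for R(100) yet
// var x: t;        // uncommenting this reports the compilerError here

writeln(y.intWidth);
```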

A secondary reason for needing custom type constructors is that it can clarify the language design around when a type is generic & what the type constructor is (in other words, it can help with #19120). That is the reason that @bradcray originally created this issue. I think there are other ways to address that problem, but if we come to a solution based on custom type constructors, it absolutely could solve the problem. Speaking for myself about issue #19120, I care that we improve the situation somehow, but I am open to many solutions.

Partial Instantiations

Let's consider a record with two type fields & a custom type constructor. How can that work with partial instantiations?

record GR {
  type indexType;
  type elementType;
  proc type init(param indexWidth, param elementWidth) {
    this.indexType = int(indexWidth);
    this.elementType = real(elementWidth);
  }
}

The above type constructor is similar to the 1st example I showed but it uses two arguments to set two type fields to a numeric type with that width. Of course, this is not necessarily a compelling use case. The point here is that it is a non-trivial type constructor that we can use to demonstrate partial instantiation with custom type constructors.

I am supposing that partial instantiation will not work with this type as written. But, I think that it's possible for the author of the type to modify it so as to support partial instantiation.

Here is an example showing a partial instantiation use-case for this type that we might like to work:

type Idx32 = GR(indexWidth=32, ?); // partial instantiation
type Idx32Elt64 = Idx32(64); // fills in the remaining details; a full instantiation

Here I am supposing that the call GR(indexWidth=32, ?) will still invoke the custom type constructor. It will just pass ? for elementWidth. To support that, the author of GR could write this:

record GR {
  type indexType;
  type elementType;
  proc type init(param indexWidth, param elementWidth) {
    this.indexType = if indexWidth == ? then ? else int(indexWidth);
    this.elementType = if elementWidth == ? then ? else real(elementWidth);
  }
}

This initializer sets the indexType and elementType fields to ? if the type constructor is called in a way that leaves them generic. But, if the type constructor provides a value for them, it will use it.

Partial Instantiations: Impact on Regular Initializers

In the context of this custom type constructors example, the type fields indexType and elementType are implementation details and not meant to form part of the API for GR. As such, the current approach of using named-argument passing to provide details of the instantiation to a default initializer will not do.

Let's look at an example with a default initializer. I can see two options for how to get it to work. The code sample below shows both next to each other for brevity, but I would not expect both to ever be provided or for that to work.

record GR {
  type indexType;
  type elementType;
  var idx: indexType;
  var elt: elementType;
  proc type init(param indexWidth, param elementWidth) {
    // as above (left out here to focus on the value initializer)
  }
  // Option 1: Named arguments, but using the type constructor's names
  proc init(param indexWidth, param elementWidth) {
    if indexWidth == ? || elementWidth == ? {
      compilerError("Can't default initialize without setting indexWidth and elementWidth");
      // or, instead of error, could compute a default, if such behavior is desired
    }
    var myIndex : int(indexWidth);
    var myElement : real(elementWidth);
    this.idx = myIndex;
    this.elt = myElement;
  }
  // -- or --
  // Option 2: Using this.type working with the generic field's names
  proc init() {
    if this.type.indexType == ? || this.type.elementType == ? {
      compilerError("Can't default initialize without setting indexWidth and elementWidth");
      // Or, instead of error, could compute a default, if such behavior is desired.
      // To use a default, this function would set fields, e.g. 'this.indexType'
    }
    // this uses indexType and elementType, but that's OK, because
    // this function forms part of the implementation so can see such implementation details
    var myIndex : indexType;
    var myElement : elementType;
    this.idx = myIndex;
    this.elt = myElement;
  }
}

Printing Types

The tricky cases in @lydia-duncan's comment above are largely about challenges when printing types. But, I think that part has a pretty clear answer. Since, if a custom type constructor is used, the type/param fields become implementation details, they should not be printed by the compiler. Instead, the compiler should print out the type by repeating whatever invocation created the type (or some normalized version of it). (I think this has an implication that, if there are multiple type constructors that generate the same type, the compiler will choose one to print. I think that is acceptable and unlikely to come up in practice, but it could admittedly be strange to encounter).

Here is an example (based on the type discussed above):

type Idx32Elt64 = GR(32, 64);
writeln(Idx32Elt64:string); // outputs 'GR(indexWidth=32, elementWidth=64)'

Of course, the implementer of GR might want to inspect the type fields. That is fine and they can do that by writing e.g.

type Idx32Elt64 = GR(32, 64);
writeln(Idx32Elt64.indexType:string); // outputs 'int(32)'
writeln(Idx32Elt64.elementType:string); // outputs 'real(64)'

I think it is reasonable to require that such an implementer take this kind of action. It puts the burden on them if there is a really bizarre relationship between the custom type constructor and the type fields. It allows the type fields to be private because they would be able to write debugging code like this within their own module.

Other Generic Fields

Chapel classes/records also can have generic fields declared like var x; or var y: SomeGenericType;. I think that these can be handled by a custom type constructor as well. In fact, requiring a custom type constructor for such cases would sufficiently address the problem described in #19120 (in my opinion). If we were to require a type constructor for such cases, can we also solve the default-initialization problem described in #16508 ?

Here is an example that I think demonstrates that it can solve both of those problems.

record XR {
  var x;  // note: this example applies equally well if this were 'var x: integral;'

  // custom type constructor
  // since it takes a generic type as an argument, it's easy to
  // see that this type is generic
  proc type init(type xType) {
    this.x.type = xType; // sets the type of 'x'
  }

  // Default initializer using Option 1 from above
  // (Named Arguments w/ Type Constructor Names)
  proc init(type xType) {
    this.x = 1: xType; // default initialize 'x' to '1' with the appropriate type
  }
  // -- or --
  // Default initializer using Option 2 from above
  // (Using this.type working with the generic field's names)
  proc init() {
    this.x = 1: this.type.x; // default initialize 'x' to '1' with the appropriate type
  }
}

vasslitvinov commented 1 year ago

I like this approach! I love especially its API benefits. Thanks Michael for working out the details.

Here are my concerns.

Judging the genericity of a record/class by the presence of a type constructor is not, fundamentally, much different than judging it by the presence of a generic field. Granted, it is easy to grep for "proc type init" and not so easy to grep for a generic field. Still, if we are concerned about users' ability to tell if a type is generic, we should go for a "right in your face" annotation as in Approach 4, ex. record R(?) {....}, regardless of this proposal.
In simple cases, I would not want to go through the extra hassle of type constructors instead of relying on compiler-generated initializers. For example, I would like to just type the following and move on to writing meaningful code:

record R(?) {
  param p;
  var x;
}

proc R.doImportantWork() { /*off we go*/ }

How do we ensure that the value initializers are consistent with the type constructors? I did not catch that in Michael's comment. It may confuse the user if (new R(xyz)).type does not match R(xyz).
If we conclude that the type is concrete, how can we tell the types of its generic fields and the values of its param fields? My understanding is that this proposal allows such a scenario.
How much time and resources do we have before 2.0 to live this proposal and do it justice?

Here are some ideas.

If we use Approach 4, we can make type constructors optional even for generic records/classes. The rule is: if the record/class is annotated as generic and there are no type constructors, then rely on generic fields like we do today.
I can see arguments in both directions on whether we should ensure (new R(xyz)).type == R(xyz). One thing we can do is to prefix-match the value initializer's formals against the type constructor's. Another is to have the value initializer declare the type of the instance it produces using the return type syntax. I do not think we can auto-infer the type constructor given an instance in the general case.
For printing out the instantiated type, the compiler should remember the type constructor call that created this instantiation. For example, if R(5) and R("hi") are internally the same type, the compiler should still print them differently. OTOH this will add slight complexity because now distinct types can be "equal".
We can take the approach that a partial instantiation that uses the type constructor is merely a first-class function (FCF), e.g. GR(32,?) is \elementWidth. init(32, elementWidth). However, it then becomes unclear how to write init= for such a partially-instantiated type.
A good case study is to apply this approach to _array and _domain.

mppf commented 1 year ago

An issue I thought of with the approach in my previous comment is that, today, a proc init that initializes a value can also do type construction. That supports inferring the resulting type in a case like var x = new GR(myInt32, myReal64). The question in that case is, how can the compiler print out x.type in a way that is consistent with how the type could have been declared?
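For reference, here is a sketch of the case being described, written in current Chapel with no custom type constructor: an initializer whose formals use queries (?w1, ?w2) sets the type fields, so var x = new GR(myInt32, myReal64) infers the instantiation. As far as I know, the compiler today describes the result in terms of the type fields (int(32) and real(64)) rather than the widths, which is exactly the mismatch a width-based custom type constructor would introduce.

```chapel
record GR {
  type indexType;
  type elementType;
  var idx: indexType;
  var elt: elementType;

  // A value initializer that also performs type construction: the
  // widths are queried from the actual arguments' types.
  proc init(idx: int(?w1), elt: real(?w2)) {
    this.indexType = int(w1);
    this.elementType = real(w2);
    this.idx = idx;
    this.elt = elt;
  }
}

var myInt32: int(32) = 1;
var myReal64: real(64) = 2.0;

var x = new GR(myInt32, myReal64);  // instantiation inferred from the arguments
writeln(x.type:string);             // printed in terms of the type fields
```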

I think it's worth evaluating a few ideas for that if we move forward with the custom type constructors direction. One possibility is to have authors of such initializers explicitly call the type constructor, e.g.:

record GR {
  type indexType;
  type elementType;
  var idx: indexType;
  var elt: elementType;
  proc type init(param indexWidth, param elementWidth) {
    // as above (left out here to focus on the value initializer)
  }
  proc init(idx: int(?w1), elt: real(?w2)) {
    this.type = GR(w1, w2);
    // Does this also need to set 'indexType' and 'elementType' fields?
    // I would expect that would be necessary were either to store a runtime type,
    // but otherwise, doing so would be redundant.
    this.idx = idx;
    this.elt = elt;
  }
}

Edit: Vass suggested some other ways to do this in his comment --

I can see arguments in both directions on whether we should ensure (new R(xyz)).type == R(xyz). One thing we can do is to prefix-match the value initializer's formals against the type constructor's. Another is to have the value initializer declare the type of the instance it produces using the return type syntax. I do not think we can auto-infer the type constructor given an instance in the general case.

The return-type syntax approach would look like this in my example:

record GR {
  ... as above ...
  proc init(idx: int(?w1), elt: real(?w2)) : GR(w1, w2) {
  }
}

That seems an interesting approach but IMO the main downside is that initializers don't really return, so using the return type syntax is a bit odd.

mppf commented 1 year ago

Responding to one thing from @vasslitvinov --

How much time and resources do we have before 2.0 to live this proposal and do it justice?

I think that it's good for us to explore this direction a bit more. I am hoping that we can gain an understanding of which elements of the current language design we can think of as stable, supposing that we will eventually add custom type constructors as a feature. In particular, perhaps we will feel that requiring custom type constructors in certain situations will address problems we are facing in a way that keeps the language understandable and consistent.

Anyway, if we feel that we have a pretty good idea where we want to head in the solution space for #19120 in the long term (whether it involves custom type constructors or something else), in the near term we can probably make simple changes; for example, making fields with generic declared type simply be unstable or result in an error. The concern with doing a simple change like that without having an idea where we want to go in the long term is that we don't know how much else we would need to change to solve the problem (and so it is hard to argue that the other elements are stable).

lydia-duncan commented 1 year ago

Something that occurred to me this morning: If we rely on the presence of type constructors to indicate what the printed type should be, what impact should that have on the types that are allowed to be grouped together in an array?

Today, we only allow one particular generic instantiation in an array. If multiple type constructors could lead to the same internal type, what element type should be printed for the array? Should we limit it to only instantiations made with the same arguments to one type constructor, to avoid this? Or will that be too limiting for users?
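As a small illustration of the current rule in today's Chapel (the record R here is a hypothetical example): an array's element type is a single concrete instantiation, so elements built from different param values cannot share an array.

```chapel
record R {
  param p: int;
}

// The element type is one concrete instantiation, R(1).
var A: [1..3] R(1);
A[2] = new R(1);     // same instantiation: fine
// A[3] = new R(2);  // error: R(2) is a different type from R(1)

writeln(A.eltType:string);
```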

bradcray commented 1 year ago

Something that occurred to me this morning: If we rely on the presence of type constructors to indicate what the printed type should be, what impact should that have on the types that are allowed to be grouped together in an array?

My initial thought on this is: We don't permit users to write any type constructors today, and the compiler only creates one. So as a starting point, I'd suggest we only support a single type constructor per type and wait for someone to tell us that's insufficient.

If/when we supported multiple type constructors, then I almost think we could just pick one arbitrarily in this case. Either the user is going to specify a type constructor using an eltType=... expression (in which case we could pick that one), or the array module code is going to, in which case we can pick the one that it chooses.

A few other potential thoughts include:

maybe we always use a type's first type constructor to determine how its type is printed, even if there are multiple
maybe we give the type author the ability to say how their type is stringified, overriding the compiler's default

bradcray commented 1 year ago

I feel as though this issue may also have run its course and that https://github.com/chapel-lang/chapel/issues/21992 is the most natural successor to the aspect of it which caught the most attention, so would be inclined to close it unless others thought it should remain open for any reason.

mppf commented 1 year ago

It would be good to port over / link the proposal for how the user-defined type constructor can help with the current problems with fields like var x;. I don't think that aspect is yet covered in #21992 & it has some syntactical implications (this.x.type = xType;). There are also still challenges with "How does the compiler print out an instantiated type?" that don't go away when we only allow 1 type constructor (e.g. https://github.com/chapel-lang/chapel/issues/21456#issuecomment-1416742525). I think it would be good for #21992 to include an outline of sorts that refers to these issues / solutions.

bradcray commented 1 year ago

Closing this, as https://github.com/chapel-lang/chapel/issues/21992 now captures the most valuable aspect of this issue (though it also refers back to this issue a lot for key discussions). I just can't see continuing type constructor threads of conversation here at this point, and don't think this issue's title and OP are highly relevant anymore on their own.