chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

Better distinguish between type constraints (`R`) and type constructor calls (`R(...)`)? #21988

Closed bradcray closed 1 year ago

bradcray commented 1 year ago

Today, for a record R, the compiler seems to treat type constraints R and R() interchangeably. This issue proposes that we take more care to differentiate these notations, specifically to address a few cases that seem surprisingly asymmetrical.

Specifically, I'm proposing that any paren-ful reference to R be treated as an invocation to R's type constructor, while any paren-less reference be treated as an indication that the variable is constrained to be of type R without saying any more about it.

This issue is a logical follow-on to https://github.com/chapel-lang/chapel/issues/21455 and https://github.com/chapel-lang/chapel/issues/18214

case 1: Treatment of fully defaulted generics vs. partially defaulted generics

Given a generic record that is fully defaulted:

record RDG {  // a generic record that is fully defaulted
  param p = 1;
  type t = int;
  var x: p*t;
}

Today, we cannot declare new instances of this record using RDG or RDG() unless those new instances precisely match the defaults for RDG.p and RDG.t ATO.

That is, both:

var rdg1: RDG = new RDG(2, real);  // error since RDG's defaults for p and t are not 2 and real, respectively
var rdg2: RDG() = new RDG(3, string);  // ditto

generate errors like:

error: could not find a copy initializer ('init=') for type 'RDG(1,int(64))' from type 'RDG(3,string)'
note: this candidate did not match: RDG.init=(other: this.type)
note: because actual argument #1 with type 'RDG(3,string)'
note: is passed to formal 'other: RDG(1,int(64))'

Yet, if the generic record is only partially defaulted ATO, such calls are allowed:

record RPDG {  // a generic record with partial defaults
  param p = 1;
  type t;
  var x: p*t;
}

var rpdg1: RPDG = new RPDG(2, real);  // works even though RPDG.p's default is not 2
var rpdg2: RPDG() = new RPDG(3, string);  // ditto

This issue proposes interpreting R[P]DG as meaning "I am some instance of R[P]DG without saying any more about it such that the paren-less versions of both declarations above will work. They merely constrain the type of the declaration without supplying any more information about its generic aspects.

It also proposes treating the paren-ful interpretations of R[P]DG as meaning "I am a call to R[P]DG's type constructor and will create a type that will govern this declaration. In this case, the declaration of rdg2 would fail in the same way as it is today (because RDG() would create type RDG(p=1, t=int); while the declaration of rpdg2 would cause a warning today — because it's a partial instantiation of a type yet is not using a ? argument to indicate that — and then an error since, at best, RPDG() would create RPDG(p=1, ?) which is not compatible with RPDG(3, string);

case 2:

RC is a concrete record:

record RC {
  var x: int;
}

yet instances of it can currently be declared using either of the following forms:

var rc1: RC;
var rc2: RC();

This could seem a little odd ("Why support two ways of saying the identical thing?") and could potentially be resolved by saying that the compiler-generated type constructor for a concrete record is a paren-less routine, making the second form illegal.

Alternatively, we could say that the compiler-generated type constructor for such types is a 0-argument, paren-ful type constructor in which case we have a rationale for why both forms are supported.

lydia-duncan commented 1 year ago

I think I fully support this proposal. Obviously we'll want the implementation to distinguish these uses from ones that involve new but I'm hoping that won't be too tricky

mppf commented 1 year ago

Specifically, I'm proposing that any paren-ful reference to R be treated as an invocation to R's type constructor....

I am fully behind this part of the proposal

while any paren-less reference be treated as an indication that the variable is constrained to be of type R without saying any more about it.

I am not so sure about this part.

case 1: Treatment of fully defaulted generics vs. partially defaulted generics

If I am understanding this proposal correctly, if we continue to define range as it is today (where it is generic-with-defaults), then the following will change meaning:

var x: range;

since now range is generic in this context?

That seems both good and bad to me. The good thing is that we can easily created a strided range:

var y: range = 1..20 by 2;

The bad thing is that default-initializing a range will no longer work:

var x: range;
writeln(x); // now an error because `x` has generic type still?

case 2:

var rc1: RC;
var rc2: RC();

This could seem a little odd ("Why support two ways of saying the identical thing?") and could potentially be resolved by saying that the compiler-generated type constructor for a concrete record is a paren-less routine, making the second form illegal.

That would be my preference. I do not know of a motivating use case for supporting RC() for a concrete record RC. If we make it an error, and find a motivating use case in the future, we can add it back in as a non-breaking change.

One positive benefit of making it an error is that if you see code like SomeType() then you know that it is some kind of generic type & it is a type constructor call using the defaults.

bradcray commented 1 year ago

then the following will change meaning: var x: range; since now range is generic in this context?

I think that's correct, though if we make ranges not fully-defaulted, the ability to write declarations like this seems likely to go away anyway.

If your point is not intended to be specific to ranges, but expressing regret that this would be true for other fully-defaulted generic types, then one answer would be to switch to the form var x: R(); (e.g., var x: range(); if ranges remain fully-defaulted)

Arguing against myself, a downside to that suggestion is that it wouldn't permit the user to write int as a generic record with a default, and that definitely holds some weight for me. Maybe more than the uniformity that the OP here would introduce.

Michael, am I understanding correctly that your counterproposal to this issue would essentially be very close to the status quo:

(where I think the fourth bullet is the only one that doesn't match the status quo?).

The int case makes me more open to this than I'd like to be (or admit.

mppf commented 1 year ago

Michael, am I understanding correctly that your counterproposal to this issue would essentially be very close to the status quo...

Well... actually I wasn't trying to make a counter-proposal as much as ask a question to clarify what you were proposing, since this aspect seemed like a relatively big change.

What do I think we should do?

Actually I like the proposal in this issue better than the status quo. I think that the language design better if (supposing range is a fully defaulted generic type) range is the generic version and range() is the concrete version with the defaults. This seems intuitive to me and consistent with the other decisions. Somehow the language design around fully-defaulted generics has never sat well with me; e.g. #21075. However, this adjustment addresses my concerns.

Specifically, going through some concerns:

Also, I have one more thought to bring up. I think that the generics-with-default status quo makes it relatively easy to add genericity to a record (as long as it has defaults) without changing the API. However, I think that such things are better handled by introducing new types; and I would prefer the clarity that the proposal in this issue offers, to this code-update benefit.

bradcray commented 1 year ago

Thanks for the detailed comment, Michael. Checking on something that was unclear in your response:

Concern: when reading code, it's harder to tell if a type is generic or not Addressed: you're likely to see things like range(), which implies that range is generic -- and moreover that it is generic with defaults. (it implies it is generic if we ban type expressions like concreteRecord() and I think we should)

Just to make sure we're understanding each other:

In what I was proposing, I agree that seeing var r: R(...) would indicate that R was generic if it was a record or class(*); but I wasn't proposing we remove the ability to write var r: R; for generic records/classes R. So using such a declaration style, it still wouldn't be clear whether R was generic or not.

Were you saying you were on-board with that, but still liked it better than the status quo since it would at least help disambiguate the cases that do use parens? Or were you thinking we'd go further and stop permitting var r: R; to be written for a generic type R? (i.e., the user must use a parenthesized type constraint for such types).

* = though if we knew nothing about R, this could also be a procedure call, an indexing expression into a tuple type, ...

bradcray commented 1 year ago

Inverting a previous comment: This issue's seeming result of disabling the ability for an end-user to create an int-like record type (which is to say, one that is generic and parameterized, yet which also supports a default no-parens form) feels like a major downer to me.

That led to the following stray thought: Could we proceed with this issue's proposal, yet without disabling that ability (someday / eventually) by:

Such that a user could write:

record myInt {
  param width: int;

  proc type init(param width: int) {
    if !legalWidth(width) then compilerError("illegal width passed in");
    this.bitWidth = width;
  }

  proc type init {
    this.bitWidth = 64;
  }
}

In the typical procedure overload context, we can't really support paren-ful and paren-less overloads of a single routine because the results of procedure calls can have additional access/call expressions applied to them to make use of their returned values, making it ambiguous where each call starts and stops.

But in an expression starting with myInt, once we've resolved that symbol to a record type like this, we know that we're essentially invoking a type constructor, and that type constructors don't return anything. So if the expression were simply myInt, we'd call the paren-less type constructor, and if it were myInt(32), we'd call the 1-argument type constructor.

mppf commented 1 year ago

In what I was proposing, I agree that seeing var r: R(...) would indicate that R was generic if it was a record or class(*); but I wasn't proposing we remove the ability to write var r: R; for generic records/classes R. So using such a declaration style, it still wouldn't be clear whether R was generic or not.

Were you saying you were on-board with that, but still liked it better than the status quo since it would at least help disambiguate the cases that do use parens? Or were you thinking we'd go further and stop permitting var r: R; to be written for a generic type R? (i.e., the user must use a parenthesized type constraint for such types).

Yes, it's my view that range() being defaulted but range being generic helps the situation. It's true that it doesn't make it clear to the reader if, in something like var x: myRecord, myRecord is generic.

That led to the following stray thought: Could we proceed with this issue's proposal, yet without disabling that ability (someday / eventually) by ...

The parenless type constructor idea is interesting.

But in an expression starting with myInt, once we've resolved that symbol to a record type like this, we know that we're essentially invoking a type constructor, and that type constructors don't return anything. So if the expression were simply myInt, we'd call the paren-less type constructor, and if it were myInt(32), we'd call the 1-argument type constructor.

Well, if you can write a proc type this, there can still be ambiguity. However, I think it's reasonable for the language to just have a preferred interpretation in cases like that. (In the same way that we rely on operator precedence).

bradcray commented 1 year ago

In talking to users last week, a preference was stated for having R continue to be supported for fully-defaulted generics as meaning R() and to potentially stop supporting R for generics that are not fully-defaulted, requiring it to be written as R(?) instead.

lydia-duncan commented 1 year ago

@mppf - I think your work with making genericity more explicit could be considered as resolving this issue. Does that sound right?

mppf commented 1 year ago

I think we could close it

For this program from Case 1:

record RPDG {  // a generic record with partial defaults
  param p = 1;
  type t;
  var x: p*t;
}

var rpdg1: RPDG = new RPDG(2, real);  // works even though RPDG.p's default is not 2
var rpdg2: RPDG() = new RPDG(3, string);  // ditto

I didn't really resolve the inconsistency between RPDG() not being the same as RPDG(p=1). However, the pattern above now gives warnings & unstable warnings. I think this is sufficient to resolve this issue but I am worried that there is a case I am missing that the existing warnings won't cover.

For the program from Case 2, I consider it resolved; it now gives a warning that says it will become an error in the future.