chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

proposal for limited user-defined implicit conversions #16729

Open mppf opened 3 years ago

mppf commented 3 years ago

Related to #5054, #14213, #15838, #16576.

Background

We have been discussing to what extent user-defined implicit conversions are important for the language to support. We already have the ability to support copy-initialization and assignment between different types (with init= and = functions). In #16576, we asked whether init= should cause implicit conversions to occur in the process of choosing which function is called. At the same time, we are expecting not to allow potential cycles among implicit conversions. Since many types need to support = and init= in ways that create cycles (e.g., assigning an array to a slice is OK, and so is assigning a slice to an array), this leads us away from using init= to enable implicit conversions.

Separately, some Chapel types have the property that storing them into a variable produces a different type. For example, var B = A[1..10] (i.e. creating a new variable storing a slice) will create a non-slice copy. This kind of pattern is useful for "view" structures. Issue #14213 has proposed that we enable a user-facing way to get this behavior for user-defined records.
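For reference, here is a minimal sketch of today's slice-vs-copy behavior (identifiers are illustrative):

```chapel
var A: [1..100] real;

ref S = A[1..10];   // S is a slice: a view of A's elements
var B = A[1..10];   // B is a new 10-element array copied from the slice

S[1] = 1.0;         // writes through to A[1]
B[1] = 2.0;         // does not affect A
```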

Are user-defined implicit conversions needed?

My understanding of the experience of developers with C++ and Rust is that user-defined implicit conversions are sometimes really important but should be used sparingly. From a language design point of view, that means that they should be supported (but potentially in a limited capacity).

In Rust, these conversions can be invoked with .into(). This syntax indicates that a conversion should occur but does not name the destination type. It also works in function calls. (Yes, these are arguably explicit conversions, but they are unlike Chapel's explicit conversions (aka casts) because they do not necessarily indicate the destination type.)

See https://github.com/chapel-lang/chapel/issues/5054#issuecomment-726873643 for more details on these.

See also https://github.com/chapel-lang/chapel/issues/5054#issuecomment-730457232 for a discussion of pros and cons of implicit conversions in any form.

What use cases do implicit conversions need to support?

I surveyed use cases of implicit conversions that I am aware of in https://github.com/chapel-lang/chapel/issues/5054#issuecomment-728339302 . Some important questions I had in mind for that survey:

From that survey, I think that the most important cases to support are:

Proposal

So, here is a specific proposal. A record author can define two methods to indicate what implicit conversions are available.

  1. proc R.inferVariableType() type returns the type of x in var x = some-expression-of-type-R. This supports #14213. For such a case, the compiler will seek to use the existing init=/= mechanisms to initialize the new variable (see also #15838). Implicit conversions are allowed from R to the type returned by this method and will be implemented within the compiler by converting the implicit conversion into an explicit conversion (i.e. a cast operation).
  2. proc R.canImplicitlyConvertTo(type t) param returns true if this can convert to t and false otherwise. When it returns true, the compiler will convert the implicit conversion into an explicit conversion (i.e. a cast operation), which can in many cases be implemented by init=/= (but see #16732 on that last point). This is intended to support implicit conversions within a type (e.g. between ranges of different integer widths or between unit-carrying numbers with different units).
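To make the shape of the proposal concrete, here is a hypothetical sketch for a units-style record (the Length record, its unit handling, and the use of isSubtype here are invented for illustration and are not part of the proposal text):

```chapel
// hypothetical record for a length tagged with a param unit string
record Length {
  param unit: string;  // e.g. "m" or "mm"
  var value: real;
}

// 1. `var x = someLength;` keeps the same type in this sketch
proc Length.inferVariableType() type {
  return this.type;
}

// 2. allow implicit conversions only between Length instantiations,
//    i.e. conversions "within a type" as described above
proc Length.canImplicitlyConvertTo(type t) param {
  return isSubtype(t, Length);
}
```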

Part of this proposal is that it will not be possible to define these methods outside of the module defining the type (because these implicit conversions are considered part of a type rather than something that can be bolted on later). Further, the compiler would detect cycles in these conversions and issue an error in that event.

We might also want to allow only one such user-defined implicit conversion in an implicit conversion sequence. We might want to require that, if a type defines R.inferVariableType, then R.canImplicitlyConvertTo exists and returns true for the result of R.inferVariableType. We might want to require that an instantiated type can only convert to one category of types (e.g. numbers or strings, but not both).

This proposal covers the use cases that I am aware of, with the exception of int to bigint implicit conversions and range to domain implicit conversions. Implicit conversions from int to bigint do not seem to be required for bigint, and many of the expected patterns can be supported by = and init= between the types as well as operators across the different types, e.g. proc +(a: bigint, b: int). I don't think that implicit conversions from range to domain should be in the language - they might be surprising and don't seem necessary to me.

mppf commented 3 years ago

In the first version of this proposal, I had this:

proc R.coerceToType(type t) returns the result of converting this to type t and uses a where clause / argument declarations like type t: int to limit the destination types allowed. This supports the remaining cases - most importantly coercions within a type (which is necessary for a unit library, or for coercions between range values of different integer widths).

I changed from this to proc R.canImplicitlyConvertTo(type t) param that returns true/false. Why?

bradcray commented 3 years ago

Since many types need to support = and init= in a way that creates cycles (e.g. assigning an array to a slice is OK and assigning a slice to an array is OK) that leads us away from using init= to enable implicit conversions.

Can you say more about this? (Or if it was discussed in a previous comment that I haven't worked my way back to yet, point me to the argument?) Specifically, I'm not seeing why having init= result in implicit conversions creates problems for this case (assuming we don't permit multi-hop coercions and prefer no-coerce function signature matches over ones requiring coercions). I'm also curious whether there are other motivating cases that involve cycles, since I tend to view assignments between slices and arrays more like assigning between different array implementations (e.g., block gets cyclic) than like assigning between distinct types. I.e., I wouldn't abandon an otherwise good idea if arrays and slices were the only challenge (not that I'm trying to dig my heels in against this proposal in paragraph one, just trying to understand the argument).

The lack of int->bigint coercions seems unfortunate to me in that (a) it seems natural and reasonably safe, and (b) defining 3 overloads of each binary operator (bigint + bigint, bigint + int, int + bigint x each operator) seems like annoying busywork for the type author and a place to make dumb mistakes, given that each of the second two will presumably create a bigint from an int and pass it to the original (or else copy-paste its body).
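To illustrate that busywork, here is a toy sketch (BigNum is a hypothetical stand-in that wraps a plain int for brevity; a real bigint would store arbitrary-precision data):

```chapel
record BigNum {   // hypothetical stand-in for a user-defined bigint
  var v: int;
}

proc +(a: BigNum, b: BigNum) {
  return new BigNum(a.v + b.v);
}

// without an int->BigNum implicit conversion, every operator also needs
// two mixed overloads that merely wrap one argument and forward:
proc +(a: BigNum, b: int) {
  return a + new BigNum(b);
}
proc +(a: int, b: BigNum) {
  return new BigNum(a) + b;
}
```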

And then, just to make sure I'm understanding, if we were defining Chapel's built-in types as user-level code, we wouldn't be able to define the bool->int, int->real, real->complex, imag->complex coercions that we have today, is that right? (these seem equivalent to the int->bigint case, unless I'm missing something?).

mppf commented 3 years ago

@bradcray - thanks for your thoughts!

Since many types need to support = and init= in a way that creates cycles

Can you say more about this? (or if it was discussed in a previous comment that I haven't worked my way back to yet, point me to the argument?).

Sure. I brought it up here and have included the relevant portion below:

Additionally we have been considering whether or not init= could serve as the way to enable implicit conversions - so that MyType.init=(arg: OtherType) means that OtherType can be implicitly converted to MyType. At the same time, several discussions of implicit conversions have requested not allowing cycles in implicit conversions.

I'm thinking that the init= approach is not really tenable if we don't allow cycles.

Consider a Matrix record containing an array. It would make sense for this Matrix type to allow assignment from arrays of compatible shape. Additionally, it would make sense to allow assignment to arrays with compatible shape from the matrix type:

  var m: MyMatrix = ...;
  var a:[m.domain] real = ...;

  m = a; // sure, this makes sense, it's just setting the matrix elements
  a = m; // sure, this also makes sense, it's just setting the array elements

Now since we have split-init, it's easy to create cases similar to the above that will convert default-init and assignment to a call to init=.

  var m: MyMatrix;
  m = someArray; // split-init -- converts to MyMatrix.init=(someArray)

  var a:[D] real;
  a = m; // split-init -- converts (notionally) to array.init=(m)

As a result, the natural pattern of legal = for the Matrix type would imply that we need init= in both directions, which would lead to a cycle in implicit conversions if init= implies implicit conversions.

So, instead, I think we need to have a separate way to opt in to implicit conversions.

This is also connected to the question of when a mismatch between mixed type = and init= is an error (#15838). I have been supposing that either it is an error (in which case the init= needs to be provided) or that the compiler will infer the init= from the =.

Another example is with array slices. For an array slice we allow assignment between slices and other arrays (in either direction). If the corresponding init= is inferred or required, we have a cycle.

An alternative example is with a variant of bigint and int - supposing for the sake of this discussion that it is OK to assign to an int from a bigint as long as the bigint is (at runtime) small enough to fit. (Otherwise, it throws, or something). Then we would allow myBigint = myInt and myInt = myBigint and some of these patterns would convert to init= with split-init.
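Spelled out with split-init, that hypothetical two-way bigint/int pattern would look like the following (the second assignment assumes the throwing int-from-bigint behavior supposed above, which is not allowed today):

```chapel
use BigInteger;

var myInt = 42;

var myBigint: bigint;
myBigint = myInt;    // split-init: becomes bigint.init=(myInt)

var myInt2: int;
myInt2 = myBigint;   // split-init: would become int.init=(myBigint),
                     // completing a cycle between the two types
```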

mppf commented 3 years ago

And then, just to make sure I'm understanding, if we were defining Chapel's built-in types as user-level code, we wouldn't be able to define the bool->int, int->real, real->complex, imag->complex coercions that we have today, is that right? (these seem equivalent to the int->bigint case, unless I'm missing something?).

We could define these. It's just that the type being implicitly converted from is the one that creates the method. E.g. proc bool.canImplicitlyConvertTo(type t) param { return t == int; } would allow for conversions from bool to int.

The lack of int->bigint coercions seems unfortunate to me in that (a) it seems natural and reasonably safe, and (b) defining 3 overloads of each binary operator (bigint + bigint, bigint + int, int + bigint x each operator) seems like annoying busywork for the type author and a place to make dumb mistakes, given that each of the second two will presumably create a bigint from an int and pass it to the original (or else copy-paste its body).

While I focused in this proposal on other use cases, it does not mean that int->bigint coercions are not possible. In particular, since bigint is part of the standard library, we could define proc int.canImplicitlyConvertTo(type t) param { return t == bigint; } (say). But that would not help if a user created their own type similar to bigint.

I think we could consider allowing both proc R.canImplicitlyConvertTo and proc R.canImplicitlyConvertFrom but that seems to have relatively less support and motivation and is something we could add later if it turns out to be necessary.

The issue here really is this - if somebody wants implicit conversions from type A to type B, does type A or type B need to be modified to indicate it? Here I am saying that modifying type A makes more sense for the biggest use cases I know of where implicit conversion is essential. The cases like int -> bigint, where we would want to modify type B, are in contrast more of a convenience IMO. (This happens to be the opposite of the situation we would get if init= enabled implicit conversions, but I arrived at it by considering use cases.)

An even more conservative strategy would be that both type A and type B would have to opt in to the implicit conversion.

mppf commented 3 years ago

I wonder if it would make sense to limit the number of types a type can be implicitly converted to. So I added this:

We might want to require that an instantiated type can only convert to one category of types (e.g. numbers or strings, but not both).

In the use cases there is just 1 target type for the implicit conversion after instantiation, e.g. range(int(8)) -> range(int) - the implicit conversion here is to range. Or in CharProxy -> Char the implicit conversion is to Char (or to numeric types more generally if Char is just int(8)).
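For instance, under this restriction the range case might opt in to just the range category. A sketch, assuming the proposed method and that isRangeType is available as a param query (the placement of the method on the built-in range type is illustrative):

```chapel
// hypothetical: ranges may implicitly convert only to other range types,
// e.g. range(int(8)) -> range(int), never to strings or numbers
proc range.canImplicitlyConvertTo(type t) param {
  return isRangeType(t);
}
```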

bradcray commented 3 years ago

in a way that creates cycles

Can you say more about this? (or if it was discussed in a previous comment that I haven't worked my way back to yet, point me to the argument?).

Sure. I brought it up here

Regarding cycles: I understand how they can come about and exist (though I still feel like we're lacking really compelling cases for them); but I'm not clear on how/why they would become inherently problematic.

But that would not help if a user created their own type similar to bigint.

Right. That seems important to me. Or, rather, I think we should try to create a system in which users can do whatever we needed to do for the built-in and standard types (to avoid special cases; to create a language that's as rich and extensible as possible rather than having some features that enjoy special privileges).

I wonder if it would make sense to limit the number of types a type can be implicitly converted to.

I sometimes wonder about this too, but then become skeptical. Specifically, it seems appropriate and useful that int(8) would coerce to int, uint, real, maybe complex, bigint, fixed, and potentially many other types that a user might define without feeling particularly dangerous.

However, I don't have a similar example in which a given user-defined type would have a large number of types that it could convert into. It seems like the number of types is most often 0 or 1. Or am I overlooking something at the moment in saying that?

mppf commented 3 years ago

Regarding cycles: I understand how they can come about and exist (though I still feel like we're lacking really compelling cases for them); but I'm not clear on how/why they would become inherently problematic.

Right. We are talking about that on https://github.com/chapel-lang/chapel/issues/5054#issuecomment-734005786 and there I brought up the notion that (depending on the implementation of implicit conversions) cycles could lead to infinite loops within the compiler. But that could arguably be addressed with compiler engineering. I don't currently know of a worse fundamental problem with them.

However, I don't have a similar example in which a given user-defined type would have a large number of types that it could convert into. It seems like the number of types is most often 0 or 1. Or am I overlooking something at the moment in saying that?

If we had a wrapper type that implicitly converts to int(8), we might want it to convert to all of those other types. From https://github.com/chapel-lang/chapel/issues/5054#issuecomment-728339302, CharProxy -> Char and sync t -> t could both run into this.
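A sketch of such a wrapper (CharProxy is hypothetical, and isCoercible is assumed to report today's implicit convertibility as a param):

```chapel
record CharProxy {    // hypothetical proxy for a character stored elsewhere
  var c: int(8);
}

proc CharProxy.canImplicitlyConvertTo(type t) param {
  // allow int(8) itself plus everything int(8) converts to today
  // (int, uint, real, complex, ...), so the fan-out is large
  return t == int(8) || isCoercible(int(8), t);
}
```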

I think this is connected to the question of whether conversions can be chained (or they are "one hop" only). In https://github.com/chapel-lang/chapel/issues/5054#issuecomment-735920261 I brought up the possibility of BigintSumExprType converting to bigint and bigrational (bigrational doesn't exist today but I am supposing it might be added).

bradcray commented 3 years ago

cycles could lead to infinite loops within the compiler.

I didn't remember that concern (and couldn't find it again searching on "infinite", "loop", or "compiler"), but I'm not seeing how cycles would inherently lead to infinite loops unless we just did a poor job with implementing them. E.g., I don't think the bool cycles we have today have led to infinite loops or other concerns. I thought this was because we only supported single-hop coercions, though something you/someone said recently made me wonder if I was wrong about that. Though this test suggests we don't:

proc foo(x: real) {
  writeln(x);
}

// foo(true);  // doesn't work, as hoped

bar(true);  // works because I've written the code to do the two hops

proc bar(x: int) {
  baz(x);
}

proc baz(x: real) {
  writeln(x);
}

If we had a type wrapping that implicitly converts to int(8) we might want it to convert to all of those other types.

That's a good point.

I think this is connected to the question of whether conversions can be chained (or they are "one hop" only).

Probably obvious by now, but I'm definitely coming from the "one hop only but cycles are OK" camp.

mppf commented 3 years ago

Something occurred to me that relates to the question of whether or not to allow cycles in implicit conversions.

We have several elements of the compiler that use whether two types are implicitly convertible as a way of creating an ordering of sorts. These features will run into complications for types that allow implicit conversions both ways.

Consider for example types A and B where we suppose that both directions of implicit conversion are allowed.

First, there is the return type inference:

proc f() {
  if someCondition {
    return somethingOfTypeA();
  } else {
    return somethingOfTypeB();
  }
}

Now, the regular return type inference would choose the return type that the other returns are convertible to. But in this case it could choose either. So what is the return type? Would we always choose the first appearing syntactic return? Or make this a compilation error?

A second example is function disambiguation. When implicit conversions are required for a call, one of the disambiguation rules prefers a function with a "more specific" argument which means that the more specific formal can implicitly convert to the less specific one.

For example, suppose that, besides implicit conversions both ways between A and B we also have implicit conversions from int to A and also from int to B. Then:

proc g(in arg: A) { }
proc g(in arg: B) { }

g(1); // Should it do implicit conversion from int to A ? or int to B ? Or error?

In this example, g could resolve if we had implicit conversions in only one direction between A and B.

I don't view these as show-stoppers for allowing cycles but I hope that they do point out that we have other features that cycles might interact with in a surprising manner.

bradcray commented 3 years ago

So what is the return type? Would we always choose the first appearing syntactic return? Or make this a compilation error?

I'd say it should be an error, much like how returning string on one path and bool on another would today.
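For comparison, a sketch of the analogous mixed-return case that is already an error today:

```chapel
config const flag = true;

proc pick() {
  if flag then
    return "hello";  // string
  else
    return false;    // bool: today this mix is a compile-time error,
                     // since neither type converts to the other
}
```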

Should it do implicit conversion from int to A ? or int to B ? Or error?

This seems like it should be an ambiguity error to me.

bradcray commented 3 years ago

Here's a similar case that looks like it is accepted by the compiler today, but should arguably lead to a similar ambiguity error (where it seems like the behavior is actually "take the first return type and then see whether additional ones can coerce to it"?):

config const coinflip = true;

proc foo() {
  if coinflip then
    return true:bool(8);
  else
    return true:bool(16);
}

writeln(foo().type:string);

mppf commented 2 years ago

Regarding cycles in implicit convertibility, this post has some thoughts about why they might not be a good idea -- https://github.com/chapel-lang/chapel/issues/19167#issuecomment-1030007035 . However, I am uncertain how to translate the issues brought up there to our particular system - so maybe it can be summarized as "cycles in implicit convertibility create problems for some disambiguation strategies which need transitive implicit convertibility".