User-defined Coercion and Cast Syntax

mppf commented 7 years ago

Chapel currently supports user-defined casts but not coercions. However, the way to declare a user-defined cast is undocumented and odd. Here is an example:

proc _cast(type toType, src) where isSomeType(toType) {
     ...
}

It would be nice to have a more user-facing feature here.

One proposal for the way to declare a user-defined cast is like this:

proc sourceType.cast(type toType) {
     ...
}

and then the way to indicate automatic coercion was possible between sourceType and toType is like so:

proc sourceType.coercible(type toType) param {
     return true;
}

Some sample use-cases include:

Matrix type
splitting out associative-array-as-set / array-as-vector into separate types
possibly for record wrappers in delete-free story (issue #5096)
coercing from strideable=false to strideable=true ranges

Design questions

updated with conclusions from discussion below:

Should coercions be handled by something indicating if coercion is possible, and then a call to the cast function if so?

Yes

TODO based upon discussion below

Should the cast be declared as a method or as a function? Or as a type method?

Type method
- Function-approach allows covering multiple source types without syntax changes
- Function-approach provides symmetry in how the coercible types are specified.
- Method-approach does not pollute the global namespace
1. What are the exact names of the 'can coerce' and 'cast' functions/methods?
isCoercible and cast

TODO based upon discussion below

Can user-defined coercions be applied to method receivers?

Yes
1. Can user-defined coercions returning a ref apply to ref arguments?
Yes
1. Can user-defined coercions returning a value or const ref apply to in or const ref arguments?
Yes

mppf commented 7 years ago

PR #1868 is pretty old at this point but was a start at implementing the proposal described above.

mppf commented 7 years ago

Note that C++ allows coercion to be specified by writing a constructor or by writing a special function. I read somewhere that Scala allows methods on either the source or destination type (e.g. from_sourcetype and to_desttype methods) but so far I have only tried it with a function. But, scala uses an 'implicit' keyword for these and doesn't require them to have a particular name.

mppf commented 7 years ago

Arguably, specifying an implicit coercion from type A to type B makes type A a subtype of type B. We should consider updating PRIM_IS_SUBTYPE to handle this. PRIM_IS_SUBTYPE is generated in where clauses using 'someType:someOtherType' syntax.

bradcray commented 7 years ago

1) I think so, though could be convinced that I'm wrong. I think of a coercion as being an automatically-applied cast. OTOH, if the coercion took the same signature of the cast, it could presumably just call/apply the cast rather than requiring code duplication... I guess my worry is that if the coercion is like the cast, but did something different, it'd be more confusing than useful. I'd at least start with the "canCoerce"-style boolean until someone provides a compelling example for coercions different from casts.

2) I think these should be methods, not standalone functions. I don't see how the cast could be a type method since it needs the value of 'this' to compute the new value, presumably. coercible probably should be a type method.

3) No strong opinion at this time. I don't find your proposals objectionable, though I could probably remember how to spell "canCoerce" more easily than "coercible" vs. "coercable"

Note that C++ allows coercion to be specified by writing a constructor or by writing a special function.

I generally don't like the use of constructors as the mechanism for defining coercions/casts. Too subtle. IIRC, there's an example in C++ where if I were defining a vector class whose constructor took an 'int' to define the allocation size, it would also cause ints to coerce to vectors (which wasn't my intention)?

mppf commented 7 years ago

Yes the C++ design of making constructors with 1 argument add implicit conversions isn't something I'd like to emulate either. However, it might make sense to implement the cast that way. After all, a type conversion from some other object is creating a new object, isn't it?

The main concern I have with it being a method is that it means that casts/coercions of class instances can be virtual method calls/single dispatch. I think that will work out OK, but it seems a little different (at least from C++), and so I'm a bit nervous about it. I think it's likely that Scala and C# allow such a thing but I havn't dug in to it.

If we have someType.cast(toType), each actual toType will create a different method for someType and its dispatch children. That means that a parent type could have a generic cast method to enable subclasses to override it (or just support casts to specific types). Other than that, working from the static type would work as expected.

class Number {
  proc cast(type t) {
    var ret:t;
    writeln("unsupported Number cast");
    return ret;
  }
}
class Integer : Number {
  var x:int;
  proc cast(type t) where t == int {
    return x;
  }
}

var n:Number = new Integer(1);
writeln(n.cast(int)); // runs Integer.cast; returns 1
writeln(n.cast(real)); // runs Number.cast; returns 0.0

Note that in Java, toString() is an important method to be dispatched and overridden. Since in Chapel, the equivalent is to cast something to a string, we'd be matching the Java model for that case if the cast-to-string method can be overridden.

bradcray commented 7 years ago

I'm not sure whether the following is what you're suggesting, but I'd feel disinclined to have casts be simply another 'init' overload for the same reasons as my C++ example. That is, if I did:

class vector {
   var size: int;
   type t;
   var buff: [1..size] t;
   proc init(s: int) {
      ... set up an s-element vector ...
   }
}

I wouldn't want the following to suddenly work:

3:vector

I could imagine creating a special 'initCast' initializer for this purpose, though.

I hadn't thought of this so explicitly before, but I'm thinking that class variables should probably not be able to support cast/coercion methods because it would essentially override the inherent language support for moving between subclasses and superclasses in a way that seems confusing. I think this hunch also relates to the fact that variables of 'class' type are references to objects, not the objects themselves. If this hunch is correct, I think that would wash away the virtual dispatch concern?

mppf commented 7 years ago

I think that fact that myObject:someClassType does a dynamic cast is an interesting point. We wouldn't want to interfere with that / 'isa'.

mppf commented 7 years ago

What are the limitations of coercing method receivers? Or coercing to const ref or ref formals?

If the result of coercion is a value, we don't want to coerce into a const ref or ref formal argument because we'd lose the element identity and that would be surprising
If the result of coercion is a ref, it should be passable into a in, const ref or ref formal argument
If the result of coercion is a const ref, it should be passable into an in or const ref formal argument

[ ] TODO: figure out why prototype branch doesn't coerce generic record method receivers; it's probably either something that was carefully considered (and is in the commit log) or just a bug

mppf commented 7 years ago

Example use cases:

Matrix type
splitting out associative-array-as-set / array-as-vector into separate types
possibly for record wrappers in delete-free story (issue #5096)
coercing from strideable=false to strideable=true ranges

ben-albrecht commented 7 years ago

To elaborate on what a Matrix type might look like with this approach (making some assumptions about the aforementioned questions):


record Matrix {

  const D: domain(2);
  var A: [D] real;

  proc init(X: [?Dom] real) {
    D = Dom;
    A = X;
  }
}

proc Matrix.coercible(type t) param where t == A.type return true;

proc Matrix.cast(type t) ref where t == A.type {
    return this.A;
}

var Dom = {1..10, 1..10};
var Arr: [Dom] real;
var Mat = new Matrix(Arr);

writeln(Mat[4, 3]);
writeln(Mat.size);

vasslitvinov commented 7 years ago

I agree that we do not want to allow integers to be implicitly convertible to vectors as discussed above.

So what happens in the following example?

proc useIt(in formal: A) {...}
var actual: B;
useIt(actual);

Is actual (attempted to be) coerced to A, then formal is initialized from the result of the coercion?

vasslitvinov commented 7 years ago

To make sure we are clear:

"cast" == "explicit conversion"
"coercion" == "implicit conversion"

I prefer coercions and casts to be user-defined via functions, not methods. This is because there are cases when we want multiple source types to convert to the same target type following the same algorithm. For example, most types convert to strings by writing into a string buffer.

To tell whether T1 is convertible to T2, it is more user-friendly if the compiler simply attempts to resolve the corresponding conversion function (or method). Failure to resolve means "can't convert". If we can't make this happen for some reason, I am OK with requiring a boolean function like coercible.

We should not allow conversions for ref formals, in general. That's because it introduces a subtle difference in behavior.

We can allow conversions for const ref formals, by the same principle as allowing to pass r-values by const ref.

I suggest treating this formals like other formals w.r.t. conversions.

W.r.t. the Matrix example above I agree that Mat[4,3] should redirect to Mat.A[4,3]. For reading an element we could redirect using a coercion. Doing so for writing an element violates the rules I suggest above. I don't think this example is a good motivation to change the rules. Instead, Matrix should use delegation or define explicit this method(s).

mppf commented 7 years ago

@vasslitvinov - answering your first question (with useIt in the previous comment) - yes.

Regarding the second comment, the terminology seems correct to me. I think you raise 3 questions:

1) Should cast be a function rather than a method? 2) Should we have a 'coerce' function/method instead of 'isCoercible' and 'cast'? 3) Should we ever allow user-defined coercions to apply for ref formals?

About the 3rd question: I think it's important that Matrix be able to coerce to a ref array formal. For example:

var x = new Matrix({1..2, 1..2})
setToOne(x);
proc setToOne( ref A:[] ) {
  A = 1;
}

I think a Matrix should be passable to a ref actual expecting an array. Otherwise Ben will have to re-write many methods to make them apply to Matrix type. I cannot think of any type-theory reason why that is not reasonable, as long as the user-defined coercion specifies that the result of coercion to array is a ref array (rather than a value, say). Can you say more about the subtle difference in behavior you are worried about?

mppf commented 7 years ago

How is user-defined coercion different from delegation?

user-defined coercion can apply to function arguments, but delegation only coerces method receivers
delegation seems conceptually easier to understand
delegation can easily filter the methods that are delegated but it's harder to imagine doing that with coercion
delegation is private, but coercion is public. In particular, coercion makes one type act like another type. Delegation just lets one type use some other type in its implementation. In other words - if one type was coercible to another type, removing that coercion can break software using that type. Vs with delegation, where removing delegation from a type can be done without breaking anything (supposing that the relevant methods are declared).

Aside from conceptual difficulty & filtering, we could imagine having support for coercion that only coerced method receivers. That would allow it to solve much of the same problems as delegation.

However, my personal view is that both features are useful. It's just that current motivators (like Matrix) seem to fit coercion more than delegation.

vasslitvinov commented 7 years ago

@mppf - thank you for the example with setToOne(). I now agree to allow ref coercions.

(I can explain what I was thinking before. I don't think that way now.)

One important thing that occurs to me is this.

In the Matrix example, a ref coercion from Matrix to array must take the Matrix argument (this or otherwise) by ref. If the actual is a const ref or an r-value, coercion cannot return a ref.

I think this follows up automatically from our current rules. (How well they are enforced that's a separate question.) In particular, fields of a record formal with const ref or const in intent must also be const, regardless of the field's type. So we can't return them by ref.

The only practical effect of this that I see in the Matrix example is that the expression new Matrix(...) or a const variable or formal cannot be coericed to a ref array.

Speaking of which, where do we stand on passing r-value actuals, ex. literals, to const ref formals?

mppf commented 7 years ago

Re Matrix, Ben's example doesn't have it, but the cast method would need to be a ref-pair for it to work correctly:

proc const Matrix.cast(type t) const ref where t == A.type {
    return this.A;
}
proc ref Matrix.cast(type t) ref where t == A.type {
    return this.A;
}

Re. passing r-value actuals to const ref formals - it seems today to be OK for records (which is the only way I can think it would be relevant for this discussion) but not for integer literals (or probably other literals):

proc f(const ref arg) {
  writeln(arg);
}
f(1); // error: non-lvalue actual is passed to 'const ref' formal 'arg' of f()
record R {
  var x:int;
}
f(new R(1)); // works now

edit: and I don't see anything wrong with passing record rvalues to const ref formals.

vasslitvinov commented 7 years ago

We should introduce syntax to avoid duplication for ref pairs, e.g.

proc const(?c) ref Matrix.cast(type t) const(c) ref where t == A.type {
    return this.A;
}

where c is a boolean param resulting in two instantiations, with true and false.

mppf commented 7 years ago

Let's keep this discussion focused on user-defined coercions.

mppf commented 7 years ago

@vasslitvinov - I'd like to argue for the design described in the issue description for these two questions:

a) Should cast be a function rather than a method?

Using a method seems to me to add less noise to the global namespace. In particular, for a method, if a call to a cast function does not resolve, we can generate a simple error. With a function, we might list all 200-some candidates in the standard modules. We presumably don't need to list those for methods. (Or at least not unless the user asks for more information about possible candidates).

You wrote:

I prefer coercions and casts to be user-defined via functions, not methods. This is because there are cases when we want multiple source types to convert to the same target type following the same algorithm. For example, most types convert to strings by writing into a string buffer.

I'd argue that it would be interesting to allow one to declare a method on a generic receiver. Something like this:

   proc ?t.cast(type toType) where shouldHandleType(t) { ... }

While we don't have that now, I think it could be added and would make sense for other reasons.

b) Should we have a 'coerce' function/method instead of 'isCoercible' and 'cast'?

You wrote:

To tell whether T1 is convertible to T2, it is more user-friendly if the compiler simply attempts to resolve the corresponding conversion function (or method). Failure to resolve means "can't convert". If we can't make this happen for some reason, I am OK with requiring a boolean function like coercible.

It's important to keep in mind that some types will support explicit conversion but not implicit conversion. So if we don't have isCoercible and cast, we'll have coerce and cast.

The main argument in favor of the isCoercible + cast approach is that we almost certainly wouldn't want cast and coerce to do different things. Can you think of a case where having cast be different from an implicit coercion is actually useful? I think it would be confusing if they ever were different, when both are allowed.

Separately, I don't think it's a good idea to fail silently on a resolution error in this case. E.g. a coerce method might not resolve because of a minor type-o. In the isCoercible case, that would be a straightforward error.

3) Should we ever allow user-defined coercions to apply for ref formals?

I think we're in agreement about this one - they can be if the coercion returns a ref of the appropriate type and const-ness.

vasslitvinov commented 7 years ago

I am game for isCoercible + cast. An alternative that would not require code duplication would be coerce + cast, where coerce would be required for an implicit conversion and either would work for a cast.

As to silent resolution failures - sounds like we need isCastable, by the same argument?

I am unconvinced by the "noise in the global namespace" argument if cast is a function. It only shows imperfection of our error reporting, and I suspect we will currently experience that if cast is a method just as well. While I am open to be out-voted, I still think that the symmetry is a fairly strong argument in favor of the "function" approach.

I am also concerned that currently we declare the destination type using a where clause, like

proc _cast(type t, x: int(32)) where t == syserr .......

This is quite inconvenient IMO. However I see these pros:

this allows us to provide a single implementation for a set of related types, e.g.

proc _cast(type t, x: complex(?w)) where isRealType(t) || isIntegralType(t) ........
switching to "cast is a method on the from-value" does not solve it

We'd have to make cast a method on the destination type to address this. Or, introduce the syntax:

proc cast(type t==int(64), from:int(32)) .......

to make it user-friendlier.

mppf commented 7 years ago

As to silent resolution failures - sounds like we need isCastable, by the same argument?

No, because there's no optional behavior with cast; cast just is the function called on an explicit someThing:someType cast. If it doesn't resolve, there will be an error. Contrast with implicit conversions/coercions. There, the compiler will attempt to coerce one type into another during resolution, and has fall backs if it fails (such as using a different candidate function, possibly with a different coercion). So a failure in type-checking a coerce method could lead to a program appearing to work, but a failure in cast method could not.

Note though that if cast were a method on the destination type - it would possibly be an initializer (like in C++). Initializer or type method, this strategy has problems of its own, among them are that you can't write 1 implementation of a generic cast to a group of types (sortof the dual of the problem you brought up with cast being a method on the casted-from object).

I'm personally indifferent for whether cast is a method or a function, but I do think it's important to have isCoercible and cast in some form (possibly with different names). I'm not particularly concerned about needing a where clause.

vasslitvinov commented 7 years ago

Indeed we do not need isCastable, sorry for being unclear on that.

I am OK with the current where clause approach. Still hoping for a better syntax. :)

I am not much in favor of having cast be a method on the destination type because of the same symmetry argument I made against a method on the from-value. However, such cast would be distinct from an initializer (to avoid issues with int->vector coercion).

bradcray commented 7 years ago

One belated thought that occurs to me on this thread. We could potentially define casts using 'proc :(...)' rather than 'proc cast'. I see tradeoffs here:

pro: more like an operator overload pro: doesn't require reserving another identifier con: syntactically more subtle con: isCoercible doesn't have an obvious symbolic form, so maybe using keywords for both is preferable? con: will likely render as an emoticon in some scenarios

mppf commented 7 years ago

I started looking at the previous implementation a little bit and remembered this cast, in ChapelIO.chpl:

  proc _cast(type t, x) where t == string && ! isPrimitiveType(x.type) {
    return stringify(x);
  }

which I implemented last time as value.cast(), where value refers to dtValue which means any Chapel type. It seems we will need to implement cast as a function or have some way of declaring methods on a generic receiver, like ?t.cast.

mppf commented 7 years ago

Building on Brad's proc : idea and taking inspiration from Scala, we could make it be:

// declares support for an explicit cast only
proc :(from: myType, type destinationType) { ... }

// declares support for an explicit cast and an implicit coercion
proc implicit :(from: myType, type destinationType) { ... }

Yes, Scala literally has an implicit keyword.

If we wanted to, we could name the methods implicit: and explicit: as an alternative to introducing implicit as a keyword.

bradcray commented 7 years ago

which I implemented last time as value.cast(), where value refers to dtValue which means any Chapel type.

Can you clarify what about this doesn't / didn't work? If we really want cast to be a method, I wouldn't want this to be the reason to not make it be one.

bradcray commented 7 years ago

Building on Brad's proc : idea and taking inspiration from Scala,

This suggests separate cast and coerce methods, where I thought we were leaning more towards a single conversion method for casting and isCoercible to indicate whether it could be applied implicitly as well. Have you changed your opinion on this? Does Scala have compelling cases of making them have distinct behavior?

I don't think we should introduce an implicit keyword if you were suggesting that (it's hard to tell).

I don't think 'proc explicit:(...)' is strictly less clear than simply 'proc cast(...)' or 'proc :(...)' (not to mention it looks like a beast to parse, particularly given our support for paren-less procedures, the use of ':' as the return type indicator, and the potential use of parenthesis to indicate a tuple return type.

ben-albrecht commented 7 years ago

I like the idea of implementing user-defined casts as an operation overload, proc :(from: myType, type destinationType) { ... }, for the reasons @bradcray pointed out earlier. Global namespace pollution is no longer an issue here either, since proc : would never collide with any user-defined functions.

For enabling coercion, the isCoercible param method still seems appealing, to ensure casts/coercions always have the same behavior.

This is pretty much what's implemented in your development branch right now, except it is using proc _cast( ... ) instead of proc :( ... ) (at the time of writing).

mppf commented 7 years ago

which I implemented last time as value.cast(), where value refers to dtValue which means any Chapel type.

Can you clarify what about this doesn't / didn't work? If we really want cast to be a method, I wouldn't want this to be the reason to not make it be one.

I worked, but it was strange. I just needed to mark dtValue with FLAG_GENERIC. It worries me a little bit because I don't think anything else has used dtValue for quite some time, and I wonder if enabling proc value.method() to apply to any type is bordering on extending the language (since I'm pretty sure it's not specified etc etc).

mppf commented 7 years ago

@ben-albrecht tried creating a Matrix type with a prototype branch and ran into problems with coercion for generic arguments. This seems to me to raise another design question. Should user-defined coercions be attempted when choosing a generic type to instantiate?

Consider this example, which hopefully makes the issue clear:

record Array {
  type t;
  var element:t;
}

record Matrix {
  type t;
  var array: Array(t);
}

// Set up Matrix(t) to be coercible to Array(t)
proc Matrix.coercible(type t) param where t:Array return true;
proc _cast(type t, m:Matrix) where t == Array(m.t) {
  return m.array;
}

proc acceptsGenericMatrix(x: Matrix) {
  writeln(x);
}
proc acceptsConcreteMatrix(x: Matrix(int)) {
  writeln(x);
}
proc acceptsGenericArray(x: Array) {
  writeln(x);
}
proc acceptsConcreteArray(x: Array(int)) {
  writeln(x);
}

var mat = new Matrix(int, new Array(int, 1));

acceptsConcreteMatrix(mat);
acceptsConcreteArray(mat);

acceptsGenericMatrix(mat);
acceptsGenericArray(mat); // does not resolve!

For what it's worth, C++ does not do this:

template <typename m>
class A
{
public:
    A(int) {} // implitly convert from int to A<m>
};
template<typename m>
A<m> id(const A<m>& arg)
{
    return arg;
}

int main() {
  A<float> a(4);
  A<float> x = id(a);
  A<float> y = id(44); // fails to resolve
  return 0;
}

If we don't allow such behavior, I'm having trouble seeing how Ben's Matrix type can be built (barring compiler support for "array-like" types).

Anyway, the answer to this question might suggest specific syntax for specifying one type can coerce to another. In particular, at the time that acceptsGenericArray (mat) is being resolved, we have:

an actual argument of a concrete type, Matrix(int)
a formal argument of generic type, Array(?) The compiler needs to answer the question: can I convert a Matrix(int) into an Array(?) ? This question isn't quite what the syntax above is answering, because Matrix.coercible works with concrete types (i.e. it's not possible to pass the type argument Array(?) to the t formal of coercible). Supposing that were to continue to be the case (since AFAIK generic instantiation only proceeds when concrete types for a generic are known), how would the compiler compute the type u in Array(u) that it should pass to the type t argument of coercible?

This all leads me to think that the best answer is to not have a coercible function/method at all. Instead, we could enable the language to include declarations that one (possibly generic) type is a subtype of another. As a strawman:

record Array {
 type t;
 var element:t;
}
record Matrix {
 var array;
  // Matrix init fn checks array.type is an Array of appropriate shape
}

Matrix subtype Array; // Note that both of these are generic types

Now the compiler knows:

Matrix is a subtype of Array
Matrix(t) is a subtype of Matrix

therefore

Matrix(t) is a subtype of Array

Now, for this to work in this example, I needed to remove the type t field from Matrix so that there is no possibility for that type to differ. Otherwise I'm not sure we could claim that Matrix is a subtype of Array (but perhaps somebody else can convince me that is workable).

Such a strategy should also be significantly faster than the coercible function strategy. In particular, the prototype branch is something like 30% slower to compile Hellos. With a subtype declaration, we can store this information in the AggregateType objects during scopeResolve and then function resolution merely needs to query it (instead of trying to resolve one or morecoercible calls in order to resolve a function call).

Thoughts?

edit: another syntax idea might be

type Matrix : type Array;

edit: we also have to consider whether or not ref Matrix can coerce into ref Array. Our previous plan had been to make this dependent on whether the cast function returns a ref - in which case it'd presumably be a compilation error if it didn't. A possible alternative would be to include this information in the subtype declaration.

bradcray commented 7 years ago

I[t] worked, but it was strange. I just needed to mark dtValue with FLAG_GENERIC

This seems reasonable to me. I do think that we should improve support for the 'value' type as the abstract base type of records (which is what I think it was designed to support). That is, I think of 'object::class' as 'value::record'. But I agree with you that it hasn't been exercised much so far. But I don't think that's a reason not to exercise it.

bradcray commented 7 years ago

If we don't allow such behavior, I'm having trouble seeing how Ben's Matrix type can be built (barring compiler support for "array-like" types).

I continue to think that domain maps could be a fallback strategy for getting Matrix types that act in a distinguished way from arrays. A factory function like:

  var Mat = new Matrix(1..n, 1..n);

could be used to insulate the typical user from the intricacies of using domain maps...

Instead, we could enable the language to include declarations that one (possibly generic) type is a subtype of another.

As a gut reaction, I'm open to pursuing a subtyping-based direction and (as I think you know) have wanted some notion of value subtypes for some time. I can't quite recall where we left our record inheritance conversation offhand, but syntactically, the first thing I snapped to was:

record Matrix: Array {
  ...
}

Though I realize this is arguably something different than what you're requesting, since it would (intuitively to me, at least) suggest that Matrix would inherit the fields/methods of 'Array' rather than being a distinct (and arbitrary) type that happens to have some sort of subtype property. (Or... would we want it to do that rather than explicitly storing an array?) It also has the downside that if we wanted to create 'age' as a subtype of 'int' (without requiring a record to be involved), we wouldn't have the right syntactic place to hang it (no 'record' keyword).

So if that approach didn't bear fruit, my next thought would be:

subtype Matrix: array,
        age: int;

I think we should take our time with this, though, and not rush into it without deliberation. If it helped with the 'age' use case (or 'index(D)' issue -- making it more than just a typedef), that'd carry a lot of weight in my book.

vasslitvinov commented 7 years ago

I think we need to distinguish between subtyping and instiation. For example, Matrix is a subtype of Array while Matrix(t) is an instantiation of Matrix.

I suggest that our subtype relationships be always between concrete types. In particular, the proposed declaration

subtype Matrix : Array

is merely a family of concrete-subtype relationships

forall t. subtype Matrix(t) : Array(t)

In general, subtype Matrix : Array is ambiguous. For example, does it mean the above (the same t for Matrix and Array) or the following (which could be correct or not):

forall t1, t2. subtype Matrix(t1) : Array(t2)

We could declare subtype Matrix : Array to be a shortcut for the former. It would be legal only if the set of generic arguments of Matrix and Array and their names match. E.g. it wouldn't work if we had record Matrix { var array; }

The syntax for the above forall would use Chapel's standard query types, e.g.:

subtype Matrix(?t) : Array(t);
// or
subtype Matrix(?t1) : Array(?t2);

vasslitvinov commented 7 years ago

The subtype declaration Michael proposes could be seen as a different syntax for proc cast(...). I do agree that subtype is a nicer syntax.

Separately, can we also use that on class types? I.e. introduce a subtyping/coercion behavior between two class types that are unrelated w.r.t. inheritance?

mppf commented 7 years ago

@vasslitvinov - re. the concrete vs generic types in subtyping - it seems to me that in order to be able to pass a Matrix(t) into a function expecting a generic Array (e.g. proc f(arg:Array)), that we need the compiler to be able to know that a Matrix(t) can be converted into an Array.

It seems unreasonable to ask the compiler to choose a possibly arbitrary instantiation, such as Array(t), and then decide that Matrix(t) can be converted into it.

This is the reason that I think we want to convey to the compiler that Matrix is a subtype of Array (and therefore that Matrix(t) can be converted into a generic Array in the process of instantiation).

I think that the language already blurs the lines between subtyping and instantiation. E.g. for generic classes Parent(t) and Child : Parent:

class Parent { type t; }
class Child : Parent { }

we can pass a Child(t) into:

a concrete argument of declared type Child(t)
a concrete argument of declared type Parent(t)
a generic argument of declared type Child
a generic argument of declared type Parent

all with the same syntax. I think it's an advantage of the language that we don't nit-pick about the difference between instantiation and inheritance. In fact, I'd posit that for a reasonable definition of sub-type - instantiation creates a subtype of the generic type.

Do you know of a reason that we should distinguish the two? (other than personal preference)?

Thanks.

vasslitvinov commented 7 years ago

@mppf The technical reason is that generic types are in a different domain than concrete types.

Ex. declaring that some T is a subtype of Parent or Array in the above examples is an incomplete statement. It does not tell us, for example, whether a T thing can be converted specifically to Parent(int).

While in the above example we can indeed pass a Child(int) actual to a generic Parent formal, we actually are passing a Child(int) to a Parent(int).

At the same time I can see that incorporating generic types into the subtype relationship helps us resolve the cases we want to be resolvable, such as acceptsGenericArray(mat) above. So maybe we can call that "extended subtyping", or call the other one "concrete subtyping", to be clear what we are talking about?

Thinking further, I can see the user writing simply

subtype Matrix : Array;

Then, to see which concrete Array is the supertype of a given concrete Matrix, we resolve an applicable cast function and check its result type. Is this what others have in mind?

lydia-duncan commented 7 years ago

Looking earlier in the thread for context, I'm a little concerned about defining a subtype relationship outside of an inheritance strategy. I think that would muddy our object oriented story a lot, and I view coercibility as a separate concept from the subtype relationship, though one that is definitely involved (in set terms, my gut says that all subtype relationships should be able to coerce to the supertype, but that not all types that are coercible require a subtype, supertype relationship).

On the current discussion, I see both sides. On the one hand, Child(int) is not a subtype of all instantiations for Parent. It isn't necessarily a subtype of Parent(int) - for instance, this is a valid initializer for Child:

class Child: Parent {
   type t2;
   var x: t2;

   proc init(type t2Val) {
      t2 = t2Val;
      super.init(real);
   }
}

On the other hand, if a function has specified that it takes the generic version of Parent, then providing Child with any t to fulfill that argument should be fine, as Child provides everything that Parent provides and more. It doesn't truly matter that what is happening under the covers is that the function is getting stamped out to an appropriate instantiation of Parent, what matters is that the instantiation of Parent is appropriate for the Child that is being passed in. So, following my example, if we sent in a Child(int) to a function that accepted the generic Parent proc foo (x: Parent), it would be fine that what is generated is this instantiation of foo proc foo (x: Parent(real)).

mppf commented 7 years ago

Responding to @lydia-duncan:

Looking earlier in the thread for context, I'm a little concerned about defining a subtype relationship outside of an inheritance strategy. I think that would muddy our object oriented story a lot, and I view coercibility as a separate concept from the subtype relationship, though one that is definitely involved (in set terms, my gut says that all subtype relationships should be able to coerce to the supertype, but that not all types that are coercible require a subtype, supertype relationship).

This is mainly an issue of terminology. Does coercibility create a subtype relationship? What is a subtype, anyway? I think reasonable people will probably have different answers to these questions. I'm not particularly wedded to the idea of using subtype as the keyword here. What is important to me is that we can express that one generic type is coercible to another generic type. I don't think that's possible with any straightforward isCoercible function or method because I don't see how a function can make a statement about generic types.

The definition of subtype in Wikipedia (which I assume is a reasonable source to understand consensus terminology) is

... a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutability, meaning that program elements, typically subroutines or functions, written to operate on elements of the supertype can also operate on elements of the subtype.

It's certainly the case that if type Matrix can coerce to Array, then subroutines written to operate on an Array can operate on a Matrix. After all, that's the whole point of these coercions. For this reason, I view "coercible" as the same as "subtype".

Responding to @vasslitvinov:

At the same time I can see that incorporating generic types into the subtype relationship helps us resolve the cases we want to be resolvable, such as acceptsGenericArray(mat) above. So maybe we can call that "extended subtyping", or call the other one "concrete subtyping", to be clear what we are talking about?

As with Lydia's objection, the issue here seems to me to have more to do with terminology than anything else.

Certainly, whatever : does in the language includes "extended subtyping". That is why you can write

proc acceptsMatrix( arg: Matrix )

where Matrix is a generic type. Shouldn't our working definition of "subtype" match what the : does in the language?

Thinking further, I can see the user writing simply

subtype Matrix : Array;

Then, to see which concrete Array is the supertype of a given concrete Matrix, we resolve an applicable cast function and check its result type. Is this what others have in mind?

This is exactly what I'm proposing. Also note that certain operations will be legal if the cast returns a ref that wouldn't be otherwise. Clearly, the cast function has a role in specifying in what way Matrix is a subtype of Array. What I'm looking for out of the subtype declaration (whatever it is called) is some indication to the compiler that it should attempt to implicitly add those casts.

vasslitvinov commented 7 years ago

@mppf OK, I can go with calling it just "subtyping". It's not a requirement for the term "subtyping" to match what ":" does, however it is surely a good convenience.

I still think that the compiler can infer the (extended) subtype relationship from cast() and canCoerce() functions. However it would probably be pretty complicated to do so, so I am OK with requiring separate subtype declarations.

Acknowledging that it is an extra burden on the programmer... maybe do the inference in simple cases automatically?

mppf commented 7 years ago

@vasslitvinov - I'm proposing that we use subtype declarations (whatever syntax we end up with) instead of canCoerce declarations. I don't understand why these would create extra burden on the programmer. Isn't the subtype declaration (straw-man) simpler and easier to write and understand than the canCoerce business?

vasslitvinov commented 7 years ago

Ah sorry I was thinking something else. Yes I like subtype instead of canCoerce.

mppf commented 7 years ago

I've been trying to prototype something to demonstrate a solution to Ben's example (but without worrying too much about the syntax yet). I've been trying for a strategy where:

the compiler knows in advance which types can be coerced to other types (as we could get with subtype declarations)
the the compiler adds casts to implement the coercion.

I had been thinking that for the acceptsGenericArray case, this would be sufficient. But, there is a similar problem to what I ran into with the original isCoercible.

The problem is that when resolving acceptsGenericArray, we need to know which concrete Array(x) to instantiate for a concrete Matrix(Array(y)) formal. I had been thinking we could resolve the cast call to do that - but all of our cast proposals require that the type being casted to is not generic (e.g. proc sourceType.cast(type toType) has a type argument, and the compiler doesn't support passing a generic type to a type argument.

I don't think there's a way around having some way for the user to specify the relationship between these type variables. How could we do that?

1) something like what @vasslitvinov described, e.g.:

    subtype Matrix(?t) : Array(t);
    // or
    subtype Matrix(?t1) : Array(?t2);

but the 2nd example there wouldn't be valid, because the compiler would try to instantiate acceptsGenericArray with an unknown/generic type, Array(t2).

Another variant on this would be:

  subtype Matrix : this.array;

but I'm not sure we can get away with using this in that context.

2) try to infer this information from cast declarations (without trying to resolve them). E.g. we could use proc Matrix.cast( type t: Array ) and ignore that : Array except for deciding that this is the cast function to resolve. (In a way, this is similar to the next option, but the compiler would have a special way to know the parent type in question).

3) Something likeproc Matrix.coercesToType() type return array.type. We could support just 1 subtype or adjust the compiler to tolerate ambiguities in these calls.

4) use a method generic on both arguments to get the concrete parent type. e.g. proc Matrix.getExactParentType( parent: Array ) return this.array.type; -- but note that such a function would require special handling in resolution, since we'll need to resolve it with parent having generic type. Another variant of this idea would be declaring proc coercible( src: Matrix, dst: src.array.type); where the mere existence of a coercible function with matching signature would trigger the coercion behavior. The compiler would reason about the destination type (e.g. Matrix(Array(int)) is coerceible to array.type, which is Array(int), which is a subtype of Array, so therefore Matrix(Array(int)) can be passed to an Array formal and that should instantiate with type Array(int).

5) Only support generic subtypes that have the same type constructor arguments as the parent type - then the compiler could find the parent type by passing the type arguments for the sub-type to it.

Besides Matrix/Array, the other main generic use case for coercions I know of is the record-wrapping case, like RefCounted. There, we'd want to say that RefCounted(t) is a subtype of t.

    subtype RefCounted(t), t

I think 5) above wouldn't work in that situation.

Update: I've verified that I can get Ben's example to work if I give the compiler the ability to determine the type to instantiate when resolving a call like acceptsGenericArray - which has a generic Array formal - with a Matrix argument. In my experiment, I hard-wired in:

actualType is Matrix or an instantiation of it
formalType is Array or an instantiation of it
the type to instantiate with is actualType->getField("array")->type.

mppf commented 7 years ago

My current favorite strategy here is different from what we've been discussing, so I wanted to list it here. We had been talking about having a cast function/method and then a isCoercible function/method so that there can't be confusion about one being handled by a cast and another being handled by coercion. I'm coming around to a different viewpoint, which is that we should have cast and coerce, and an expression like a:b should call coerce if it exists in preference to cast if it exists.

Then, the reasoning about generic types got me thinking that we're not really helping anything by having a type argument at all for isCoercible. Instead, I think we should have many coerce methods that return different types, and resolution will choose the one with a return type appropriate for the situation.

// general idea:
proc fromType.coerce return <expr of toType>;
// int to uint, real(32) coercions
proc int.coerce return this:uint;
proc int.coerce return this:real(32);
// Matrix example
proc Matrix.cast return array;
// RefCounted example
proc RefCounted.cast return instance;

The compiler would tolerate ambiguity when resolving such a coerce method. Instead of the normal resolution process, it would select the coerce method that returns a type compatible with the formal it's trying to call. (At least for now, this would involve enumerating the available coerce methods for a particular type and resolving them all in advance. In the future, these could be stored in a table).

The main alternative I see is something along these lines:

subtype Matrix(?t) : Array(t);
subtype RefCounted(?t) : t;

I can imagine implementing this by turning it into the above case. E.g. for the Matrix case, the compiler would generate a function like proc buildTypeForCoercion(arg:Matrix(?t)) type return t; and then the compiler would use these functions to:

know when a cast is possible (as above, by enumeration)
know the target type of a cast for a generic formal (it is the result type of the method)

vasslitvinov commented 7 years ago

My thinking today is that for the user it is most convenient to just provide cast and coerce functions, and for the compiler to do the rest.

As discussed previously, the coerce function would be provided for the cases where implicit conversion is allowed; it would be used in cast expressions as well.

While I still think that cast and coerce should be non-methods with two formals, I am open to other syntactic variations like having them be methods or have them be overloadable by the result type.

By "compiler to do the rest" I mean automation in the above example where a Matrix(int) actual is passed to a generic Array formal. For example, the compiler would analyze


proc coerce(type toType, from: Matrix) where toType==Array(from.t) ......

to determine that Matrix(int) coerces specifically to Array(int). When the compiler can't analyze successfully, it will just issue the same error as when passing a nil actual to a generic-class formal.

We could start with very simple pattern-matching analysis and extend it as resources and priorities allow.

vasslitvinov commented 7 years ago

BTW once we find a solution to coercing to a generic type, we can extend it to casts as well. So we can introduce the cool feature of casting to a generic type. For example:


var mat: Matrix(int);
var arr = mat: Array;

would result in arr being of the type Array(int).

mppf commented 7 years ago

@bradcray and I had a good discussion about this today. First, a note: I'm not planning to attempt to get this done by 1.15 feature freeze. So we won't expect it in the release.

Second, we came up with two pretty good motivating examples that can clarify some of the choices we are facing about whether or not coerce should be able to accept a type argument.

Example 1 : coercing ranges to other ranges

The range type includes some things that might naturally be coerced. It's a generic type with type idxType and param stridable (and param boundedType but we'll ignore that in this discussion). So, one might want to be able to coerce a non-stridable range into a stridable one. Or from say int(8) index type to int(16).

If our solution was a 0-argument coerce method, it seems it would be rather awkward to express all of the possibilities for possible range coercions. At least it would be enumerating all of the possibilities. In other contexts, though, that might not even be possible. (E.g. if range were expected to work with user-defined numeric types).

Example 2 : coercing ranges into domain

At the same time, we might consider having an automatic conversion from a range to a 1-D domain. In this case, there is a natural corresponding domain type to coerce into. However, the same problem above with acceptsGenericArray above strikes here, but with the different types:

var myRange = 1..20;
proc acceptsGenericDomain( d:domain ) { ... }
acceptsGenericDomain(myRange); // supposing range->domain coercions are supported

Here again, during generic instantiation, we need a way to know which domain type to create based upon the range type. And we can't pass the expected type as an argument to a coerce method because we can't pass a generic type as a type argument in Chapel.

Current Leanings

So, that leads us to thinking that perhaps we need both a type-argument version and a 0-argument version. That means a current design that covers all the cases might be:


// register a type available for coercing generic instantiations
fromType.coerce() return <someExpression>;

// register a coercion available when all types are known
fromType.coerce(type toType) return <someExpression>;

ben-albrecht commented 7 years ago

Another use-case for this feature:

/* Named tuple for version numbers */
record Version {
  var version: 3*int;

  proc major ref return version[1];
  proc minor ref return version[2];
  proc bug ref return version[3];
}

var v1 = new Version((9, 2, 0));
var v2 = new Version((10, 0, 0));

// Would like to do tuple-comparisons on Version type
// without writing a bunch of overloads:

if v1 > v2 then
  doSomething();
else if v1 == v2 then
  doSomething();
else if v1 < v2 then
  doSomething();

mppf commented 7 years ago

I think that the idea of registering a type available for coercing generic instantiation might be related to / be an alternative solution to issues that we face with generic functions such as proc foo(x:real(?t)) where today correct behavior depends on "stamping out" these functions for each size proactively.

bradcray commented 6 years ago

Stray thought on the user-defined cast part of this issue motivated by thinking about issue #10615 and #8242 this week. If we adopt issue #10615 and think of casts as being a form of initializer ("create an instance of this type from this value of another type") then a possible syntax for defining user-defined casts rather than cast could be to define a proc init:() ("create a new object via the cast operator") in symmetry to proc init=() ("create a new object via the assignment operator").

Taking a real quick scan of the long discussion above, it seems that when previously kicking around "should this be an initializer?" concept, the concerns were:

(a) that it could result in the C++-style confusion around "by defining an initializer [that takes an int, say], we've accidentally created a cast [from int to the type]." But with this approach, we'd be taking the opposite approach: making a special cast initializer and getting a free initializer from it. So I feel like this concern is reduced.

(b) Michael said:

"this strategy has problems of its own, among them are that you can't write 1 implementation of a generic cast to a group of types (sortof the dual of the problem you brought up with cast being a method on the casted-from object)."

and Vass wrote:

I am not much in favor of having cast be a method on the destination type because of the same symmetry argument I made against a method on the from-value. However, such cast would be distinct from an initializer (to avoid issues with int->vector coercion).

These concerns probably still exist in this proposal, but I don't have specific motivating cases in mind where they seem problematic (particularly if we can add secondary init: methods to an existing type). @mppf and @vasslitvinov, do you? (Maybe they're above and I just didn't read carefully enough).

mppf commented 6 years ago

(a) that it could result in the C++-style confusion around "by defining an initializer [that takes an int, say], we've accidentally created a cast [from int to the type]." But with this approach, we'd be taking the opposite approach: making a special cast initializer and getting a free initializer from it. So I feel like this concern is reduced.

I think the main C++ issue is really that such a cast "magically" enables coercion. But, it's true that it also affects how new MyClass(MyOtherClass) will resolve.

"this strategy has problems of its own, among them are that you can't write 1 implementation of a generic cast to a group of types (sortof the dual of the problem you brought up with cast being a method on the casted-from object)."

Right, to be more concrete, say you wanted to write a single cast operator that went from any integral type to any real type. Such a thing would require methods with generic receivers, which we don't support well today:

// cast from any integer to any real type
proc chpl_any_real.init:(arg:integral) { ... }

Another issue is that I find the : gets a little lost in the declaration.

But anyway, why stop at :? Why not also make proc init+(a, b) ? Which would handle a case such as var x = a + b and might be useful in reducing (some of the) unnecessary bigint temporaries.

I am not much in favor of having cast be a method on the destination type because of the same symmetry argument I made against a method on the from-value. However, such cast would be distinct from an initializer (to avoid issues with int->vector coercion).

I think the cast implementation can be an initializer - it just shouldn't automatically also enable coercion.

Lastly there is the question of how it can fit in with specification of coercibility. I personally think it makes more sense a type to say it can coerce to another type than to say a type can coerce from another type. My reasoning here is that it flows well with type inference - the compiler can resolve the body of a coerce method and see what type it results in). So, it seems that such a fromType.coerce() method would not suggest a cast method in particular is needed. That implies (for me anyway) that we'd leave the casts with a totally separate implementation from the coercions; possibly with fromType.coerce() and fromType.coerceTo(type t) or possibly with fromType.canCoerce() and fromType.canCoerceTo(type t) methods.

chapel-lang / chapel