dotnet / csharplang

The official repo for the design of the C# programming language

11.13k stars 1.02k forks source link

Proposal: Existential types for interfaces and abstract types #1328

Open agocke opened 6 years ago

agocke commented 6 years ago

Introduction

Have you ever wanted an interface (or abstract type) to be parameterized on a polymorphic variable to its implementer, but not its users? Have you ever wanted to parameterize over internal state, without revealing what that state contains to your users? This is what existential types allow you to do.

Let's look at an example.

interface class IDataFlow<TLocalState>
{
...
    TLocalState Start { get; }
    void Next(TLocalState);
    List<string> GetCurrentDataFlowDiagnostics();
}

What does this type do? Let's say it walks a data flow tree and produces diagnostics. It does so incrementally -- the user can step through the tree by first calling Start, then moving forward by calling Next and passing the current state. They can't actually view the current state represented by TLocalState, but that's OK, they don't need to! They can view the diagnostics at any point, do some computation, then decide to continue or quit. In this sense, TLocalState is a black box to the user.

However, TLocalState is anything but a black box to an implementer. They care very much that this type is generic; different data flow analyses may have wildly different state implementations depending on what they're tracking. Moreover, since they provide the state, it's not a black box to them, they'll have a concrete type in hand. For instance,

class DefiniteAssignment : IDataFlow<BitVector> { ... }

The implementer of DefiniteAssignment would be able to use BitVector statically in this implementation.

But this presents a problem. This is how a user interacts with IDataFlow:

class C
{
    void DoDataFlowAnalysis<T>(IDataFlow<T> df)
    { ... }
}

Note that the worker method must be generic, even though the type parameter is useless to them. Moreover, they have to flow this parameter around through their whole program until the entry point where they actually pass a specific data flow analysis. And if they have multiple methods that do different analyses, they have to do this for each one, on every method.

So what would we rather have? Probably something like this:

interface IDataFlow<protected TLocalState> { ... }
class DefiniteAssignment : IDataFlow<BitVector> { ... }
class C
{
    void M(IDataFlow df) { ... }
}

Now TLocalState is an existential type.

Details

Of course, the question is what is this and how does it work?

Now may be the time for a quick digression into theory to explain the background of existential types. The name comes from logic, where you can "existentially quantify" a variable, like there exists an x such that x^3 == 9. Similarly, you can also "universally quantify" a variable: for all x, x * 0 == 0.

Due to the Curry-Howard correspondence, the systems of logic and the typing calculus have parallels with one another. In the case of the C# type system, universal quantification equates to generic polymorphism. When we declare a method M<T>, this could be seen as the declaration of a function mapping from all possible Ts to a specialized M for each T. In other words, we're quantifying the method M for all possible Ts.

If we already have an analogy for universal quantification, there's a question of what existential quantification looks like. In the previous example, if we said there's an instance of M<T> for all Ts, existential quantification would say that there's a type U<T> for some specific T. Existential types are a way of working with a type variable that's set to something but you don't know what it is.

With that brief divergence done, we should now talk about how this feature would be implemented. For this, I'm going to turn to a simpler example. Say we have a counter interface:

interface ICounter<protected T>
{
    T Start { get; }
    void Next(T current);
    bool Done { get; }
}

Here's how we could use it:

void M(ICounter ic)
{
    var x = ic.Start;
    while (!ic.Done)
    {
        x = ic.Next(x);
    }
}

This immediately provokes some questions, most importantly, what is the type of x? The answer is, it's ic.T. This is some type that exists on ic. Note, it doesn't exist on ICounter -- it's not static. You can't take another instance of ICounter and unify the two T's. However, it does unify with the other T on ic; the value returned from ic.Next is the same type as the one returned from ic.Start. How exactly we want to represent this in the language is an open question. Here are the specific problems to solve:

[ ] Will we have special syntax for this type?
[ ] Do we need to introduce a binding in some way?
[ ] The type clearly can't be returned or assigned to a field as is, because there is no way to mention it, so is it closer to an anonymous type?

Once we decide how to use it on the interface, we also have to decide how to implement that interface. In this case, it seems little different from an ordinary type parameter:

class Counter : ICounter<int>
{
     int Start => 0;
     int Next(int current) => current + 1;
     bool Done => current == 42;
}

Lowering ===

If that is how the user sees it, we also have to deal with how the CLR sees it. Luckily, we have a theorem in logic that says we can transform existential types into an encoding of universal types. I've used that theorem as a base for a transformation for C#.

Lowering from above code:

using System;
interface ICounter<T>
{
   T Start { get; }
   T Next(T current);
   bool Done(T current);
}

class Counter : ICounter<int>, ICounterWrapper
{
    public int Start => 0;
    public int Next(int current) => current + 1;
    public bool Done(int current) => current == 42;
    void ICounterWrapper.Unwrap<TWitness>(TWitness witness)
        => witness.Invoke(this);
}

interface ICounterWitness
{
     void Invoke<T>(ICounter<T> counter);
}

interface ICounterWrapper
{
    void Unwrap<TWitness>(TWitness w) 
        where TWitness : ICounterWitness;
}

struct MWitness : ICounterWitness
{
    public void Invoke<T>(ICounter<T> ic)
    {
        var x = ic.Start;
        while (!ic.Done(x))
        {
            x = ic.Next(x);
        }
    }
}

class C
{
    void M(ICounterWrapper wrapper)
    {
        wrapper.Unwrap(new MWitness());
    }
}

In this transformation, there are two "intermediate" interfaces that are created to transform the existential type production into a universal type production. In addition, the code which uses ICounter<protected T> has been lowered into a synthesized struct, MWitness, that executes the given code. The scope of code currently hoisted into the struct is the whole method in this example, but it it's an open question what the proper hoisting scope is -- and how this interacts with other code hoisting, like nested functions, async methods, and iterator methods.

Similarly, there are other questions like how this interacts with other language and runtime features, especially reflection, but I believe the feature is at least possible to implement on the current CLR.

CyrusNajmabadi commented 6 years ago

This is very very cool and interesting. Thanks for the very clearly written write-up! :)

From my quick reading, this seems to share aspects of the design with Shapes (for example Witnesses). Would this sort of design fall out as a special case of Shapes?

i.e. with Shapes you have to define the actual shape, and then manually provide the witness for specific types to make them works with that shape. It sounds like this proposal can automatically do that for these specific existential types.

Am i understanding this correctly? Or am i way off?

I also appreciate that, like shapes, this is a suggestion for a powerful language feature that should not incur implicit, unavoidable, allocation costs if someone uses it. Big +1 on language features behaving that way.

CyrusNajmabadi commented 6 years ago

The scope of code currently hoisted into the struct is the whole method in this example, but it it's an open question what the proper hoisting scope is -- and how this interacts with other code hoisting, like nested functions, async methods, and iterator methods.

It seems like the appropriate location would sometimes have to be as a nested type in the containing class. i.e. ignoring optimizations, you'd generally have to do something like this:

class C
{
  void M(ICounterWrapper wrapper)
  {
    wrapper.Unwrap(new MWitness(this));
  }

  private struct MWitness : ICounterWitness
  {
    private C _this;
    public MWitness(C _this) => this._this = _this;

    public void Invoke<T>(ICounter<T> ic)
    {
      var x = ic.Start;
      while (!ic.Done(x))
      {
        x = ic.Next(x);
      }
   }
  }
}

This way you could appropriate translate calls inside C.M to 'this' into the rewritten instance method. Note: it's unclear how things like base virtual calls would work in a verifiable manner. But perhaps it's ok if they don't or maybe there are tricks to make them work (maybe bridge methods of some sort?).

Actually, given that we support generating nested private classes for all sorts of things today, i imagine that this shouldn't be an issue. Or, at the very least, the same restrictions that exist for those cases would exist for this feature as well.

CyrusNajmabadi commented 6 years ago

class C
{
    void M(ICounter counter);

// becomes

class C
{
    void M(ICounterWrapper wrapper)

This is treading into interesting ground for C#. In general the signature you write is the signature you see in metadata. (Though clearly, things like tuples/dynamic show we're moving away from that). Is your thought that things like ICounterWrapper would be completely hidden from users, and people would only ever see this as ICounter ?

CyrusNajmabadi commented 6 years ago

You can't take another instance of ICounter and unify the two T's

This will probably need some very deliberate work to make good error messages.

For example:

void M(List<ICounter> list) {
  var counter1 = list[0];
  var counter2 = list[1]; 

  var start1 = counter1.Start;
  var start2 = counter2.Start;

  // What are good errors here:
  counter1 = counter2;
  start1 = start2;
}

In essence, under the covers this would be treated (AFAICT) as if you had written.

void M(List<ICounter> list) {
  ICounter<UnspeakableType1_UsedFor_ICounter_T> counter1 = list[0];
  ICounter<UnspeakableType1_UsedFor_ICounter_T> counter2 = list[1]; 

  var start1 = counter1.Start;
  var start2 = counter2.Start;

  // What are good errors here:
  counter1 = counter2;
  start1 = start2;
}

I'm not even sure how to best phrase the error for the user. :)

Any thoughts on how to best explain this to hte user?

alrz commented 6 years ago

this seems to share aspects of the design with Shapes (for example Witnesses). Would this sort of design fall out as a special case of Shapes?

Just wanted to point out that shapes can actually use this feature under the hood to provide an implicit type parameter e.g. TSelf which would be the type of the implementation target.

// implicitly defined for all traits
trait Eq< /* protected TSelf, */ Rhs = TSelf> {
// ...
}

implement Eq for SomeType { // TSelf = SomeType, Rhs = SomeType
 // ...
}

From what I've seen in Rust, this type parameter is crucial to define and implement traits/shapes.

agocke commented 6 years ago

@CyrusNajmabadi Very good questions, and I'm not going to pretend I have direct answers for all of them. Here's what I have so far:

i.e. with Shapes you have to define the actual shape, and then manually provide the witness for specific types to make them works with that shape. It sounds like this proposal can automatically do that for these specific existential types.

I think this is correct. I have to read the shapes proposal to understand this more, but I think they have elements of existential types. What I would say is the primary differentiation is that an existential type on an interface is part of the actual type, while shapes seem to be limited to generic constraints on a type parameter.

It seems like the appropriate location would sometimes have to be as a nested type in the containing class. i.e. ignoring optimizations, you'd generally have to do something like this

Yes, this is true. If you can imagine a kind of inline-struct declaration syntax for C#, I think the more appropriate phrasing is something like

void M(ICounterWrapper w)
{
    w.Unwrap(new MWitness : ICounterWitness
{
    void Invoke ....
});
}

This makes it more clear that the scoping of the generated type really belongs to the scope of the method it's used in. This is where parallels with anonymous types and closure conversion come in.

In general the signature you write is the signature you see in metadata

This is, I think, an open question of what we have in source and what we generate in metadata. Maybe the wrapper type is what you see in source and in metadata. Then there's really no difference at the use site. However, at the interface site you'll see a difference between what the source containing type is and the runtime containing type is. I don't yet have a good idea how to resolve these ambiguities.

Any thoughts on how to best explain this to hte user?

This is entering a level of practicality I haven't worked out yet. 😉 Mathematically, I think this is basically explaining to a user that, in terms of universal quantification, List<T> is not the same as List<U>. I'm not exactly sure how to port that explanation over, but I think it should fit in the same mold.

evincarofautumn commented 6 years ago

For functions that take existentials like (∃x. A) → B (e.g. void M(ICounter)), you can encode them as just ∀x. A → B (void M<T>(ICounter<T>)) by Skolem normalisation. The programmer would write void M(ICounter) as you’ve done here, and the compiler would emit void M<T>(ICounter<T>). That solves the problem of having to explicitly thread the type variable through all the use sites, and it’s easy to understand.

You’d still have to do something like your proposed desugaring (CPS conversion) to handle functions that return existentials. (It looks like you’d basically encode A → ∃x. B (ICounter MakeCounter()) as ∀r. A × (∀x. B → r) → r (<R> R MakeCounter(<T> Func<ICounter<T>, R>)) but working around the lack of higher-ranked universal quantification.)

Also, to bikeshed: using <protected T> suggests that something like <private T> or <internal T> might work, but they don’t have any clear meaning. I think <virtual T> would be more illustrative—it’s not an access modifier, but a polymorphism modifier.

jnm2 commented 6 years ago

I'm super excited about the idea that we might get associated types! I've thought that would be cool in theory long before I knew what they were called. Also, anything we can do to reduce the need for declaring gratuitous generic parameters (and specifying them for overload resolution) is a welcome relief!

gulshan commented 6 years ago

Does this proposal somehow aligns with my "Type pattern for generic types" #268 proposal, proposing inference of type parameter for generic types in patterns.

MadsTorgersen commented 6 years ago

Great proposal!

This approach to existential types has a lot in common with virtual types in Beta or abstract types in Scala. I wonder if it were more syntactically intuitive to use a notation closer to those, where the type "parameter" is declared as a kind of abstract member of the class:

interface ICounter
{
    type T; // or some such syntax, could have a where clause
    T Start { get; }
    void Next(T current);
    bool Done { get; }
}

class Counter : ICounter
{
     public type T = int;
     int Start => 0;
     int Next(int current) => current + 1;
     bool Done => current == 42;
}

That way it becomes clear that the type T really is a member of a given instance of ICounter, and is tied to the instance.

A problem in both your formulation and mine is that in the usage:

void M(ICounter ic)
{
    var x = ic.Start;
    while (!ic.Done)
    {
        x = ic.Next(x);
    }
}

You assume that ic.Ts coming out of ic can be fed back in as ic.Ts of that same ic. That is only true and sound insofar as you trust the value of the ic variable not to have changed. In a pervasively mutable language such as C# this is far from given. Thus, the feature is probably best when coupled with the ability to declare parameters and locals readonly.

gafter commented 6 years ago

I'm very happy to see @agocke's clear description of this feature, which I think has real potential.

alrz commented 6 years ago

You assume that ic.Ts coming out of ic can be fed back in as ic.Ts of that same ic. That is only true and sound insofar as you trust the value of the ic variable not to have changed

In Rust, you are only allowed to omit associated types if you use the trait as a generic constraint, that would freely give you such guarantee, as long as all variables are of the same type parameter.

 interface ICounter {
    type T;
    T Start { get; }
    T Next(T current);
    bool Done { get; }
 }

void M<C>(C ic)
  // it's only allowed as a type constraint when type arguments are omitted 
  where C : ICounter 
{
    var x = ic.Start;
    while (!ic.Done)
    {
        x = ic.Next(x);
    }
}

So the above could be lowered to the following (considering that restriction),

void M<C, __T>(C ic)
  where C : ICounter<__T>
{
    var x = ic.Start;
    while (!ic.Done)
    {
        x = ic.Next(x);
    }
}

No need for x to be readonly at all.

Note: the generated type parameter __T would need to inherit all the constraints from C::T. It'd be called an "associated type" due to the fact that it's associated with the type C.

I think something like "inline type constraints" (https://github.com/dotnet/csharplang/issues/279) could help here to bring the constraint closer to the type parameter declaration for the sake of readability.

void M<C : ICounter>(C ic)

Again, in Rust you can refer to associated types with a :: operator. In C# that would look like this:

void M<C : ICounter>(C ic, C::T item)
// ->
void M<C, __T>(C ic, __T item) where C : ICounter<__T>

And if you do want to explicitly provide type arguments (for example, if you want to use it in fields, parameters, etc), you could use "named type arguments" (https://github.com/dotnet/csharplang/issues/280)

void M(ICounter<T: int> ic)

alrz commented 6 years ago

We could avoid all the new syntax here by converting

void M(ICounter ic)

void M<__C, __T>(__C ic) where __C : ICounter<__T>

That would save a lot of syntax changes. But I'm not sure if we could actually allow this for all interfaces?

(however, there would be no way to refer to the type __T here i.e. C::T in the examples above.)

agocke commented 6 years ago

@alrz I think your transformation is unsound. Aside from making an existential type viral, as you'd now have to push your hidden type parameter all the way up the chain, consider

void M(ICounter c1, ICounter c2)
{
    c1 = c2;
}

Both terms should have the same type, so this should be legal. But with your transformation this is probably something like

void M<__C1, __T1, __C2, __T2>(__C1 c1, __C2 c2) 
    where __C1 : ICounter<__T1>, __C2 : ICounter<__T2>

which does not type check.

agocke commented 6 years ago

@evincarofautumn You followed pretty much the same path as I did, but I think your simpler expression in terms of universals fails with the same problem as I pointed out in the previous comment.

alrz commented 6 years ago

@agocke Yeah, my second comment doesn't seem like a good idea, but the same example with a mandatory nominal type parameter would be rejected by the compiler.

void M<C1, C2>(C1 c1, C2 c2)
  where C1 : ICounter
  where C2 : ICounter
{
    c1 = c2; // ERROR
}

which would give you the same transformation

void M<C1, C2, __T1, __T2>(C1 c1, C2 c2) 
    where C1 : ICounter<__T1> where C2 : ICounter<__T2>

agocke commented 6 years ago

That's true, but I'm not sure what this buys us over the encoding I originally presented, where this would probably end up like

void M<C1, C2>(C1 c1, C2 c2)
    where C1 : ICounterWrapper,
          C2 : ICounterWrapper
{ ... }

Maybe you've removed the hoisting and witness, but I'm not sure what the benefit would be.

alrz commented 6 years ago

Do you have any examples that are not presentable with that transformation? If no, I think it's just a simpler solution (if that counts as a benefit). On the usage side, I think the main advantage is that a method doesn't have to be generic over all its inputs' type parameters (again, if that counts).

Not sure if you intend this to work anywhere outside method parameters? but my suggestion definitely wouldn't be applied anywhere else.

Note: shapes would probably go with the same path (compiler generated type parameters) if they don't end up being a clr feature. though I hope that they eventually do because of huge codegen behind the scene with current workarounds.

agocke commented 6 years ago

I had envisioned this working wherever an interface works now, i.e. in fields, arrays, etc.

alrz commented 6 years ago

where this would probably end up like

void M<C1, C2>(C1 c1, C2 c2)
    where C1 : ICounterWrapper,
          C2 : ICounterWrapper
{ ... }

That's still using type constraints with body hoisted into the struct. In order for this to be usable in fields, I guess all member methods should be hoisted in various structs? where the type parameter should be defined then? what if we need more than one type with omitted type arguments? witness and wrapper, and the implementer are all coupled together so the signature can't be changed for each usage. I'm starting to think that this should be eventually restricted to methods which then would render witness/wrapper complication unnecessary? I still need to process this but on the surface I suppose it won't be a simple transformation (that is, it could grow very fast with simply adding variables to the equation).

agocke commented 6 years ago

At the moment, I wasn't considering your proposal to transform everything into hidden constraints. In my transformation, I don't think we would need any extra declaration level generic parameters to hold a field or array of the interface with an abstract type.

evincarofautumn commented 6 years ago

void M(ICounter c1, ICounter c2)
{
    c1 = c2;
}
Both terms should have the same type, so this should be legal.

I wouldn’t expect this to typecheck. One of the main reasons I want existentials is to make c1 and c2 incompatible even if c1.T and c2.T are the same type, as you said:

You can't take another instance of ICounter and unify the two T's.

Edit: nevermind, I see the intent with reference assignment here.

agocke commented 6 years ago

Yeah, to be clear here, the practical impact is something more like

void M(ICounter[] counterArray) ...

That should be legal even if each concrete ICounter instance in the array has a different substituted existential type.

qrli commented 6 years ago

How about enabling this existential type capability at consumer side rather than interface declaration side? That means, I can write a non-generic algorithm method with any existing generic interface as argument type, and treat some of its generic parameter as existential. e.g.

void M(ICounter<exist T> c) { }
void M(IDictionary<string, exist T> d) { }

evincarofautumn commented 6 years ago

Oh of course, “(∃T. Counter<T>)[] → void” is not equivalent to “∀T. Counter<T>[] → void” and I didn’t mean to imply it was. You can’t move the quantifier outside the scope of the array type constructor. It falls to the second case in the desugaring I described.

To be clear, it’s equivalent to “(∀R. (∀T. Counter<T> → R) → R)[] → void”, which is basically the same as your encoding, void M<C>(C[] counterArray) where C : ICounterWrapper—the outer (∀R) function is the ICounterWrapper and the inner (∀T) function is the ICounterWitness. You were able to set R to void in your example because the method doesn’t return anything, and the wrappers just hide the higher-ranked (and impredicative, in this case) polymorphism.

agocke commented 6 years ago

@evincarofautumn Exactly.

KrisVandermotten commented 6 years ago

I'm trying to understand what problem this proposal is trying to solve. Do I understand correctly that it could be used for the following scenario?

Several interfaces return some kind of correlation state object from one method, expecting it to be passed into another one at a later stage. For example, WCF's IParameterInspector is such an interface.

Note that in this interface, the correlation state is of type object. Under your proposal, it would effectively be of some unspeakable type. Because of that, the type system can ensure that the object passed into the second method is either the object returned from the first one, either null. The current implementation cannot enforce that. Secondly, in the current implementation, the second method needs to cast the object parameter to the appropriate type. Under your proposal, that cast would not be needed (but a null-check still would).

Is this understanding correct?

A similar situation, but not the same, is the IAsyncResult correlation state in the BeginInvoke/EndInvoke pattern. If I understand correctly, your proposal would not be of any help there, because there is no interface or abstract class involved. It's just a pattern in a concrete class.

Is my understanding still correct?

If so, in the interest of improving my understanding, allow me to play the devil's advocate, and ask you: is your proposal an incomplete solution for a very small problem?

agocke commented 6 years ago

@KrisVandermotten I don't understand why you think null is a valid value of the existential type T.

evincarofautumn commented 6 years ago

@KrisVandermotten Existential types are sort of the quintessential representation of encapsulation in logic and functional programming. Some things I’ve used them for (mostly in Haskell):

Abstract data types, constraining the possible operations on values produced or consumed by an API. I’ve used this, for example, to have a list of Producer/Consumer pairs, where the only type-correct operation is to run the Producer (synchronously or asynchronously) and send its result to the Consumer.
Parameterised modules, in the ML sense.
Associative data structures that can give you a compile-time proof that a key is present, so you can write a “lookup” method that never fails.
References that are guaranteed not to escape a certain scope, so you can, for example, freely allocate objects from an arena and know that it’s safe to deallocate the whole arena after its scope completes.
Type-aligned sequences—heterogeneous sequences where the type system enforces the order of elements. For example, a series of Operation<Input, Output> values where the Output of one Operation must match the Input of the next.
Object-oriented programming—vtables are existentially quantified! Admittedly this isn’t very compelling in an OOP language. ;)

KrisVandermotten commented 6 years ago

@agocke

I don't understand why you think null is a valid value of the existential type T.

Indeed it's not, my mistake. But default is, isn't it?

agocke commented 6 years ago

Interestingly, I think that's up to us. Normally, default is always a legal value, but since existential types exist in a new context we could disallow it if we wanted, and I'd probably recommend that.

gregsn commented 5 years ago

@agocke

Yeah, to be clear here, the practical impact is something more like
void M(ICounter[] counterArray) ...
That should be legal even if each concrete ICounter instance in the array has a different substituted existential type.

I agree. Mixing objects of different type instantiations in one collection is one of the killer applications of this proposal and its take on existential types. The typical workaround is to introduce an additional ungeneric "ICounter" allowing to manage an array "ICounter[]", which is just painful.

So I think it's really about the mixing aspect, which however never got into the center of this discussion, which is a pity, at least it made it harder for me to follow the discussion, I think these practical benefits should be the center of the design process. So let me try to shift the focus on mixing ICounter<?> instances and therefore rewrite your original M(ICounter ic) method, which I turned into something which is not special at all (as it is not about mixing). It now uses universal quantification, because why limit the use cases of the method:

T Count<T>(ICounter<T> ic)
{
    var x = ic.Start;
    while (!ic.Done)
    {
        x = ic.Next(x);
    }
    return x;
}

Here we don't lose much when we stick to the universally quantified type parameter. The specific example at least doesn't suffer when we accept an ICounter<T>. We instead win a bit as we can even return x. In trivial cases we'd not use existential quantification at all and just call int x = Count(new IntCounter());.

Now let's finally start focussing on the part where the practical impact comes into play: when mixing ICounter<?> instances, or in other words when ICounter<> instantiations "collide": In my understanding we'd want to be able to decide upon existential quantification right here: when we need to mix types (intersect types).

var arr = new ICounter<any>[]{ new IntCounter(), new FloatCounter() };
var x = boo ? (ICounter<any>) new IntCounter() : new FloatCounter();

with "named type arguments" (#280) mentioned by @alrz this would look like this:

var arr = new ICounter<T: any>[]{ new IntCounter(), new FloatCounter() };
var x = boo ? (ICounter<T: any>) new IntCounter() : new FloatCounter();

In the latter case, we mention the definition side type parameter, but we don't mention a type argument. In both cases, we would want to state that any type instantiation of ICounter<T> shall be fine, which is a bit like saying any type argument for ICounter<T> shall be fine. This very near to the ICounter<exist T> idea proposed by @qrli. An important detail of both suggestions: no ambiguity when an ungeneric interface ICounter exists as well. But the most important difference to the original proposal: like @qrli pointed out already: the existential type capability would now be at the consumer side rather than interface declaration side (we'd skip the protected keyword in the definition). So this is like saying: we want the interface to be just an ordinary generic interface. And we want to have ways of working with generic types when we don't know what to put for T. I know this wishful thinking, but I like wishful thinking within the design process.

The above lines shall type check as all Counters added to the collection implement ICounter<T> of some T.

We would now be able to write:

foreach (var c in arr)
{
    var obj = Count(c) as object;
}

This shall type check as Count() accepts any ICounter<T> implementation. The result assignment type checks as any T can be boxed to an object.

But we should also be allowed to pass the collection like this:

void CountAll(IEnumerable<ICounter<any>> counters)

And of course the original example M(ICounter ic) could also be expressed via M(ICounter<any> ic), even though it might be good practice to stick with universal quantification as long as possible.

Let me try to blend this proposal with my proposal here: https://github.com/dotnet/coreclr/issues/25288

In this proposal, I also focused on the scenarios where objects of different types "collide" and the user had to pick one common supertype in order to pass them further downstream...

var counter = boo ? (ICounter<any>) new IntCounter() : new FloatCounter();
var ab = boo ? (IA & IB) new ABC() : new ABD();
var abcOrabd = boo ? (ABC | ABD) new ABC() : new ABD();

Let's try to find the similarities: (IA & IB) could be seen as (any T: IA, IB): the type union of all types T, that are a subtype of IA and IB. (ABC | ABD) would be a manual type union. ICounter<any> could be seen as (any TCounter<any TArg>: ICounter<TArg>): the type union of all type instantiations of ICounter<T>. This even looks more precise as we focus on TCounter and make it obvious that we even allow different ICounter<int> implementations. All of these are about ad-hoc type unions.

So maybe there is a chance to combine these ideas. It looks like (IA & IB) and ICounter<any> can be seen as shortcuts for the more verbose type expressions.

In that case, we could concentrate on the expression (any TConcreteType<..>: class, TBase, IA..) and refine what is allowed in here. I am guessing that we are allowed to invent type parameters on the fly just to be able to use them after the colon. So TConcreteType and the type parameters within <> are existentially quantified and are only valid within that type expression ().

Now that the type expression gets more complex we might want to declare such a type once and then use all over the place?

[ ] Now how about additionally allowing something like
```
public implicit interface IAB: IA, IB;
public implicit interface IAnyCounter: ICounter<any>; 
```
Implicit interfaces don't need to get implemented explicitly by type definitions but are implicitly implemented whenever the constraints are fulfilled. (Implicit) interfaces already imply that there is an actual concrete type that is a subtype of that interface. (any TCounter<TArg>: ICounter<TArg>) got simplified without losing the notion of "there might be many TCounter that implement a particular ICounter<TArg>.

[ ] I am thinking in either case that the type system of the CLR would need to be extended and the JIT compiler would need to check these newly introduced existential types before emitting machine code, but I have to say I am quite impressed by your lowering techniques, so who knows...

CameronAavik commented 3 years ago

How would the following be lowered?

void M(ICounter[] counters)
{
    foreach (var counter in counters)
    {
        var x = counter.Start;
        while (!counter.Done)
        {
            x = counter.Next(x);
        }
    }
}

Would it be like the following? If so, how would we approach rewriting methods like this in the general case?

void M(ICounterWrapper[] counters)
{
    foreach (var counterWrapper in counters)
    {
        counterWrapper.Unwrap(new MWitness());
    }
}

And another question, is the idea that each usage of ICounter would result in a new ICounterWrapper class? Or would the wrapper class be generated once and reused? The reason I ask is because I think there should be some support for returning a type other than void. This would enable something like the following where the Invoke method on the wrapper needs to return an int instead:

interface IArrayContainer<protected T>
{
    T[] Items { get; }
}

int Count(IArrayContainer arr)
{
    return arr.Items.Length;
}

While I know this topic is a bit old, I am trying to see if support for existential types could be emulated at all using the new source generators and wanted to understand how the lowering works a bit better to see if it is possible. At the very least, I think source generators can generate the ICounterWitness, ICounterWrapper and the implementation of ICounterWrapper for the developer, but the consumer would still need to write the MWitness and call wrapper.Unwrap(new MWitness()) themselves.

evincarofautumn commented 3 years ago

@CameronAavik I think there only needs to be a single wrapper shared amongst all uses. The wrapper isn’t really necessary in this particular example. I think you need it in general in C# because there’s no way to pass polymorphic functions as arguments without making them into generic methods of a wrapper type, and you need such polymorphism to have different return types, by parameterising the witness on its result type (i.e. ICounterWitness<Result>). If there’s a way to avoid having it explicitly appear in the desugaring, that’d be nice though.

In case anyone finds it helpful, to refresh my own memory I wrote a Haskell version of this example, which has a witness generic in the result type, and shows where you can skip the wrapper. (Aside, it models C# interfaces using records instead of typeclasses, just to avoid adding even more existential noodling for clarity’s sake, but they’re equivalent here.)

markm77 commented 3 years ago

I want to fully support this proposal and give some input based on my work with interfaces using type parameters in both Swift and C#. In my experience type parameters for an interface are either (a) completely determined by the implementation (existential types) or (b) genuinely independent generic parameters that need to be specified as <> parameters to the interface implementation (e.g. allows the possibility for a class to implement an interface multiple times with different parameter choices). Supporting both kinds of type parameter well will be a great boost for C#.

At the moment using interfaces with type parameters in C# often leads to verbose code with much duplication of information such as type parameter arguments and where clauses. This makes code brittle and look complicated even in relatively simple cases.

I want to give a small example of C# code from a library I am developing to show how needlessly verbose things get today in C# and how this could be fixed.

Here is the example class (source):

    internal class
        FluentContextPostOnlyEntity<TPostOnly, TPublicRequest, TPublicResponse>
        : IFluentContextPostOnlyEntity<TPublicRequest, TPublicResponse>
        where TPostOnly : class, IPostOnlyWithPublicInterface<TPostOnly, TPublicRequest, TPublicResponse>, new()
        where TPublicRequest : class // required by IPostOnlyWithPublicInterface
        where TPublicResponse : class // required by IPostOnlyWithPublicInterface

Here is the same class with comments listing two problems which could be fixed with C# language improvements:

    internal class
        // Problem #1: TPostOnly is the only real independent parameter;
        // TPublicRequest and TPublicResponse are fully determined by TPostOnly's implementation of
        // IPostOnlyWithPublicInterface and should not be exposed here.
        FluentContextPostOnlyEntity<TPostOnly, TPublicRequest, TPublicResponse>
        // Problem #1 cont'd: TPublicRequest and TPublicResponse are fully defined by TPostOnly;
        // there is no freedom in their choice and they should not be exposed here.
        : IFluentContextPostOnlyEntity<TPublicRequest, TPublicResponse>
        // Problem #2: Requiring that TPostOnly implements IPostOnlyWithPublicInterface
        // requires explicitly re-stating, redundantly, all the requirements (where clauses) which apply to 
        // IPostOnlyWithPublicInterface.
        // It would be much better if these requirements were inferred by the requirement that
        // TPostOnly implement
        // IPostOnlyWithPublicInterface and if not satisfied simply trigger a compile-time error.
        // Forced repetition (cascading) of where clauses creates a lot of brittleness and code noise. 
        where TPostOnly : class, IPostOnlyWithPublicInterface<TPostOnly, TPublicRequest, TPublicResponse>, new()
        where TPublicRequest : class // required by IPostOnlyWithPublicInterface
        where TPublicResponse : class // required by IPostOnlyWithPublicInterface

Problem # 1 is hopefully fixed by existential types. Problem # 2 is hopefully fixed by allowing "inferred where clauses".

Here is the same class if these problems are fixed:

    internal class
        FluentContextPostOnlyEntity<TPostOnly> : IFluentContextPostOnlyEntity<TPostOnly>
        where TPostOnly : IPostOnlyWithPublicInterface<TPostOnly>

With a couple more language updates (support for TSelf as an existential type and a way to specify it hasnew() and must be a class) this would further reduce to

   internal class
        FluentContextPostOnlyEntity<TPostOnly> : IFluentContextPostOnlyEntity<TPostOnly>
        where TPostOnly : IPostOnlyWithPublicInterface

With support for inferred where clauses we could optionally infer the TPostOnly where clause from IFluentContextPostOnlyEntity<TPostOnly> to make things as simple as

    internal class
        FluentContextPostOnlyEntity<TPostOnly> : IFluentContextPostOnlyEntity<TPostOnly>

As has been shown, this example class can (and in my view should) be made much more readable and easy to understand. Also with existential types the consuming code only needs to know about a single type parameter (TPostOnly) rather than other types which should be internal to the class.

In summary, C# programming with interfaces and type parameters would be be massively improved by support for:

existential types
inferred where clauses (*)
native TSelf existential type and a way to specify associated conditions such as class, new().

I really hope we can improve C# in this direction!

Thanks for all the hard work.

BR, Mark

(*) Edit: I have noticed from some web searching that "inferred where clauses" are a bit controversial and probably not going to happen. I think the other two features done well (existential types and native TSelf) will anyway massively reduce the problem of where clause proliferation since existential type where clauses are "inside" an interface and where clauses relating to TSelf could be covered by interface inheritance (e.g. like Swift's protocol X: AnyObject) and interface support for constructors.

Edit 2: You may be interested to know that Swift's lack of support for proper generic type parameters (only existentials) caused me a problem there: see my comment. So supporting both kinds of type parameters (generics and existentials) is definitely the best solution!

markm77 commented 3 years ago

@MadsTorgersen Is there any hope for adding this to https://github.com/dotnet/csharplang/issues/4144? 😀 This and other interface limitations (no static members and to a lesser extent no constructor members) really limit generic programming in today's C#. I'm really hoping these things can also help lead us to .NET Numerics (interesting ref I found from LDM notes: https://github.com/Partydonk/partydonk/blob/master/Generic%20Math/Generic%20Math%20in%20.NET%20-%20Contractual%20Static%20Interface%20Members%20in%20CSharp.pdf).

stephen-hawley commented 3 years ago

On of the things to keep in mind with existential types is that they're not actually types. It's really a type-shaped hole rather than an actual type. Therefore, existential types should never be used directly.

public void SomeMethod (ISomeExistential<T> existential) { } // incorrect
public void SomeMethod<T>(T existential) where T:ISomeExistential { } // correct

The canonical example of protocols with associated types in Swift solves the problem of letting the type adopting the protocol manage its associated types. For example:

public protocol Animal {
     associatedtype Food
     func feed (_ : Food)
}

public struct Grass { }
public class Cow : Animal {
    typealias Food =  Grass;
    public func feed(_ f: Grass) {
       print("munch, munch");
    }
}

// utility function:
public func feedAnAnimal<T:Animal> (animal: T, food: T.Food) {
     animal.Feed(food);
}

When it comes to implementing this in C# it gets uncomfortable:

public interface IAnimal<F> {
    void Feed(F food);
}

public struct Grass { }
public class Cow: IAnimal<Grass> {
    public void Feed(Grass food) {
        Console.WriteLine("munch, munch");
}
public static class Utilities {
    void Feed<A, F>(A animal, F food) where A:Animal<F> { // yuck
        animal.Feed(food);
    }
}

What happens is that anything that consumes this existential needs to be fully parameterized on all the types. The more associated types, the uglier it gets.

One other thing that comes along with this is dynamic self, which you can think of as a 0th order associated type or an implicit associated type. Dynamic Self means "the most derived type that adopts this protocol".

For example, let's say we write this protocol in swift:

public protocol Clonerizer {
    init (instance: Self)
}
public extension Clonerizer {
    func copy() -> Self {
        return Self.init (instance: self);
    }
}

open class Foo : Clonerizer {
    public var X: Int
    public init (a: Int) {
        X = a
    }
    required public init(instance: Foo) {
        X = instance.X
    }
}

public class Bar : Foo {

}

let b = Bar (a: 3)
let c = b.copy()
print(type(of:c), c.X)
// prints: Bar 3

Executive sum: parameterization of protocols through existential typing makes the types a property of the implementing type and not the protocol. As a result, signatures of methods that consume them don't need to be parameterized on all the types as well. In addition, all of this is done with static dispatch instead of dynamic dispatch.