dotnet / csharplang

The official repo for the design of the C# programming language
11.1k stars 1.01k forks source link

[Proposal]: Practical existential types for interfaces #5556

Open agocke opened 2 years ago

agocke commented 2 years ago

Practical existential types for interfaces

Terminology note: What's called "existential types" in this proposal is more correctly called "associated types" in other languages. The previous proposal, referenced below, did not have the same restrictions and thus implemented something much closer to a pure existential type without type erasure. This proposal is more restrictive and doesn't support using interfaces with associated types in all locations. You can think of every mention of "existential type" in this proposal as meaning "associated type."

Intro

Previously I proposed adding some form of existential types to C#. In that proposal, I describe existential types at a high level and describe how some features could be added to C#. However, I didn't propose any particular implementation strategy and left many questions open, including the mechanism for enforcing type safety.

In this proposal I will describe a full implementation and type safety mechanism, as well as a metadata representation.

To start, the syntactic presentation. I propose that existential types be represented as abstract type members of interfaces. For example,

interface Iface
{
    // Existential type
    type Ext;

    public Ext P { get; }
}

The existential type is substituted when the interface is implemented, e.g.

class C : Iface
{
    type Iface.Ext = int;

    public int P => 0;
}

This syntax is similar to other languages with existential types and provides a clear separation from existing type parameters on types.

This syntax also emphasizes the differences described in the earlier proposal: unlike regular type parameters, which are provided by the creator of a type, existential types are provided by the implementer of the interface and are invisible at type creation.

Note: Some alternate syntaxes include

  1. interface IFace<protected Ext> (this was the syntax I used in my first proposal
  2. interface IFace<abstract Ext>
  3. interface IFace<implicit Ext>
  4. interface IFace|<Ext>

This does raise the question left open in the previous proposal: how to define type equality. Since each interface implementation may have a unique subsitution for the existential type, type equality depends on the exact type of the implementation. Notably, the interface itself is not an implementation, so the following would not type check:

void M(Iface left, Iface right)
{
    var x = left.P;
    x = right.P; // error, left.P and right.P may not be the same type
}

Worse, the type of x is difficult to express in the language, as-is. It is in some sense a type parameter, but there isn't a named type parameter in scope to use to refer to it. Inside the interface we call it Iface.Ext, but this is not actually a type, it is a type parameter. The actual type is whatever was substituted by the implementation. In the case of our example C above, the type is int.

However, we can improve the power of the feature using a different feature: existing C# generics. If we avoid using the type parameter as a type, and instead use it as a constraint, things get simpler:

void M<T>(T left, T right)
    where T : Iface
{
    var x = left.P; // `var` could be type `T.Ext`
    x = right.P; // type checks
}

With this usage, we can be confident that the implementations will produce "compatible" types. This leads to the following proposal: interfaces with type members should only be usable as generic constraints. With this restriction, we can treat type members as relatively standard C# types, usable in the places where type parameters would be permitted. That this is type safe may not be obvious, but the proposed reduction to existing .NET metadata will verify that the resulting code is type safe.

Motivating example

The examples above demonstrate simple usages, but don't give an example of practical advantages. One opportunity is improved optimizations. Consider LINQ. As Jared Parsons described in a blog post, two of the biggest weaknesses of IEnumerable<T> are the repeated interface dispatches, and the abstraction of the enumerator type. As he describes, we could improve the pattern using generics:

public interface IFastEnum<TElement, TEnumerator>
{
  TEnumerator Start { get; }
  bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

One big problem with this pattern is it makes the enumerator into either an additional type parameter which needs to be manually propagated, or public surface area. This is a job much better left to the compiler. This is how it could be written with existential types:

public interface IFastEnum<TElement>
{
    type TEnumerator;
    TEnumerator Start { get; }
    bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

The enumerator type is now appropriately elided for everyone except the implementor. A user might write

void M<TEnum>(TEnum e) where TEnum : IFastEnum<string>
{
    foreach (var elem in e)
    {
        Console.WriteLine(elem);
    }
}

This is more verbose than not using generics, but that is a more general concern about verbosity of generics.

And on the implementor side, it would look like this:

class List<T> : IFastEnum<TElement>
{
    type IFastEnum.TEnumerator = int;
    int Start => 0;
    public bool TryGetNext(ref int enumerator, out TElement value)
    {
        if (enumerator >= Count) {
            value = default(TElement);
            return false;
        }

        value = _array[enumerator++];
        return true;
    }
}

This is much the same code that Jared wrote, and should provide the same performance benefits.

Compilation

In the previous proposal, I described a lowering strategy based on logic theorems around existential and universal type equivalence. This technique is powerful and flexible, but much more complicated and difficult to implement. The above design has substantial limitations on how existential types can be used, therefore the implementation can be much simpler. If these limitations prove too onerous in the future, some restrictions may be loosened with a more complex compilation strategy.

The proposed compilation strategy is broadly quite simple: turn existential types into hidden generic parameters. This may seem extreme, but note that the language rule for type members requires the interface which contains them to only be used as constraints. This means that compilation may add generic parameters, but it will never make a method generic which wasn't before, and the type parameters will not spread past the introduction of the constraint.

Here's a simple example of the code before and after.

Before:

interface Iface
{
    type Ext : IDisposable;

    Ext P { get; }
}
class A : Iface
{
    type Ext = MemoryStream;

    MemoryStream P => new MemoryStream();
}
void Dispose<T>(T t)
    where T : Iface
{
    t.P.Dispose();
}
Dispose(new A());

After:

interface Iface<Ext>
    where Ext : IDisposable
{
    Ext P { get; }
}
class A : Iface<MemoryStream>
{
    MemoryStream P => new MemoryStream();
}
void Dispose<T, Ext>(T t)
    where T : Iface<Ext>
{
    t.P.Dispose();
}
Dispose<A, MemoryStream>(new A());

The important transformations are:

  1. Type members become additonal type parameters on the interface.
  2. Constraints on type members become constrains on the interface.
  3. Type member assignments in the implementation become type substitutions in the interface implementation.
  4. In all places where type parameters are declared with constraints to interfaces with type members, all type members must be added to the parameter list.
  5. At all callsites with synthesized type parameters, the correct type arguments must be inferred.

Most of the transformations are simple, but the synthesizing and inferring of type parameters may be worth some elaboration.

First, we need to establish what is necessary for inference. To do so, we need to determine which synthesized type parameters "belong" to which type parameter. This should be doable using synthesized attributes or modreqs to point to the "owning" parameter's index.

Once we know the owning parameters and the synthesized parameters, we can temporarily remove the synthesized parameters and perform type inference as currently specified in C#, or use the manual substitutions. Once the substitution is identified, we can identify the substitutions for the synthesized parameters. First, identify the needed interface by walking the constraint list in order. Next, determine the substitutions in the interface implementation. As currently proposed, the syntax only allows for a single implementation of a given interface for a given type. By searching types from most to least derived for the first implementation of the target interface, we can identify the substitutions on the argument. By matching the substitutions of the argument with the synthesized type parameters, synthesized type arguments can be generated.

The process above can be repeated for all type definitions and substitutions.

Consider also banning re-implementation of the same interface across inheritance. It's not a type safety violation (and therefore shouldn't need a runtime check), but it could lead to confusing behavior on which implementation is chosen, especially since it is always fully inferred.

Conclusion

Advantages

Drawbacks

Design Meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-02-16.md#practical-existential-types-for-interfaces

HaloFour commented 2 years ago

I can see the utility but I really don't like how this causes the source declarations and the metadata to differ so much in what I think would be non-obvious ways. IIRC this was a serious sticking point in the earlier discussions on shapes, that the witness type had to be a generic type parameter, and that the team overall would have preferred that the runtime offer a solution that did not require something like that.

BhaaLseN commented 2 years ago

Could "optional type parameters" solve this? A bit borrowed from C++ templates, not sure how this would be represented in metadata though.

interface Iface<Ext = MemoryStream> where Ext : IDisposable
{
      Ext P { get; }
}
class A : Iface // implicitly: Iface<MemoryStream>
{
    MemoryStream P => new MemoryStream();
}
void Dispose<T>(T t)
    where T : Iface
{
    t.P.Dispose();
}
Dispose(new A());

Or for the fast enumerable case:

public interface IFastEnum<TElement, TEnumerator = int>
{
    TEnumerator Start { get; }
    bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}
class List<TElement> : IFastEnum<TElement> // implicitly: IFastEnum<TElement, int>
{
    int Start => 0;
    public bool TryGetNext(ref int enumerator, out TElement value)
    {
        if (index >= Count) {
            value = default(T);
            return false;
        }

        value = _array[index++];
        return true;
    }
}

// implicit: any IFastEnum<string, int>
void M<TEnum>(TEnum e) where TEnum : IFastEnum<string>
{
    foreach (var elem in e)
    {
        Console.WriteLine(elem);
    }
}
// explicit? it does allow overriding the value, while "type Foo" would not
// would only be called when the implemetation is specialized and Fancy(TM)
// question about overload resolution though.
// and how TEnumerator could be hoisted up to be a type parameter of M.
void M<TEnum>(TEnum e) where TEnum : IFastEnum<string, FancyEnumerator<string>>
{
    foreach (var elem in e)
    {
        Console.WriteLine(elem);
    }
}
iam3yal commented 2 years ago

@BhaaLseN This seems wrong to me Iface<Ext = MemoryStream> where Ext : IDisposable because existential types needs to be declared first so if anything it should be something like this Iface<Ext = MemoryStream> where Ext : type, IDisposable otherwise it's just a regular generic type parameter and to be honest it looks weird to me.

I think that "optional type parameters" is a separated feature that doesn't really solve the problem here but enhances generics that the proposed solution take advantage of and might benefit from the enhancement as well should it ever added to the language.

TahirAhmadov commented 2 years ago

I'm confused by the FastEnum example. First of all, the index should be enumerator and TElement should be T. But those typos aside, how is that better than just creating a new:

interface IFastEnumerator<T>
{
  bool TryMoveNext(out T item);
}
interface IFastEnumerable<T>
{
  IFastEnumerator<T> GetFastEnumerator();
}
class List<T> : IFastEnumerable<T>
{
  class FastEnumerator : IFastEnumerator<T>
  {
    public FastEnumerator(List<T> list) => this._list = list;
    int _index;
    List<T> _list;
    // or
    T[] _array; int _size; // to get rid of the indirection
    public bool TryMoveNext(out T item)
    {
      // ...
    }
  }
  public IFastEnumerator<T> GetFastEnumerator() => new FastEnumerator(this);
}
foreach(var x in list) { ... }

C# is expanded to look for GetFastEnumerator and use it accordingly. The only downside of this approach is one initial allocation of a class, which has the list reference (8 bytes) and the counter (4 bytes) = 12 bytes. Are we sure it's worth it to introduce a whole new concept of "sort of generics" to save allocating 12 bytes once per loop? And doesn't the performance cost of extra generic arguments (in terms of runtime) negate the benefit of saving that allocation? Also, with the example in the OP, it has to repetitively put the ref index on the stack - on top of invoking an instance method using v-table. With my example, the instance invocation is still there, but the ref index push isn't. Maybe I'm wrong but it seems my example is faster.

0x0737 commented 2 years ago

What about abstract classes? And hope it eventually will be a runtime feature... Compiler "tricks" like that always remind me of java's type erasure...

CyrusNajmabadi commented 2 years ago

At all callsites with synthesized type parameters, the correct type arguments must be inferred.

I wonder if there's space here to do somethign akin to how VB works with extension methods and type parameters. Where tehre can be the 'original def' with some number, and the 'reduced' version with less. Except that here it woudl be somewhat the reverse. I'm just speaking at the Language level how we might represent this, while the emitted level could def look different.

TahirAhmadov commented 2 years ago

Improved version of my code sample:

interface IFastEnumerator<T>
{
  bool TryMoveNext(out T item);
}
interface IFastEnumerable<T>
{
  Type GetFastEnumeratorType(); // return type which has a ctor accepting one parameter, the collection; and has TryMoveNext method
}
class List<T> : IFastEnumerable<T>
{
  struct FastEnumerator : IFastEnumerator<T>
  {
    public FastEnumerator(List<T> list) => this._list = list;
    int _index;
    List<T> _list;
    // or
    T[] _array; int _size; // to get rid of the indirection
    public bool TryMoveNext(out T item)
    {
      // ...
    }
  }
  public Type GetFastEnumeratorType() => typeof(FastEnumerator);
}
foreach(var x in list) { ... }
// becomes
var enumerator = new List<T>.FastEnumerator(list); // look for ctor which takes the list as a parameter
// the enumerator now lives on the stack
while(enumerator.TryGetNext(out readonly x)) // this call can be inlined, virtually removing any overhead
{
}

And the best part, all this is possible today, with minor changes to the C# compiler (recognize new types/method names/etc.)

333fred commented 2 years ago

Improved version of my code sample:

interface IFastEnumerator<T>
{
  bool TryMoveNext(out T item);
}
interface IFastEnumerable<T>
{
  Type GetFastEnumeratorType(); // return type which has a ctor accepting one parameter, the collection; and has TryMoveNext method
}
class List<T> : IFastEnumerable<T>
{
  struct FastEnumerator : IFastEnumerator<T>
  {
    public FastEnumerator(List<T> list) => this._list = list;
    int _index;
    List<T> _list;
    // or
    T[] _array; int _size; // to get rid of the indirection
    public bool TryMoveNext(out T item)
    {
      // ...
    }
  }
  public Type GetFastEnumeratorType() => typeof(FastEnumerator);
}
foreach(var x in list) { ... }
// becomes
var enumerator = new List<T>.FastEnumerator(list); // look for ctor which takes the list as a parameter
// the enumerator now lives on the stack
while(enumerator.TryGetNext(out readonly x)) // this call can be inlined, virtually removing any overhead
{
}

And the best part, all this is possible today, with minor changes to the C# compiler (recognize new types/method names/etc.)

That Enumerator type is not something that the compiler can determine at compile time, meaning it would have to be reflection. Associated types is how you achieve your interface proposal with actual compile time knowledge.

TahirAhmadov commented 2 years ago

How about:

interface IFastEnumerator<T>
{
  bool TryMoveNext(out T item);
}
interface IFastEnumerable<T>
{
  IFastEnumerator<T> GetFastEnumerator(); // this is for regular usage
}
class List<T> : IFastEnumerable<T>
{
  public struct FastEnumerator : IFastEnumerator<T> // this can be public - who cares
  {
    public FastEnumerator(List<T> list) => this._list = list;
    int _index;
    List<T> _list;
    // or
    T[] _array; int _size; // to get rid of the indirection
    public bool TryMoveNext(out T item)
    {
      // ...
    }
  }
  public IFastEnumerator<T> GetFastEnumerator() => new FastEnumerator(this); // this is for regular usage
  public FastEnumerator GetFastEnumeratorDirect() => new FastEnumerator(this); // this is for highest performance
}
foreach(var x in list) { ... }
// becomes
var enumerator = list.GetFastEnumeratorDirect(); // C# looks for a method named GetFastEnumeratorDirect
// the enumerator now lives on the stack
while(enumerator.TryGetNext(out readonly x)) // this call can be inlined, virtually removing any overhead
{
}
333fred commented 2 years ago

How about:

interface IFastEnumerator<T>
{
  bool TryMoveNext(out T item);
}
interface IFastEnumerable<T>
{
  IFastEnumerator<T> GetFastEnumerator(); // this is for regular usage
}
class List<T> : IFastEnumerable<T>
{
  public struct FastEnumerator : IFastEnumerator<T> // this can be public - who cares
  {
    public FastEnumerator(List<T> list) => this._list = list;
    int _index;
    List<T> _list;
    // or
    T[] _array; int _size; // to get rid of the indirection
    public bool TryMoveNext(out T item)
    {
      // ...
    }
  }
  public IFastEnumerator<T> GetFastEnumerator() => new FastEnumerator(this); // this is for regular usage
  public FastEnumerator GetFastEnumeratorDirect() => new FastEnumerator(this); // this is for highest performance
}
foreach(var x in list) { ... }
// becomes
var enumerator = list.GetFastEnumeratorDirect(); // C# looks for a method named GetFastEnumeratorDirect
// the enumerator now lives on the stack
while(enumerator.TryGetNext(out readonly x)) // this call can be inlined, virtually removing any overhead
{
}

That's pretty much what we have today, with the same performance issues in fully generic code.

TahirAhmadov commented 2 years ago

That's pretty much what we have today, with the same performance issues in fully generic code.

Oh yea, I didn't even realize it's there. Given that's already there, what are the performance issues? How can that be improved any further? (Other than getting rid of the MoveNext/Current and replacing it with TryGetNext) What's the drive for "existential types"?

CyrusNajmabadi commented 2 years ago

the problem is when you are workign with IFastEnumerable. The compiler doesn't know about GetFastEnumeratorDirect to do this quickly. With existential types, it could.

TahirAhmadov commented 2 years ago

the problem is when you are workign with IFastEnumerable. The compiler doesn't know about GetFastEnumeratorDirect to do this quickly. With existential types, it could.

I'm completely confused now. The compiler already looks for GetEnumerator which returns the public struct enumerator, not the interface (per @333fred and List<T> definition).

CyrusNajmabadi commented 2 years ago

I'm completely confused now. The compiler already looks for GetEnumerator which returns the public struct enumerator

It can do that when workign with the concrete type. But if you're workign with the interface it has no idea. That's what existential types aims to solve. You can both be working with the interface, but pass along enough information that the compielr/jit can statically know all the information necessary to generate more optimal code (namely code that doesn't need to call through virtual-dispatch or allocate on the heap).

CyrusNajmabadi commented 2 years ago

Using Andy's example, you could pass around an IFastEnum, but get the same perf as if you passed around a concrete impl of that interface.

TahirAhmadov commented 2 years ago

I'm completely confused now. The compiler already looks for GetEnumerator which returns the public struct enumerator

It can do that when workign with the concrete type. But if you're workign with the interface it has no idea. That's what existential types aims to solve. You can both be working with the interface, but pass along enough information that the compielr/jit can statically know all the information necessary to generate more optimal code (namely code that doesn't need to call through virtual-dispatch or allocate on the heap).

Oh I see what you mean. Still, is it really that important to get rid of one allocation per entire loop? My first example - basically adding TryGetNext and leaving everything else as is - works just as fast per iteration. Or am I missing something else?

CyrusNajmabadi commented 2 years ago

Still, is it really that important to get rid of one allocation per entire loop?

Yes.

CyrusNajmabadi commented 2 years ago

My first example - basically adding TryGetNext and leaving everything else as is - works just as fast per iteration.

Again, this is what we have today :)

However, you only get the benefits if you know the concrete types. But that's highly problematic. In many domains you do not. And now there are tons of allocations happening as you have to work over abstract interfaces with no additional information present to help the runtime/jit/etc. avoid them.

TahirAhmadov commented 2 years ago

@CyrusNajmabadi @333fred OK I just ran performance testing, enumerating 1,000,000 items, 100 times to get an average, and the results are: (.NET 5)

.NET FW 4.8:

In .NET 5, the first 2 approaches (ref int and interface) sometimes switch places; to be completely fair, I think they perform equally. In .NET FW 4.8, the 2nd (interface) approach is always faster. Code: https://pastebin.com/UA1N1KCG

CyrusNajmabadi commented 2 years ago

@TahirAhmadov please show a BenchmarkDotnet result, including memory allocations. FUrthermore, this only shows TryGetNext appraoch which is not extendible to all the cases where existential types may be used.

HaloFour commented 2 years ago

This proposal is also about the ergonomics of generic types, not performance. Those scenarios where performance benefits can be achieved but currently necessitate painful generic signatures are just a bonus.

CyrusNajmabadi commented 2 years ago

My BDN results:

|           Method |       Mean |   Error |  StdDev | Ratio | Allocated |
|----------------- |-----------:|--------:|--------:|------:|----------:|
|  IFastEnumerable | 1,647.0 ns | 4.52 ns | 4.23 ns |  1.00 |         - |
|                  |            |         |         |       |           |
| IFastEnumerable2 | 2,269.2 ns | 6.86 ns | 6.41 ns |  1.00 |      32 B |
|                  |            |         |         |       |           |
|  FastEnumerator2 |   347.1 ns | 0.71 ns | 0.66 ns |  1.00 |         - |

Gist: https://gist.github.com/CyrusNajmabadi/9a1acd7f5c2f2b63c194870533e6af62

So anywhere from 5-7x faster (not even counting things like allocations).

TahirAhmadov commented 2 years ago

@TahirAhmadov please show a BenchmarkDotnet result, including memory allocations. FUrthermore, this only shows TryGetNext appraoch which is not extendible to all the cases where existential types may be used.

The code is self explanatory, I don't have the time now to make a new project with that BenchmarkDotnet. I do agree that the allocation will add some memory consumption, but again, we're talking about something like 20 bytes; this only becomes a problem if the loop is executed many times - which is usually a code smell for a whole bunch of reasons, not only allocating the enumerator object. In short, I think there may be a miniscule performance benefit to squeeze out of this improvement; but the cost of creating a whole new syntax constructs plus necessary runtime changes just doesn't seem to be worth it right now. Again - I'm going off of what's in the OP. They show the motivational example and that's what I can work with. I can't sit here and invent other ways to use a feature which I'm not the one proposing.

CyrusNajmabadi commented 2 years ago

The code is self explanatory

The code is not acceptable as it does hand written measurements without any of the efforts or smarts that BDN puts into reliable, repeatable, and statistically significant measurements.

but again, we're talking about something like 20 bytes; this only becomes a problem if the loop is executed many times

No it's not. That's the common case. Components are producing enumerables and other parts of hte system are consuming it.

I think there may be a miniscule performance benefit to squeeze out of this improvement;

You are incorrect. Even in the basic examples you showed it's 5-7x, and no allocs.

TahirAhmadov commented 2 years ago

My BDN results:

|           Method |       Mean |   Error |  StdDev | Ratio |
|----------------- |-----------:|--------:|--------:|------:|-
|  IFastEnumerable | 1,647.0 ns | 4.52 ns | 4.23 ns |  1.00 |
|                  |            |         |         |       |
| IFastEnumerable2 | 2,269.2 ns | 6.86 ns | 6.41 ns |  1.00 |
|                  |            |         |         |       |
|  FastEnumerator2 |   347.1 ns | 0.71 ns | 0.66 ns |  1.00 |

Gist: https://gist.github.com/CyrusNajmabadi/9a1acd7f5c2f2b63c194870533e6af62

So anywhere from 5-7x faster (not even counting things like allocations).

The FastEnumerator2 is the concrete type struct approach - I just included it for comparison sake. However your results are interesting. The existential type approach does seem to be about 27% faster. I'll look into it further later.

CyrusNajmabadi commented 2 years ago

The existential type approach does seem to be about 27% faster. I'll look into it further later.

No. The existential type approach would allow for the speed of FastEnumerator2 as there will be no virtual dispatch. The jit will literally know that 'start' is an int, and that there is a not-virt call to:

    public bool TryGetNext(ref int enumerator, out TElement value)
    {
        if (index >= Count) {
            value = default(T);
            return false;
        }

        value = _array[index++];
        return true;
    }

which will likely get entire inlined and optimized.

TahirAhmadov commented 2 years ago

The existential type approach does seem to be about 27% faster. I'll look into it further later.

No. The existential type approach would allow for the speed of FastEnumerator2 as there will be no virtual dispatch.

How is that? You said yourself the purpose is to allow passing an interface. How are you going to avoid a virtual dispatch with an interface?

TahirAhmadov commented 2 years ago

The jit will literally know that 'start' is an int, and that there is a not-virt call to:

    public bool TryGetNext(ref int enumerator, out TElement value)
    {
        if (index >= Count) {
            value = default(T);
            return false;
        }

        value = _array[index++];
        return true;
    }

which will likely get entire inlined and optimized.

Isn't the same true for the FastEnumerator2 interface method? It can also be inlined and optimized.

CyrusNajmabadi commented 2 years ago

How is that? You said yourself the purpose is to allow passing an interface. How are you going to avoid a virtual dispatch with an interface?

That's literally what existential types allow. :)

This is the point of hte proposal. Please see the translation provided in the OP and how it allows generics to make this work.

CyrusNajmabadi commented 2 years ago

Isn't the same true for the FastEnumerator2 interface method? It can also be inlined and optimized.

No :) Which is why you see the performance in the BDN of the interface dispatch approach being a problem. THe only existing approach we have that is fast on the current system is to return the FastEnumerator2 struct, which requires both teh production and consumption code to know about this concrete type.

Again, that's the point here. There's no way to accomplish that today ergonomically with interfaces. THere are unergonomic ways (with extra type parameters). Existential types is trying to get the best of both worlds. The perf and runtime wins that we get with extra type parameters today, but with the convenience and ease of a more ergonomic solution that doesn't require passing around all that information in all signatures.

TahirAhmadov commented 2 years ago

No :) Which is why you see the performance in the BDN of the interface dispatch approach being a problem. THe only existing approach we have that is fast on the current system is to return the FastEnumerator2 struct, which requires both teh production and consumption code to know about this concrete type.

Again, that's the point here. There's no way to accomplish that today ergonomically with interfaces. THere are unergonomic ways (with extra type parameters). Existential types is trying to get the best of both worlds. The perf and runtime wins that we get with extra type parameters today, but with the convenience and ease of a more ergonomic solution that doesn't require passing around all that information in all signatures.

OK just to make sure we're all on the same page: is it possible today to write code in C# which would resemble the IL output that existential types would produce? If so, how is it different from my IFastEnumerator approach? (I thought my IFastEnumerator was very similar to the OP example.) I'd like to take that and stick it into a perf comp.

CyrusNajmabadi commented 2 years ago

OK just to make sure we're all on the same page: is it possible today to write code in C# which would resemble the IL output that existential types would produce?

Yes. See andy's translation of what his existential types compile down to.

The problem is that it's very unergonomic. Requiring extra type parameters all over the place and requiring many cases for the user to supply the type parameter explicitly instead of allowing for inference. The point of this proposal is to have a language feature that makes this pleasant.

I'd like to take that and stick it into a perf comp.

I don't know what this means.

As above, the point of this is to allow teh speed of the last line here, while also having a clean and easy to use model that doesn't require consumer and producer to explicitly state/infer/type-parameterize the concrete type necessary to do this:

image

At the end, it will compile down to something very similar. But that doesn't mean it has to be declared or consumed in that more painful fashion.

TahirAhmadov commented 2 years ago

OK just to make sure we're all on the same page: is it possible today to write code in C# which would resemble the IL output that existential types would produce?

Yes. See andy's translation of what his existential types compile down to.

The problem is that it's very unergonomic. Requiring extra type parameters all over the place and requiring many cases for the user to supply the type parameter explicitly instead of allowing for inference. The point of this proposal is to have a language feature that makes this pleasant.

Right, and it's my IFastEnumerator which is a replica of the OP, not the struct approach. Therefore, the difference is 27%, not 5x (which is still good).

I'd like to take that and stick it into a perf comp.

I don't know what this means.

I was hoping if you (or the author) can provide the code which I can benchmark.

As above, the point of this is to allow teh speed of the last line here,

I just don't see how that's possible. That high performance is enabled by a struct living on the stack and method being inlined - because it's a call, not a callvirt.

CyrusNajmabadi commented 2 years ago

Right, and it's my IFastEnumerator which is a replica of the OP, not the struct approach. Therefore, the difference is 27%, not 5x (which is still good).

No it's not. The OP is akin to the FastEnumerator2 line.

I just don't see how that's possible. That high performance is enabled by a struct living on the stack and method being inlined - because it's a call, not a callvirt.

Because in our translation it's not a callvirt. It's a constrained-call through a generic type parameter. Because that type parameter will be instantiated with a struct at runtime, the jit can specialize to a direct call.

CyrusNajmabadi commented 2 years ago

@TahirAhmadov it's through a translation like:

void M<TEnum>(TEnum e) where TEnum : IFastEnum<string>
{
    foreach (var elem in e)
    {
        Console.WriteLine(elem);
    }
}

It's not going through a callvirt itself through. It's going to go through constrained calls that the jit can replace with direct struct placement/invocation at runtime.

BhaaLseN commented 2 years ago

This seems wrong to me Iface<Ext = MemoryStream> where Ext : IDisposable because existential types needs to be declared first so if anything it should be something like this Iface<Ext = MemoryStream> where Ext : type, IDisposable otherwise it's just a regular generic type parameter and to be honest it looks weird to me.

Just throwng stuff at the wall to see what sticks; but yes, it was meant to look like a type parameter (because in my head, it essentially just is one). I have to admit though, after re-catching up on existential types I believe my brain took a wrong turn somewhere and thought it was something else. I do wonder if : type as a constraint (or any other keyword as constraint, further back in the declaration) expresses this appropriately. How about using interface Iface<implicit Ext> where Ext : IDisposable instead (with the existing implicit keyword denoting extential types in the signature)?

I think that "optional type parameters" is a separated feature that doesn't really solve the problem here but enhances generics that the proposed solution take advantage of and might benefit from the enhancement as well should it ever added to the language.

Possibly; again, just throwing ideas. Doesn't necessarily mean its a good or applicable one for the stated problem. (and the more I think about it while typing this, the more I tend to agree)

CyrusNajmabadi commented 2 years ago

How about using interface Iface where Ext : IDisposable instead (with the existing implicit keyword denoting extential types in the signature)?

In general, while syntax is certainly very important, we like to hammer that down further down the line. The core goal here would be around the idea of the semantics of such an entity, and how that can be then consumed. If we like it, we'll then do a few rounds of syntactic options seeing which one feels 'c#-iest'. IN this case though, the essential part is having this be distinct somehow from a normal type parameter with all the rules/semantics that that has.

So your feedback is very well taken. But we shouldn't overindex on that aspect of the design too early. Thanks!

agocke commented 2 years ago

Yeah, to be clear, I chose a syntax that I thought would make the transformation easier to understand. I won't claim that it's the final syntax that we should use for the language.

@TahirAhmadov I think what you're missing is that the code that we're trying to optimize must be working on the interface, not the concrete type, i.e.

void M<T>(IEnumerable<T> e)

or

void M<TEnum, T>(TEnum e) where TEnum : IEnumerable<T>

This code allows any IEnumerable to come in. The task is to make this code as fast as though the parameter had the static type List<T>, but without having the method contain List<T> at all.

agocke commented 2 years ago

@HaloFour Yup, the differing arity between metadata and source is a drawback.

However, when thinking about possible metadata representation, it occurred to me that the changes required at the metadata level could be much more disruptive than a clever rewriting technique. At least every metadata tool has an established way of understanding generic parameters and compilers without smarts would be allow users to substitute manually. The metadata revisions for "first-class" existential types here could be very intrusive and much more disruptive to the ecosystem.

HaloFour commented 2 years ago

@agocke

Could the same be achieved through the proposals around partial generic inference? You'd still have to write the generic types/methods in the same way you do today (which might be a good thing since you don't have to work out a new syntax to try to carry the existential type) but the call sites would be a lot cleaner. That would keep the metadata and source more close in parity and probably be a lot less involved from a syntax point of view.

Or maybe make the inference more explicit through a syntax to establish the relationship between the generic type parameters.

Throwing spaghetti at the wall here.

agocke commented 2 years ago

I'm not very familiar with the other proposals, but let me give some more examples and description to lay out more of the design space.

Here's a place in my serde-dn library that would use existential types:

public interface ISerializer<
    TSerializeType,
    TSerializeEnumerable,
    TSerializeDictionary
    >
    where TSerializeType : ISerializeType
    where TSerializeEnumerable : ISerializeEnumerable
    where TSerializeDictionary : ISerializeDictionary
{
    ...
    TSerializeType SerializeType(string name, int numFields);
    TSerializeEnumerable SerializeEnumerable(int? length);
    TSerializeDictionary SerializeDictionary(int? length);
}

Once again, the purpose of the type parameters here is performance -- with generics the return types can be structs and avoid allocation. The .NET runtime can also specialize for the particular implementation.

With this proposal, these would be existential types, not type parameters. The difference is particularly notable in a use site. The current definition:

public interface ISerialize
{
    void Serialize<TSerializer, TSerializeType, TSerializeEnumerable, TSerializeDictionary>(ref TSerializer serializer)
        where TSerializeType : ISerializeType
        where TSerializeEnumerable : ISerializeEnumerable
        where TSerializeDictionary : ISerializeDictionary
        where TSerializer : ISerializer<TSerializeType, TSerializeEnumerable, TSerializeDictionary>;
}

Not pretty. The saving grace for my library is that these interfaces will almost always be used and implemented by source generators, so users don't have to care about the complexity.

With existential types, however, the above looks a lot more reasonable:

public interface ISerialize
{
    void Serialize<TSerializer>(ref TSerializer serializer) where TSerializer : ISerializer;
}

The point of this example is to show that having multiple existential types in an interface could be common, that the burden at the use site increases heavily with more parameters, and that the most important ergonomic consideration here is hiding the parameters at the use site. Any design which has the parameters specified at the use site is a failure in my opinion, as it can rapidly spiral out of control.

That said, I think there's room to iterate at the declaration site. One alternative would be to move more towards the type parameters I described in the earlier proposal

public interface ISerializer<
    abstract TSerializeType,
    abstract TSerializeEnumerable,
    abstract TSerializeDictionary>
where TSerializeType : ISerializeType
where TSerializeEnumerable : ISerializeEnumerable
where TSerializeDictionary : ISerializeDictionary
{
...
}

On the plus side, the definition here has the appropriate number of type parameters. On the minus side, expressing the types in code becomes more difficult. I previously proposed that you could refer to existential types as "dotted names", e.g. TSerializer.TSerializeType. The dotted name syntax looks a lot stranger with type parameters instead of type members.

HaloFour commented 2 years ago

@agocke

and that the most important ergonomic consideration here is hiding the parameters at the use site. Any design which has the parameters specified at the use site is a failure in my opinion, as it can rapidly spiral out of control.

I agree. Several of those other parameters look to achieve the same thing by making the compiler "smarter" about inferring the generic type arguments based on their relationship to each other and to any passed parameters. I think https://github.com/dotnet/csharplang/discussions/997 is a good example and there are links to several others from that discussion.

That said, I think there's room to iterate at the declaration site.

That's the only sticking point to me, I'd prefer that the declaration more accurately align with the type in metadata. Otherwise I fear that you'd get stuck in some weird situations that would be challenging to resolve in the language, e.g.:

// Is this one type, or two?

public partial class Foo<T> {
    type Ext;

   // stuff
}

public partial class Foo<T> {
    // other stuff
}
agocke commented 2 years ago

Yup, the risk of confusion when "overloading" is real. Also, the parameter order is significant in metadata, so reordering or introducing new intermediate type members is a breaking change. Nevertheless, I quite liked the member style for type naming.

ufcpp commented 2 years ago

In this proposed implementation, can we write a cast like the following?

void M<TElement>(object x)
{
    if (x is IFastEnum<TElement> fast)
    {
        var start = fast.Start;
        ...
    }
    else if (x is IEnumerator<TElement> slow)
    {
        var e = slow.GetEnumerator();
        ...
    }
}
agocke commented 2 years ago

@ufcpp No. Remember: the restriction is that interfaces with existentials can only be used as constraints. The right hand of the is operator is a type position, not a constraint position, so this would be an error.

ufcpp commented 2 years ago

It's fair enough but a little dissapointing.

xoofx commented 2 years ago

I like the proposal a lot as much as it would help to solve so many struggles we often have to deal with.

Syntax wise, it could be something like this:

public interface IFace|<Ext> {
    Ext P { get; }
}

public class c : IFace|<int> {
    public int P => 0;
}

The character | allows to differentiate explicit generics (declared before) to implicit generics (declared after).

xoofx commented 2 years ago

Wondering also if for the generic type constraint we should not include the | or make it even more explicit |<...>

void M<T>(T left, T right)
    where T : Iface|<...>
{
    var x = left.P; // `var` could be type `T.Ext`
    x = right.P; // type checks
}

This would help to differentiate an interface without implicit generics (e.g interface IFace {} from one with interface IFace|<Ext> {} while it would restrict that you can have only one implicit variant for an interface (e.g cannot declare interface IFace|<Ext1, Ext2> if IFace|<Ext> was declared)

ocoanet commented 2 years ago

@xoofx I like this idea. Yet, I suppose that it would not be possible to have both IFace|<Ext> and IFace<Ext> types, so maybe a generic modifier (e.g.: in or out) would be better. I do not have a good suggestion for this, protected might indicate that the type parameter is not accessible outside of the type, but it would be strange because there would not be any public or private options.

interface Iface<protected Ext>
{
    public Ext P { get; }
}

Also, "protected type parameters" would be less scary than "existential types" for mere mortals like me.

TIHan commented 2 years ago

it would help to solve so many struggles we often have to deal with.

@xoofx , is it just IEnumerable struggles or others? I'm curious of what other examples there are.