dotnet / csharplang

The official repo for the design of the C# programming language

[Proposal]: Practical existential types for interfaces #5556

Open agocke opened 2 years ago

agocke commented 2 years ago

Practical existential types for interfaces

Terminology note: What's called "existential types" in this proposal is more correctly called "associated types" in other languages. The previous proposal, referenced below, did not have the same restrictions and thus implemented something much closer to a pure existential type without type erasure. This proposal is more restrictive and doesn't support using interfaces with associated types in all locations. You can think of every mention of "existential type" in this proposal as meaning "associated type."

Intro

Previously I proposed adding some form of existential types to C#. In that proposal, I describe existential types at a high level and describe how some features could be added to C#. However, I didn't propose any particular implementation strategy and left many questions open, including the mechanism for enforcing type safety.

In this proposal I will describe a full implementation and type safety mechanism, as well as a metadata representation.

To start, the syntactic presentation. I propose that existential types be represented as abstract type members of interfaces. For example,

interface Iface
{
    // Existential type
    type Ext;

    public Ext P { get; }
}

The existential type is substituted when the interface is implemented, e.g.

class C : Iface
{
    type Iface.Ext = int;

    public int P => 0;
}

This syntax is similar to other languages with existential types and provides a clear separation from existing type parameters on types.

This syntax also emphasizes the differences described in the earlier proposal: unlike regular type parameters, which are provided by the creator of a type, existential types are provided by the implementer of the interface and are invisible at type creation.

Note: Some alternate syntaxes include

  1. interface IFace<protected Ext> (this was the syntax I used in my first proposal)
  2. interface IFace<abstract Ext>
  3. interface IFace<implicit Ext>
  4. interface IFace|<Ext>

This does raise the question left open in the previous proposal: how to define type equality. Since each interface implementation may have a unique substitution for the existential type, type equality depends on the exact type of the implementation. Notably, the interface itself is not an implementation, so the following would not type check:

void M(Iface left, Iface right)
{
    var x = left.P;
    x = right.P; // error, left.P and right.P may not be the same type
}

Worse, the type of x is difficult to express in the language, as-is. It is in some sense a type parameter, but there isn't a named type parameter in scope to use to refer to it. Inside the interface we call it Iface.Ext, but this is not actually a type, it is a type parameter. The actual type is whatever was substituted by the implementation. In the case of our example C above, the type is int.

However, we can improve the power of the feature using a different feature: existing C# generics. If we avoid using the type parameter as a type, and instead use it as a constraint, things get simpler:

void M<T>(T left, T right)
    where T : Iface
{
    var x = left.P; // `var` could be type `T.Ext`
    x = right.P; // type checks
}

With this usage, we can be confident that the implementations will produce "compatible" types. This leads to the following proposal: interfaces with type members should only be usable as generic constraints. With this restriction, we can treat type members as relatively standard C# types, usable in the places where type parameters would be permitted. That this is type safe may not be obvious, but the proposed reduction to existing .NET metadata will verify that the resulting code is type safe.
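For comparison, here is a minimal sketch of the same shape in today's C#, assuming the type member is spelled as an ordinary type parameter (`Iface<TExt>`, `C`, and `Demo` are illustrative names, not part of the proposal). The shared `T` guarantees a shared `TExt`, so the assignment type checks, but the caller has to name `TExt` by hand because C# type inference does not look through constraints:

```csharp
using System;

// The caller must spell out both type arguments.
Console.WriteLine(Demo.M<C, int>(new C(1), new C(2)));

// Today's spelling: the would-be type member is an ordinary type parameter.
interface Iface<TExt>
{
    TExt P { get; }
}

sealed class C : Iface<int>
{
    public C(int p) => P = p;
    public int P { get; }
}

static class Demo
{
    // Both arguments share one T, so left.P and right.P share one TExt.
    public static TExt M<T, TExt>(T left, T right)
        where T : Iface<TExt>
    {
        var x = left.P;   // x has type TExt
        x = right.P;      // type checks: same TExt
        return x;
    }
}
```

The extra `TExt` parameter on `M` is exactly the piece the proposal would have the compiler synthesize and infer.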

Motivating example

The examples above demonstrate simple usages, but don't give an example of practical advantages. One opportunity is improved optimizations. Consider LINQ. As Jared Parsons described in a blog post, two of the biggest weaknesses of IEnumerable<T> are the repeated interface dispatches, and the abstraction of the enumerator type. As he describes, we could improve the pattern using generics:

public interface IFastEnum<TElement, TEnumerator>
{
  TEnumerator Start { get; }
  bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

One big problem with this pattern is it makes the enumerator into either an additional type parameter which needs to be manually propagated, or public surface area. This is a job much better left to the compiler. This is how it could be written with existential types:

public interface IFastEnum<TElement>
{
    type TEnumerator;
    TEnumerator Start { get; }
    bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

The enumerator type is now appropriately elided for everyone except the implementor. A user might write

void M<TEnum>(TEnum e) where TEnum : IFastEnum<string>
{
    foreach (var elem in e)
    {
        Console.WriteLine(elem);
    }
}

This is more verbose than not using generics, but that is a more general concern about verbosity of generics.

And on the implementor side, it would look like this:

class List<T> : IFastEnum<T>
{
    type IFastEnum.TEnumerator = int;
    public int Start => 0;
    public bool TryGetNext(ref int enumerator, out T value)
    {
        if (enumerator >= Count) {
            value = default(T);
            return false;
        }

        value = _array[enumerator++];
        return true;
    }
}

This is much the same code that Jared wrote, and should provide the same performance benefits.
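As a point of reference, the two-parameter pattern can already be written and run in current C#. The sketch below is an assumption-laden illustration, not Jared's code: `ArrayBackedList<T>` and `FastEnum.Sum` are hypothetical names, and the enumerator is simply an `int` index. It shows the cost the proposal wants to remove, namely that every consumer has to carry `TEnumerator` as an extra type parameter:

```csharp
using System;

var list = new ArrayBackedList<int>(new[] { 1, 2, 3 });
// The enumerator type (int) has to be passed explicitly at the call site.
Console.WriteLine(FastEnum.Sum<ArrayBackedList<int>, int>(list));

public interface IFastEnum<TElement, TEnumerator>
{
    TEnumerator Start { get; }
    bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

// Hypothetical array-backed collection; the enumerator is just an index.
public sealed class ArrayBackedList<T> : IFastEnum<T, int>
{
    private readonly T[] _array;
    public ArrayBackedList(T[] array) => _array = array;

    public int Start => 0;

    public bool TryGetNext(ref int enumerator, out T value)
    {
        if (enumerator >= _array.Length)
        {
            value = default!;
            return false;
        }
        value = _array[enumerator++];
        return true;
    }
}

public static class FastEnum
{
    // TEnumerator is pure plumbing here: the consumer never cares what it is.
    public static int Sum<TEnum, TEnumerator>(TEnum source)
        where TEnum : IFastEnum<int, TEnumerator>
    {
        var e = source.Start;
        var total = 0;
        while (source.TryGetNext(ref e, out var item))
            total += item;
        return total;
    }
}
```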

Compilation

In the previous proposal, I described a lowering strategy based on logic theorems around existential and universal type equivalence. This technique is powerful and flexible, but much more complicated and difficult to implement. The above design has substantial limitations on how existential types can be used, so the implementation can be much simpler. If these limitations prove too onerous in the future, some restrictions may be loosened with a more complex compilation strategy.

The proposed compilation strategy is broadly quite simple: turn existential types into hidden generic parameters. This may seem extreme, but note that the language rule for type members requires the interface which contains them to only be used as constraints. This means that compilation may add generic parameters, but it will never make a method generic which wasn't before, and the type parameters will not spread past the introduction of the constraint.

Here's a simple example of the code before and after.

Before:

interface Iface
{
    type Ext : IDisposable;

    Ext P { get; }
}
class A : Iface
{
    type Iface.Ext = MemoryStream;

    public MemoryStream P => new MemoryStream();
}
void Dispose<T>(T t)
    where T : Iface
{
    t.P.Dispose();
}
Dispose(new A());

After:

interface Iface<Ext>
    where Ext : IDisposable
{
    Ext P { get; }
}
class A : Iface<MemoryStream>
{
    public MemoryStream P => new MemoryStream();
}
void Dispose<T, Ext>(T t)
    where T : Iface<Ext>
    where Ext : IDisposable // the interface's constraint must be restated
{
    t.P.Dispose();
}
Dispose<A, MemoryStream>(new A());

The important transformations are:

  1. Type members become additional type parameters on the interface.
  2. Constraints on type members become constraints on the interface.
  3. Type member assignments in the implementation become type substitutions in the interface implementation.
  4. In all places where type parameters are declared with constraints to interfaces with type members, all type members must be added to the parameter list.
  5. At all callsites with synthesized type parameters, the correct type arguments must be inferred.

Most of the transformations are simple, but the synthesizing and inferring of type parameters may be worth some elaboration.

First, we need to establish what is necessary for inference. To do so, we need to determine which synthesized type parameters "belong" to which type parameter. This should be doable using synthesized attributes or modreqs to point to the "owning" parameter's index.

Once we know the owning parameters and the synthesized parameters, we can temporarily remove the synthesized parameters and perform type inference as currently specified in C#, or use the manual substitutions. Once the substitution is identified, we can identify the substitutions for the synthesized parameters. First, identify the needed interface by walking the constraint list in order. Next, determine the substitutions in the interface implementation. As currently proposed, the syntax only allows for a single implementation of a given interface for a given type. By searching types from most to least derived for the first implementation of the target interface, we can identify the substitutions on the argument. By matching the substitutions of the argument with the synthesized type parameters, synthesized type arguments can be generated.

The process above can be repeated for all type definitions and substitutions.
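To make the substitution step concrete, here is a rough runtime analogy using reflection. It is only a sketch: the real work would happen inside the compiler's binder, and `Inference.FindSubstitutions` is a hypothetical helper, not a Roslyn API. Given the inferred argument type and the lowered open interface, it finds the single implementation and reads off the type arguments that would fill the synthesized parameters:

```csharp
using System;
using System.IO;
using System.Linq;

// For argument type A, recover the type argument to synthesize for Iface<>.
Console.WriteLine(Inference.FindSubstitutions(typeof(A), typeof(Iface<>))[0]);

// Lowered form of the interface from the example above.
interface Iface<TExt> where TExt : IDisposable
{
    TExt P { get; }
}

sealed class A : Iface<MemoryStream>
{
    public MemoryStream P => new MemoryStream();
}

static class Inference
{
    // Returns the type arguments the argument type supplies for openInterface.
    public static Type[] FindSubstitutions(Type argument, Type openInterface)
    {
        var matches = argument.GetInterfaces()
            .Where(i => i.IsGenericType && i.GetGenericTypeDefinition() == openInterface)
            .ToArray();

        // The proposal allows only one implementation per type; anything else is ambiguous.
        if (matches.Length != 1)
            throw new InvalidOperationException(
                $"Expected exactly one implementation of {openInterface.Name} on {argument.Name}, found {matches.Length}.");

        return matches[0].GetGenericArguments();
    }
}
```

The sketch deliberately rejects multiple implementations instead of picking the most derived one, which sidesteps the re-implementation question raised in the next paragraph.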

Consider also banning re-implementation of the same interface across inheritance. It's not a type safety violation (and therefore shouldn't need a runtime check), but it could lead to confusing behavior on which implementation is chosen, especially since it is always fully inferred.

Conclusion

Advantages

Drawbacks

Design Meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-02-16.md#practical-existential-types-for-interfaces

xoofx commented 2 years ago

@xoofx I like this idea. Yet, I suppose that it would not be possible to have both IFace|<Ext> and IFace<Ext> types, so maybe a generic modifier (e.g.: in or out)

Indeed. I originally wanted to use the keyword implicit, but then you would have to duplicate it a lot if you have multiple implicit generics e.g IFace<TDirect, implicit TImplicit1, implicit TImplicit2> but also it makes declaration of constraints less practical e.g should we use something like IFace<MyImplem,...> or something else with implicit? But anything that could make the declaration compact and familiar is welcome. I agree with the fact that "existential types" could bring more confusion than reusing some existing concepts we have (e.g inferred generics, implicit generics, deferred generics...)

@xoofx , is it just IEnumerable struggles or others? I'm curious of what other examples there are.

Not only. @agocke showed some examples above. Anything that comes with <TOuter, TInner>(TOuter value) where TOuter: IOuterInterface<TInner> where you have then to explicitly pass every single TInner type args that could be actually inferred from TOuter.

jmarolf commented 2 years ago

I really love this proposal.


I am certain that we will yak shave over the syntax plenty before all this is done so I will throw my hat into the ring.

I understand why specifying existential types inside the type itself can make sense. You are never going to specify them at any use site, so it might make more sense to call this out.

public interface IFastEnum<TElement> {
    type TEnumerator;
    TEnumerator Start { get; }
    bool TryGetNext(ref TEnumerator enumerator, out TElement? value);
}

public interface IFastCollec<TElement> : IFastEnum<TElement> {
    int Count { get; }
    bool IsReadOnly { get; }

    void Add(TElement element);
    bool Remove(TElement element);
    bool Contains(TElement element);
    void Clear();
    void CopyTo(TElement[] array, int index = 0);
}

public interface IFastList<TElement> : IFastCollec<TElement> {
    public TElement this[int index] { get; set; }

    int IndexOf(TElement element);
    void Insert(int index, TElement element);
    void RemoveAt(int index);
}

public abstract class AbstractFastList<T> : IFastList<T> {
}

public class FastList<T> : AbstractFastList<T> {
    type AbstractFastList.TEnumerator = int;
    int Start => 0;
    public bool TryGetNext(ref int enumerator, out T value) {
        // cool things happen here
    }
}

My main concern with this is tooling. Today once you finish specifying the type or interface you are inheriting from and all the generic arguments you are done. We now know all the member bodies that you need to implement and more importantly what their types are. The developer flow as this proposal is currently written appears to be:

  1. Specify the types/interfaces we are inheriting from in the type declaration
  2. Fill in all the generic types for these in the type declaration
  3. Fill in the values for any existential types in the type declaration body
  4. Now invoke tooling to implement any missing members

For this reason, I would prefer it if we keep existential types in the generic argument list. Presumably, they would need to come after normal type arguments and have some keyword to denote their existence. Let's go with implicit for now.

public interface IFastEnum<TElement, implicit TEnumerator> {
    TEnumerator Start { get; }
    bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

public interface IFastCollec<TElement> : IFastEnum<TElement>  {
    int Count { get; }
    bool IsReadOnly { get; }

    void Add(TElement element);
    bool Remove(TElement element);
    bool Contains(TElement element);
    void Clear();
    void CopyTo(TElement[] array, int index = 0);
}

public interface IFastList<TElement> : IFastCollec<TElement> {
    public TElement this[int index] { get; set; }

    int IndexOf(TElement element);
    void Insert(int index, TElement element);
    void RemoveAt(int index);
}

public abstract class AbstractFastList<T> : IFastList<T> {
}

public class FastList<T> : AbstractFastList<T, int> {
    int Start => 0;
    public bool TryGetNext(ref int enumerator, out T value) {
        // cool things happen here
    }
}

So they just become an optional type argument unless the type is non-abstract.

Anyways, syntax yak shaving over.

To re-iterate why this proposal is awesome consider a simple benchmark like this.


BenchmarkDotNet=v0.12.1, OS=macOS 12.1 (21C52) [Darwin 21.2.0]
Apple M1 Max, 1 CPU, 10 logical and 10 physical cores
.NET Core SDK=6.0.100
  [Host]     : .NET Core 6.0.0 (CoreCLR 6.0.21.52210, CoreFX 6.0.21.52210), Arm64 RyuJIT
  DefaultJob : .NET Core 6.0.0 (CoreCLR 6.0.21.52210, CoreFX 6.0.21.52210), Arm64 RyuJIT
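| Method     | Mean     | Error    | StdDev   | Ratio | Gen 0      | Gen 1 | Gen 2 | Allocated |
|------------|----------|----------|----------|-------|------------|-------|-------|-----------|
| FastLinq   | 15.97 ms | 0.160 ms | 0.150 ms | 0.54  | 20687.5000 | -     | -     | 41.29 MB  |
| NormalLinq | 29.55 ms | 0.190 ms | 0.158 ms | 1.00  | 33656.2500 | -     | -     | 67.13 MB  |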

Passing the enumerator type args all the way through gets you a roughly 40% reduction in memory and makes it nearly twice as fast. Also, consider that the runtime team has done a lot of work to try and speed up LINQ, yet I can just goof around and build out some simple implementations to get significantly better perf.

Now, how do we accomplish this without bifurcating the world and forcing everyone to use/implement two enumerable APIs? I dunno. Honestly, that's a whole other ball of yarn. But this proposal gets us incrementally closer to a place where this sort of thing could be done.

333fred commented 2 years ago

@agocke do you want to update the original post with some alternative syntax suggestions?

HaloFour commented 2 years ago

@jmarolf

I really like the idea of implicit generic type parameters and I think that aligns nicely with my concerns about how closely the source maps to the metadata. I think there is still that mismatch in the code that consumes the implicitly generic type, as the generic type argument does need to flow through and there may be confusion there. The inferred generics proposal I believe tossed around the idea of using a placeholder, but I can see how that would quickly become just as messy.

Something else @jnm2 had suggested on Discord was adopting a different convention of mangling the generic type names so that it only reflected the arity of declared generic type parameters, so that IFastCollec<TElement> would have the name IFastCollec`1 despite technically having two generic type parameters. That should prevent the ambiguous cases where one attempts to "overload" the generic type or method based on generic arity as only the explicitly declared generic type parameters would be reflected in the name. I'm curious as to what the team would think about that, or about the possibility of adopting a different name mangling scheme altogether, to help with the coexistence of associated/existential types and normal generic types.

iam3yal commented 2 years ago

I know that the syntax is not as important to the discussion but abstract might be yet another option as opposed to implicit based on what @jmarolf wrote:

So they just become an optional type argument unless the type is non-abstract.

So they are abstract type parameters where they are realized when the type is passed.

Tinister commented 2 years ago

Would/could this support using existential types as type arguments?

interface Iface : IOtherFace<Ext>
{
    type Ext;

    ...
}

Sergio0694 commented 2 years ago

Question: would this also work for return values? I see most (all?) examples only being for parameters 🤔

For instance, would something like this work?

interface IAwaitable<protected TAwaiter>
    where TAwaiter : struct, IAwaiter
{
    TAwaiter GetAwaiter();

    interface IAwaiter : INotifyCompletion
    {
        bool IsCompleted { get; }
        void GetResult();
    }
}

And then:

IAwaitable FooAsync()
{
    // Do stuff here and return some awaitable
}

// And later
await FooAsync();

Which in this case would let the returned awaiter be of some type IAwaitable.TAwaiter, and it would allow avoiding allocations for the returned awaiter despite keeping the returned interface abstract (as in, not just returning a specific struct type), and also would enable the JIT to make all calls to the awaiter be constrained and potentially inlineable.

333fred commented 2 years ago

Question: would this also work for return values? I see most (all?) examples only being for parameters 🤔

For instance, would something like this work?

interface IAwaitable<protected TAwaiter>
    where TAwaiter : struct, IAwaiter
{
    TAwaiter GetAwaiter();

    interface IAwaiter : INotifyCompletion
    {
        bool IsCompleted { get; }
        void GetResult();
    }
}

Yes, you could do this.

And then:

IAwaitable FooAsync()
{
    // Do stuff here and return some awaitable
}

// And later
await FooAsync();

Which in this case would let the returned awaiter be of some type IAwaitable.TAwaiter, and it would allow avoiding allocations for the returned awaiter despite keeping the returned interface abstract (as in, not just returning a specific struct type), and also would enable the JIT to make all calls to the awaiter be constrained and potentially inlineable.

FooAsync would not compile, as it's directly using IAwaitable. You would need to have a generic type parameter constrained to IAwaitable.

markm77 commented 2 years ago

The great attraction of existentials for me is much improved code clarity and less verbosity when writing generic code. I enjoy C# but, compared to Swift, working with C# generics is very verbose.

Here's an example class (from real code) where TPublicRequest and TPublicResponse are functions of (completely determined by) TEntity but this relationship is obscured by the lack of differentiation between "independent" and "dependent" (existential) type parameters. Also when constructing this class one has to supply all three types (instead of just TEntity) which pushes verbosity and complexity upwards.

    internal class LocalEntityPost<TEntity, TPublicRequest, TPublicResponse> :
        IObjectPost<TPublicRequest, TPublicResponse>
        where TEntity : class, IEntity, ISupportsFluentLocalEntityPost<TPublicRequest, TPublicResponse, TEntity>,
        new()
        where TPublicRequest : Base { }

With existentials I am hoping the above complex class will simplify to something like*:

     internal class LocalEntityPost<TEntity> : IObjectPost
        where TEntity : class, IEntity, ISupportsFluentLocalEntityPost<TEntity>, new()
     { 
           type IObjectPost.TPublicRequest = TEntity.TPublicRequest; // TPublicRequest is now existential type parameter of ISupportsFluentLocalEntityPost<TEntity>
           type IObjectPost.TPublicResponse = TEntity.TPublicResponse; // TPublicResponse is now existential type parameter of ISupportsFluentLocalEntityPost<TEntity>
     }

Using the attractive syntax proposed by @jmarolf , this could become*

     internal class LocalEntityPost<TEntity>: IObjectPost<TEntity.TPublicRequest, TEntity.TPublicResponse>
        where TEntity : class, IEntity, ISupportsFluentLocalEntityPost<TEntity>, new() { }

which is so much more readable and clear. Higher layers of code also now only need specify TEntity.

I believe this feature has the potential to lead to much more concise and clear generic code.

* assuming TEntity doesn't require a cast to ISupportsFluentLocalEntityPost<TEntity> to see its existential type parameters when there are no conflicts. Otherwise please insert appropriate casts.

fabianoliver commented 2 years ago

A very interesting proposal. A few questions came to mind, though; the short version is:

  1. Could this more elegantly be solved through #1992 ?
  2. Should it be possible to implement interfaces multiple times with different existential types?
  3. Is there a way to refer to those types outside of immediate call chains?

In a bit more detail:

Could this more elegantly be solved through #1992 ?

Unfortunately, #1992 is likely a significantly more complex proposal - but potentially one that solves a much, much more general category of issues. I wonder if this proposal here is just a mere special case of it.

If it is, the next question is probably: Are we ever likely to have these, or is that so unlikely that we might as well rather go for the special case described in this proposal (with specific syntax tailored around it).

Anyways, going back to the initial post, I reckon generic wildcards could be applied quite nicely:

public interface IFastEnum<TElement, TEnumerator>
{
  TEnumerator Start { get; }
  bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

void M<TEnum>(TEnum e) where TEnum : IFastEnum<string, ?>
{
    foreach (var elem in e)
    {
        Console.WriteLine(elem);
    }
}

Should it be possible to implement interfaces multiple times with different existential types?

This might be a touch contrived, but maybe as a starting point - imagine something like this (with a gentle nod to #5413 as well here)

public interface ISerialisable<TSelf, TSerialised> where TSelf : ISerialisable<TSelf, TSerialised>
{
  TSerialised Serialise(TSelf self);
  static abstract TSelf Deserialise(TSerialised serialised);
}

I could well imagine that some type might want to implement this multiple times, say an ISerialisable<TSelf, string> for JSON, ISerialisable<TSelf, byte[]> for protobuf, etc. (again, this is probably not the very best example).

I assume if this should be supported, it must be a generic type parameter? Or in other words, it would not be possible to implement ISerialisable for multiple different existential types if going with a syntax such as below?

// I guess we can't implement this twice?
interface ISerialisable<TSelf> where TSelf : ISerialisable<TSelf>
{
    type Ext;
    Ext Serialise(TSelf self);
    static abstract TSelf Deserialise(Ext serialised);
}

Is there a way to refer to those types outside of immediate call chains?

A real world'ish use case, very loosely based on something I encountered quite a while ago. Quite a prime candidate for such a feature:

// Case 1: Either...
interface IGraph<TVertexId>
{
    TVertexId AddVertex(IVertex vertex);
    bool HasChanged(TVertexId vertexId);
}

// Case 2: ... or ...
interface IGraph
{
    type TVertexId ;
    TVertexId AddVertex(IVertex vertex);
    bool HasChanged(TVertexId vertexId);
}

A graph that allows adding vertices, returns an ID for each vertex, and exposes a bunch of functions to do stuff based on this ID. The ID is implementation-specific; maybe it's an index into an array, maybe it's some key of an internal dict, maybe it's the vertex reference itself, who knows.

Now imagine a classic case where some system would set up the graph on initialisation, store the vertex IDs in some sort of collection/dictionary/etc, and call into the graph throughout the lifecycle of the application, i.e. conceptually:

class Foo {
  private readonly IGraph _graph;
  private readonly ??? _vertex;

  public Foo(IGraph graph) {
    _graph = graph;
    _vertex = graph.AddVertex(...);
  }

  public void Tick() {
    if(_graph.HasChanged(_vertex))
     DoStuff();
  }
}

How could we realistically do that?

In case 1 (generic parameter), we could have Foo be generic, or alternatively have some sort of generic parameter-less IGraph marker interface, and then do a bunch of reflection with the generic type to invoke HasChanged.

How about case 2? Presumably, we would need to have _vertex be of type object or such, and then have some sort of casting logic that allows to invoke _graph.HasChanged with _vertex cast to _graph.TVertexId
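For reference, the "make Foo generic" option from case 1 looks roughly like this in current C#. This is only a sketch: `ListGraph`, `Vertex`, and the trivial `HasChanged` logic are made-up placeholders so the example runs. Under the proposal as written, the equivalent would presumably be a `Foo<TGraph>` constrained to the case 2 `IGraph`, with the field typed as something like `TGraph.TVertexId`:

```csharp
using System;
using System.Collections.Generic;

var foo = new Foo<ListGraph, int>(new ListGraph());
Console.WriteLine(foo.Tick());

interface IVertex { }
sealed class Vertex : IVertex { }

interface IGraph<TVertexId>
{
    TVertexId AddVertex(IVertex vertex);
    bool HasChanged(TVertexId vertexId);
}

// Hypothetical implementation whose vertex ID is an index into a list.
sealed class ListGraph : IGraph<int>
{
    private readonly List<IVertex> _vertices = new();

    public int AddVertex(IVertex vertex)
    {
        _vertices.Add(vertex);
        return _vertices.Count - 1;
    }

    // Placeholder logic: reports true for any ID this graph handed out.
    public bool HasChanged(int vertexId) => vertexId >= 0 && vertexId < _vertices.Count;
}

// The vertex ID type has to surface as a second type parameter on Foo.
sealed class Foo<TGraph, TVertexId> where TGraph : IGraph<TVertexId>
{
    private readonly TGraph _graph;
    private readonly TVertexId _vertex;

    public Foo(TGraph graph)
    {
        _graph = graph;
        _vertex = graph.AddVertex(new Vertex());
    }

    public bool Tick() => _graph.HasChanged(_vertex);
}
```

The field stays strongly typed and no casting or reflection is needed, at the price of `TVertexId` leaking into every type that stores a vertex.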

agocke commented 2 years ago

@fabianoliver

Unfortunately, https://github.com/dotnet/csharplang/discussions/1992 is likely a significantly more complex proposal - but potentially one that solves a much, much more general category of issues. I wonder if this proposal here is just a mere special case of it.

I haven't read that proposal, but fortunately, because the system I've described is based on the mathematical models of type theory, I don't need to -- the proposal I linked at the top of the page is the most general possible description of existentials outside of higher-kinded types. This one is more limited because it requires interfaces with existentials to be limited to constraints, but that should completely describe the scope of limitations.

Any more general formulation would have to be either 1) type unsound or 2) use boxing representations, which is slow.

agocke commented 2 years ago

Should it be possible to implement interfaces multiple times with different existential types?

Shouldn't be a problem. As mentioned above my serde-dn library already has interfaces like this and it allows for both the type and the serialization format to be switched around independently.

The one annoyance is that I can't provide built-in implementations for the standard library (like for Guid or string), so I have to provide struct "wrappers" that implement the interface for the user. If proposals around allowing implementations for types you didn't define go through, that would be quite easy.

fabianoliver commented 2 years ago

@agocke

Shouldn't be a problem. As mentioned above my serde-dn library already has interfaces like this and it allows for both the type and the serialization format to be switched around independently.

Thanks - right, I assume multiple implementation of the "same" interface (with only varying existential type parameters) would work, if those are specified as generic args on the interface declaration (i.e. indeed your ISerializer<abstract TSerializeType, ...> example)

This raises an interesting question though - what would happen in the following scenario?

interface Iface<abstract TExt>
{
    public TExt P { get; }
}

class FaceImplOne : Iface<int> { ... }

class FaceImplTwo : Iface<int>, Iface<string> { .... }

void M<T>(T left, T right)
    where T : Iface
{
    // How could we handle this? If typeof(T) == typeof(FaceImplOne), we're fine.
    // But if typeof(T) == typeof(FaceImplTwo), left.P and right.P don't sufficiently qualify the property anymore
    var x = left.P;
    x = right.P;
}

I can't really think of a way out here. Would we require a constraint that if an interface has an existential type parameter, a class can only ever implement this interface once (another implementation with different type parameters would be illegal)?


I haven't read that proposal, but fortunately, because the system I've described is based on the mathematical models of type theory

Apologies, I won't pretend to be particularly familiar with type theory, so maybe my point is moot to begin with. Going by your description in #1328,

Existential types are a way of working with a type variable that's set to something but you don't know what it is.

This proposal here, to my understanding, suggests the interface declares which generic arguments are existential types.

I believe the key usage difference to generic wildcards #1992 would be that the latter allow the callers themselves to define when a generic argument should be treated as existential or not - it could be situational.

If my interpretation of this is right, I think it would then boil down to: Which approach is preferred? I believe this proposal here leads to slightly shorter syntax, whereas #1992 would offer greater flexibility.

Going back to your motivating example, I think this is how it could look with generic wildcards:

public interface IFastEnum<TElement, TEnumerator>
{
  TEnumerator Start { get; }
  bool TryGetNext(ref TEnumerator enumerator, out TElement value);
}

class List<T> : IFastEnum<T, int>
{ .... }

void M<TEnum>(TEnum e) where TEnum : IFastEnum<string, ?>
{
    foreach (var elem in e)
    {
        Console.WriteLine(elem);
    }
}

I.e. I think this would be quite comparable to this proposal - key difference being that you don't get to (or have to) omit the existential type parameters entirely, but rather that you can choose to treat them as existential if the situation calls for it.

The syntax is therefore probably marginally longer, but on the flipside, it's entirely up to the caller which generic types it does or does not want to treat as existential, based on the specific use case. E.g.:

bool ContainsValue<TValue>(this IDictionary<?, TValue> dict, TValue value)
{
    foreach(var v in dict.Values)
      if(v.Equals(value))
        return true;
   return false;
}

Furthermore, if a wildcard-based system was advanced enough, it might be able to express constraints that go beyond pure existence, but still stop short of needing to know the exact generic type. Imagine e.g.

void M(Iface<out T1> left, Iface<out T2> right)
    where T2 : T1
{
   left.P = right.P;
}

agocke commented 2 years ago

interface Iface<abstract TExt>
{
    public TExt P { get; }
}

class FaceImplOne : Iface<int> { ... }

class FaceImplTwo : Iface<int>, Iface<string> { .... }

void M<T>(T left, T right)
    where T : Iface
{
    // How could we handle this? If typeof(T) == typeof(FaceImplOne), we're fine.
    // But if typeof(T) == typeof(FaceImplTwo), left.P and right.P don't sufficiently qualify the property anymore
    var x = left.P;
    x = right.P;
}

Right, the problem here is that there isn't a rich enough syntax on the caller side to specify what the right implementation is. You don't even have to bring in type equivalence to see the problem:

M<FaceImplTwo>(FaceImplTwo t) // which interface implementation do we use?

I'd suggest this would be an error on the declaration side. If we picked the syntax with type members, this would be reasonably consistent with the way C# already works, because you can't implement the same interface twice for a type (and the type members would not serve to make it a different interface)
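The ambiguity is easy to reproduce in current C# once the existential is written as an explicit type parameter. In this sketch (the names and the `Describe` helper are illustrative assumptions), both calls are legal, so nothing in the argument type alone tells a compiler which `TExt` to synthesize:

```csharp
using System;

// Same argument type, two equally valid choices for the would-be synthesized parameter.
Console.WriteLine(Demo.Describe<FaceImplTwo, int>(new FaceImplTwo()));
Console.WriteLine(Demo.Describe<FaceImplTwo, string>(new FaceImplTwo()));

interface Iface<TExt>
{
    TExt P { get; }
}

sealed class FaceImplTwo : Iface<int>, Iface<string>
{
    int Iface<int>.P => 42;
    string Iface<string>.P => "forty-two";
}

static class Demo
{
    // TExt selects which implementation of P is called.
    public static string Describe<T, TExt>(T value) where T : Iface<TExt>
        => $"{typeof(TExt).Name}: {value.P}";
}
```

Rejecting the second implementation at the declaration, as suggested above, removes the choice instead of trying to infer it.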

FaustVX commented 2 years ago

I don't understand what problems you tried to solve with this proposal. What are the benefits of having

interface Iface
{
    // Existential type
    type Ext;

    public Ext P { get; }
}

class C : Iface
{
    type Iface.Ext = int;

    public int P => 0;
}

instead of existing

interface Iface<Ext>
{
    public Ext P { get; }
}

class C : Iface<int>
{
    public int P => 0;
}

Also, your example is flawed:

void M<T>(T left, T right)
    where T : Iface
{
    var x = left.P; // `var` could be type `T.Ext`
    x = right.P; // type checks
}

// If I have these 2 classes

class C1 : Iface
{
    type Iface.Ext = int;

    public int P => 0;
}

class C2 : Iface
{
    type Iface.Ext = string;

    public string P => "";
}

// Then I call
M<Iface>(new C1(), new C2());
// Inside M<T>, the type checking of `x = right.P` doesn't work.

And if Iface is compiled into Iface<Ext>, that means I can't also write my own Iface<T>, because another type with that name already exists but is hidden by compiler magic.

fabianoliver commented 2 years ago

@agocke

I'd suggest this would be an error on the declaration side. If we picked the syntax with type members, this would be reasonably consistent with the way C# already works, because you can't implement the same interface twice for a type (and the type members would not serve to make it a different interface).

Yep, I think that would be a reasonable constraint (I'm not a big proponent of C# allowing a type to implement generic variations of the same interface to begin with; I think it tends to cause more problems than it solves).

Having said that, I think a few challenges would still be there:

  1. Using existentials somewhere in code where the compiler isn't aware of the type information anymore (see also point 3 mentioned here)
  2. Type equivalence (or conditions other than equivalence) on the existential type itself, rather than its containing interface

interface IFoo {
  type T1;
  type T2;

  T1 Property1 {get;set;}
  T2 Property2 {get;set;}
}

class Foo1 : IFoo<string, int> { ... }
class Foo2 : IFoo<string, double> { ... }

class Problems {
    object _prop = new Foo1().Property1; // may seem contrived, but imagine e.g. storing an IDictionary<Type,object> of .Property1 for various IFoos or such

    void Problem1() {
      // No way to make this work (unless casting to dynamic or such)?
      new Foo1().Property1 = _prop;
    }

    void Problem2() {
      // No way to make this work?
      M(new Foo1(), new Foo2());
    }

    void M<T>(T left, T right) where T : IFoo {
        left.Property1 = right.Property1;
    }
}

Maybe those problems could be solved through slight extensions of both the casting, as well as generic constraint syntax? Something like

    void Problem1() {
      new Foo1().Property1 = (Foo1.T1)_prop;
    }

    void Problem2() {
      M(new Foo1(), new Foo2());
    }

    void M(IFoo left, IFoo right)
       where right.T1 : left.T1
    {
        left.Property1 = right.Property1;
    }

(And at risk of sounding like a broken record, I still think that generic wildcards would have the potential to address all of these issues nicely as well ;-) )

jnm2 commented 2 years ago

@FaustVX It would not need to be hidden by compiler magic. See my suggestion in https://github.com/dotnet/csharplang/issues/5556#issuecomment-998366899 for a metadata naming convention that would fully solve this issue.

FaustVX commented 2 years ago

@jnm2 OK for that, but my other points still remain.

agocke commented 2 years ago

@FaustVX I gave a real-world example here: https://github.com/dotnet/csharplang/issues/5556#issuecomment-997290680

agocke commented 2 years ago

@fabianoliver I read through a bit of that proposal. It's basically existential types, but without the restrictions that make them efficiently implementable. Consider the example

void IterateIfPossible(object obj)
{
    if (obj is IEnumerable<?> items) 
    {
        foreach (var item in items)
            Console.WriteLine(item);
    }
}

This can't be implemented without boxing. It's easy to see if you try to type-check it at the CLR level. Despite the fact that ? is hidden in the syntax, ? must have an actual runtime type, except that there's no uniform type that could possibly represent all types ? could be. As HaloFour mentioned in that issue, int and string do not have a shared representation. You can provide a shared representation by boxing an int, but this is effectively representing everything as a heap-allocated pointer. This is very inefficient and expensive.

In general, if you want efficient value type representations, they must remain unboxed.
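The cost described above can be observed in today's C#. The following is only a sketch: `BoxingSketch` and its methods are hypothetical names, and the non-generic `IEnumerable` stands in for an erased `IEnumerable<?>`.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class BoxingSketch
{
    // Erased element type: every element is observed as object,
    // so each int read from the list is boxed on the way out.
    public static List<object> CollectErased(IEnumerable items)
    {
        var result = new List<object>();
        foreach (object item in items)
            result.Add(item);
        return result;
    }

    // Element type kept as a type parameter: for List<int>, T is int
    // and the elements are never boxed.
    public static List<T> CollectTyped<T>(IEnumerable<T> items)
    {
        var result = new List<T>();
        foreach (T item in items)
            result.Add(item);
        return result;
    }

    // Two erased passes over the same int yield two distinct heap boxes.
    public static bool SameBoxTwice()
    {
        var numbers = new List<int> { 7 };
        object first = CollectErased(numbers)[0];
        object second = CollectErased(numbers)[0];
        return object.ReferenceEquals(first, second);
    }
}
```

`SameBoxTwice` returns false because each erased read allocates a fresh box, whereas `CollectTyped` works on the raw `int` values throughout.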

agocke commented 2 years ago

Well, addendum: you can do it efficiently, but the only way I know of is a transformation to Skolem normal form, as I proposed in https://github.com/dotnet/csharplang/issues/1328, and this proposal was explicitly designed to be much easier to implement and to add fewer new typing rules to the language (i.e., path-dependent types).

fabianoliver commented 2 years ago

@agocke

Agreed, I think with wildcard matching, you could do

void IterateOptimisedIfPossible(object obj)
{
    // The most annoying bit about wildcard runtime type matching: obj could implement IFastEnum multiple times
    if (obj is IFastEnum<string, ?> matches && matches.Length == 1)
      Iterate(matches[0]);
}

// Compiler could implement this as Iterate<T>(IFastEnum<string, T> items), i.e. any function that contains generic wildcards would always be implicitly generic
void Iterate(IFastEnum<string, ?> items)
{
  foreach(var e in items)
    DoStuff(e);
}

How would IterateOptimisedIfPossible look under #5556 - would it require explicit reflection to Invoke "Iterate" with the actual type of obj as a generic arg?

The fact you could do a more unoptimised iteration using wildcards:

void IterateUnoptimised(object obj)
{
    if (obj is IFastEnum<string, ?> matches && matches.Length == 1)
      foreach(var e in matches[0])
        DoStuff(e);
}

might be a blessing and a curse at the same time. I'd think the downside is that it'd be very easy to miss this isn't optimal. The upside is likely that it gives you optionality; in cases where you don't care about the potential boxing overhead, or want to avoid the potential overhead of an extra generic method, you could opt to use the inline version.

You're probably right that a fully fledged wildcard matching system is probably a lot harder to implement/get right. I wonder if a first implementation could ship without runtime type matching though - i.e. no "is" using wildcards etc. I think the implementation strategy would then quite closely mirror #5556 - everything could be done with hidden generics at compile (or reflection)-time - while potentially leaving the door open to add runtime type features with a consistent syntax further down the road?

agocke commented 2 years ago

@fabianoliver That doesn't work. At the call to Iterate, the compiler needs to provide a generic substitution for T. The only type that could be there is object because you've already lost all static type information.
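To make the point concrete: in today's C#, once a value is only known as `object`, the one way to call a generic method with the real element type is to recover that type at run time through reflection. The sketch below assumes nothing from the proposal; `DispatchSketch` is a hypothetical name and `IEnumerable<T>` stands in for `IFastEnum<string, T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class DispatchSketch
{
    // Generic worker: inside this method T is statically known.
    public static string Iterate<T>(IEnumerable<T> items)
    {
        return typeof(T).Name + ":" + items.Count();
    }

    // Starting from object, the compiler has no T to substitute,
    // so the substitution is made at run time via reflection.
    public static string IterateIfPossible(object obj)
    {
        Type iface = obj.GetType().GetInterfaces().FirstOrDefault(i =>
            i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEnumerable<>));
        if (iface == null)
            return "not enumerable";

        MethodInfo open = typeof(DispatchSketch).GetMethod(nameof(Iterate));
        MethodInfo closed = open.MakeGenericMethod(iface.GetGenericArguments()[0]);
        return (string)closed.Invoke(null, new object[] { obj });
    }
}
```

Calling `IterateIfPossible(new List<int> { 1, 2, 3 })` dispatches to `Iterate<int>`, but only because the type argument was looked up dynamically; a statically compiled call site would have had nothing better than `object` to offer.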

Sergio0694 commented 1 year ago

Just sharing some random thoughts - if I'm reading this correctly, it seems like the proposed syntax with the associated type being an "interface member" would be preferable, as it'd make things a lot more ergonomic when using source generators? Let me give an example so I can also double-check this actually makes sense 😄

In ComputeSharp I have a couple of interfaces (e.g. ID2D1PixelShader) that users can implement on a type to indicate that the type is a shader type to run on the GPU. The source generator then kicks in and generates a whole bunch of supporting methods for that shader type (e.g. to get the transpiled HLSL code, the compiled shader bytecode, other metadata info, marshal the constant buffer data, etc.). These methods are then used by ComputeSharp to actually manage the execution of shaders.

Conceptually, imagine something like this:

interface IFoo
{
    void SomeMethodUsersShouldImplement();

    void SourceGeneratedMethod1();
    void SourceGeneratedMethod2();
    // ...
}

What users do is the following:

partial struct MyType : IFoo
{
    public void SomeMethodUsersShouldImplement() { /* ... */ }
}

The generator then kicks in and generates all those additional methods. With associated types, this could be:

interface IFoo
{
    type TGenerated : IGenerated;

    void SomeMethodUsersShouldImplement();

    interface IGenerated
    {
        static abstract void SourceGeneratedMethod1();
        static abstract void SourceGeneratedMethod2();
        // ...
    }
}

And users would then write:

partial struct MyType : IFoo
{
    public void SomeMethodUsersShouldImplement() { /* ... */ }
}

And the generator would emit a partial declaration of this type, with some file-local type implementing the interface, and would declare that type as the associated type for this interface. APIs using this would then just use it directly, e.g.:

// Before
static void UseType<T>(T type)
    where T : IFoo
{
    type.SourceGeneratedMethod1();
}

// After
static void UseType<T>(T type)
    where T : IFoo
{
    type.TGenerated.SourceGeneratedMethod1();
}

Which seems pretty nice to me. Couple things:

Sergio0694 commented 1 year ago

A few more thoughts on this (and comment above) after discussing the issue of "adding new members" from source generators with @CyrusNajmabadi and @jkoritzinsky the other day. Under this proposal, would existential types support being declared as partial in implementing types? Because if that was the case, the pattern mentioned in https://github.com/dotnet/csharplang/issues/5556#issuecomment-1328036333 could be supported by having the user declare the partial type themselves, and the generator would just implement it, instead of also declaring it from the start (which I understand makes the generator less efficient, in theory). That is, would this be supported?

// Interface declaration shipped in a library
interface IFoo
{
    type TBar : IBar;

    void SomeMethodUsersShouldImplement();

    interface IBar
    {
        void SourceGeneratedMethod();
    }
}

// User code in a project referencing the library
partial class Foo : IFoo
{
    [GeneratedBar]
    partial type TBar;

    public void SomeMethodUsersShouldImplement() { }
}

// Generated code
partial class Foo
{
    partial type TBar = Bar;

    private class Bar
    {
        public void SourceGeneratedMethod() { }
    }
}

This seems like a very powerful and flexible pattern in general 👀

333fred commented 1 year ago

@Sergio0694 you almost certainly wouldn't be able to use a file type there.

Sergio0694 commented 1 year ago

@333fred updated to use a private type, just to keep things simpler and not risk getting sidetracked 😄 I didn't consider the fact that file-local types are very restricted in where they can be used, so yeah, that'd be fair.

But in general, ignore the specific accessibility, could also just be public, doesn't matter. The question is just - do you think we could have "partial existential type declarations"?

333fred commented 1 year ago

Probably

abnercpp commented 1 year ago

I know this may be a late opinion, but I'm more inclined towards "use site" existential types, which would allow this functionality to be used for existing interfaces with no change to them whatsoever. Furthermore, the programmer can choose what he makes an explicit type parameter, an implicit type parameter, and an existential type parameter.

On top of that, we wouldn't need a new concept of interfaces that can't be used like regular ones (that is, can only be used as a generic constraint).

public static class LinqExtensionMethods
{
    // `TEnumerable` and `TReturn` are explicit.
    // `TItem` is implicit and inferred by the compiler from the `where` constraints.
    public static <TItem> TReturn Select<TEnumerable, TReturn>(this TEnumerable source, Func<TItem, TReturn> func) where TEnumerable : IEnumerable<TItem>
    {
        // Implementation doesn't matter.
    }
}

And if I don't need to know about TItem, to make it a truly existential type, I can just write something like:

public static class PrinterExtensionMethods
{
    // Here `T` from `IEnumerable<T>` is existential.
    public static void PrintAll<TEnumerable>(this TEnumerable source) where TEnumerable : IEnumerable<?>
    {
        // Implementation doesn't matter.
    }
}

I believe this feature would not require any changes to the CLR: it could all be syntactic sugar, with the compiler turning every implicit or existential type into an explicit type parameter during lowering. How these methods show up in reflection, and the order of the compiler-generated parameters, are a separate discussion.
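As a rough illustration of what such a lowering might produce, here is the first example written in today's C# with every type parameter explicit. This is a sketch under that assumption only; `LoweredSketch` and `SelectLowered` are hypothetical names, and the parameter order is arbitrary.

```csharp
using System;
using System.Collections.Generic;

static class LoweredSketch
{
    // TItem is an ordinary, explicit type parameter here.
    public static List<TReturn> SelectLowered<TEnumerable, TItem, TReturn>(
        this TEnumerable source, Func<TItem, TReturn> func)
        where TEnumerable : IEnumerable<TItem>
    {
        var result = new List<TReturn>();
        foreach (TItem item in source)
            result.Add(func(item));
        return result;
    }

    public static List<string> Run()
    {
        var numbers = new List<int> { 1, 2, 3 };
        // Current type inference does not use constraints, so with an
        // untyped lambda TItem cannot be inferred and all three type
        // arguments are written out.
        return numbers.SelectLowered<List<int>, int, string>(n => "#" + n);
    }
}
```

The explicit call in `Run` is the boilerplate that the implicit `<TItem>` syntax is meant to remove; inferring `TItem` from the `where` clause would be new behavior in the compiler rather than something the lowering gets for free.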

agocke commented 1 year ago

@Sergio0694 The problem with partial associated types (they are associated types in this proposal, the original proposal did have something akin to existential types with full Skolemization, and this proposal was a response to that one, but this one doesn't really have existentials) is that ordering is important, so you couldn't have, for instance

partial interface I1
{
   type T1;
}
partial interface I1
{
   type T2;
}

As the ordering of those members is semantically important in the lowering. That's not to say that we couldn't choose the member syntax, but we would have to restrict members to appearing in only one declaration.

@abner-commits Your proposal doesn't really solve one of the main things that this one tries to, which is that writing signatures with tons of generic parameters really sucks.

abnercpp commented 1 year ago

@Sergio0694 The problem with partial associated types (they are associated types in this proposal, the original proposal did have something akin to existential types with full Skolemization, and this proposal was a response to that one, but this one doesn't really have existentials) is that ordering is important, so you couldn't have, for instance

partial interface I1
{
   type T1;
}
partial interface I1
{
   type T2;
}

As the ordering of those members is semantically important in the lowering. That's not to say that we couldn't choose the member syntax, but we would have to restrict members to appearing in only one declaration.

@abner-commits Your proposal doesn't really solve one of the main things that this one tries to, which is that writing signatures with tons of generic parameters really sucks.

Except it does. I fail to see how it would "suck" to discard parameters I don't care about like this:

void DoSomething<T>(T target) where T : IInterface<?, ?, ?, ?>
{
}

Is that any less ergonomic than having to declare an interface just for use within generics? Plus, there's the advantage that this applies to already existing interfaces, and I can add constraints to the implicit types if I want, further enhancing what we can accomplish with generics.

Here's an example of what would be possible to do with implicit type parameters + existential types:

public <TEnum> void DoSomething<TDict>(TDict target)
    where TDict : IReadOnlyDictionary<TEnum, ?>
    where TEnum : struct, Enum
{

}

Sergio0694 commented 1 year ago

"As the ordering of those members is semantically important in the lowering. That's not to say that we couldn't choose the member syntax, but we would have to restrict members to appearing in only one declaration."

@agocke apologies for the potentially dumb question, but could you elaborate on why the ordering would be important here? As in, why would the various associated types for the interface not be handled like other members (e.g. fields, methods, etc.) that can be scattered over multiple partial declarations and are then just handled by name? I'm not sure I follow why a type T2 being defined before/after type T1 in the same type, or in another partial declaration of its parent interface, would result in different semantics for it 😅

agocke commented 1 year ago

@Sergio0694 The lowered encoding is as type parameters, so it matters whether T1 or T2 ends up being seen first, as the encoding would be Iface1<T1, T2> in lowered form. If a change in the compilation order of the files resulted in Iface1<T2, T1>, that would be a binary breaking change.
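A small sketch in today's C# of why the position matters, assuming the lowering really is to positional type parameters as described. `Impl` and `OrderingSketch` are made-up names, and the interface members are invented for the example.

```csharp
using System;

// Hypothetical lowered form: associated types T1 and T2 become positional parameters.
interface Iface1<T1, T2>
{
    T1 First { get; }
    T2 Second { get; }
}

// Compiled against the ordering <T1, T2>.
class Impl : Iface1<int, string>
{
    public int First => 1;
    public string Second => "one";
}

static class OrderingSketch
{
    // Checks whether o implements Iface1 with the arguments in this exact order.
    public static bool ImplementsAs<TA, TB>(object o)
    {
        return o is Iface1<TA, TB>;
    }
}
```

`ImplementsAs<int, string>(new Impl())` is true while `ImplementsAs<string, int>(new Impl())` is false: the two orderings are unrelated closed types, so a consumer compiled against one ordering would not bind to an implementation compiled against the other.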

@abner-commits Writing out discards is worse than not writing out discards. The other problem is that the purpose of the associated type is different.

When you have an associated type that can only be referred to by the implementor, it creates a contract where the type parameter is abstract -- an implementation detail. If you allow them to be manually substituted, then you remove the "hiding" functionality of making it an implementation detail. There's a fundamental question of where the abstraction sits.

Sergio0694 commented 1 year ago

"The lowered encoding is as type parameters, so it matters whether T1 or T2 end up seen first, as the encoding would be Iface1<T1, T2> in lowered form. If a change in compilation order of the files resulted in IFace<T2, T1> that would be a binary breaking change."

Ooh, of course, yeah that makes perfect sense now. Thank you! 😄

abnercpp commented 1 year ago

@Sergio0694 The lowered encoding is as type parameters, so it matters whether T1 or T2 ends up being seen first, as the encoding would be Iface1<T1, T2> in lowered form. If a change in the compilation order of the files resulted in Iface1<T2, T1>, that would be a binary breaking change.

@abner-commits Writing out discards is worse than not writing out discards. The other problem is that the purpose of the associated type is different.

When you have an associated type that can only be referred to by the implementor, it creates a contract where the type parameter is abstract -- an implementation detail. If you allow them to be manually substituted, then you remove the "hiding" functionality of making it an implementation detail. There's a fundamental question of where the abstraction sits.

I believe having to write out discards is a minor inconvenience that in return gives you fine-grained control over generic parameters. I can have IEnumerable<?>, but I can also have IReadOnlyDictionary<T, ?> (if I want to know T or add constraints to it). It feels like a more general-purpose solution than existential types in interfaces: both can be used for the same thing, but wildcards can be used for even more things and further enhance the type system.

Your point about the existential type being an implementation detail I would argue depends on the context. I may want to use the interface in a context where I do care about the existential type, but I still don't want to be tied to a specific implementation of that interface. Likewise, I may want to use that same interface and not care about its generic types. In my opinion, the level of abstraction should be up to the API consumer to decide. If I feel like the generic type doesn't matter for the context I'm in, I can discard it, otherwise I can use it as an implicit parameter, which will even make consuming my API simpler.