Proposal: support covariant return types

gafter commented 9 years ago

Support covariant return types. Specifically, allow an overriding method to declare a return type that is a more derived reference type than the return type of the method it overrides. This would apply to methods and properties, and be supported in classes and interfaces. This is one possible alternative to the "this type" proposed in #311.

This would be useful in the factory pattern. For example, in the Roslyn code base we would have

class Compilation ...
{
    virtual Compilation WithOptions(Options options)...
}

class CSharpCompilation : Compilation
{
    override CSharpCompilation WithOptions(Options options)...
}

The implementation of this would be for the compiler to emit the overriding method as a "new" virtual method that hides the base class method, along with a bridge method that implements the base class method with a call to the derived class method.
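C# source cannot express "override and hide in the same class", so what follows is only a source-level approximation of the bridge-method idea. It assumes a hypothetical minimal Options type and splits the two roles across a hypothetical intermediate class (CSharpCompilationBridge, with a hypothetical WithOptionsCore method). The intermediate class owns the bridge, a sealed override of the base method. The derived class hides it with the more derived return type. Under the proposal, the compiler would instead emit both methods directly on CSharpCompilation in IL.

```csharp
// Hypothetical minimal stand-in for the Options type in the example above.
public sealed class Options
{
    public Options(string name) { Name = name; }
    public string Name { get; }
}

public class Compilation
{
    public Compilation(Options options) { CurrentOptions = options; }
    public Options CurrentOptions { get; }

    public virtual Compilation WithOptions(Options options) => new Compilation(options);
}

// Hypothetical intermediate class: C# cannot override and hide WithOptions
// in one class, so the bridge lives here.
public abstract class CSharpCompilationBridge : Compilation
{
    protected CSharpCompilationBridge(Options options) : base(options) { }

    // Bridge: fills the base vtable slot and forwards to the strongly typed method.
    public sealed override Compilation WithOptions(Options options) => WithOptionsCore(options);

    // The single real implementation; further derived classes override this.
    protected abstract CSharpCompilation WithOptionsCore(Options options);
}

public class CSharpCompilation : CSharpCompilationBridge
{
    public CSharpCompilation(Options options) : base(options) { }

    // Hides the bridge so callers holding a CSharpCompilation see the derived return type.
    public new CSharpCompilation WithOptions(Options options) => WithOptionsCore(options);

    protected override CSharpCompilation WithOptionsCore(Options options) =>
        new CSharpCompilation(options);
}
```

In this sketch, WithOptionsCore is the override point rather than a new virtual WithOptions. That keeps calls through Compilation and calls through CSharpCompilation on the same implementation.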

HaloFour commented 9 years ago

How would that work if a consumer called the virtual method on the base class given that the derived class can't both override and shadow the method?

public class Foo
{
    public virtual object Baz()
    {
        return "fizz";
    }
}

public class Bar : Foo
{
    public override string Baz()
    {
        return string.Concat(base.Baz(), "buzz");
    }
}

What would Bar look like in this case? What would happen if you called Foo.Baz?

MgSam commented 9 years ago

Yes, please, please do this feature. As I'm sure you're all aware, this is far and away one of the most commonly asked-about issues on StackOverflow and elsewhere. People expect this to "just work", and when it doesn't, it throws a big wrench into the whole design of a factory pattern.

Would be interested to see the details of the compiler implementation fleshed out a little further.

gafter commented 9 years ago

@HaloFour although you cannot hide and override in source, the compiler can arrange to do that in IL.

HaloFour commented 9 years ago

@gafter Okay, great. That makes perfect sense.

sharwell commented 9 years ago

This could work. In this approach, the override is compiled with the same return type as the base method, and the C# compiler emits a special attribute recording the covariant return type. The C# compiler would use this attribute for semantic analysis (such as checking a method which overrides Bar.Baz), and then silently insert the necessary cast at call sites to methods with this attribute.

Callers (in other assemblies) using earlier versions of C# have no problem consuming these APIs; they just have to insert the casts themselves.

Source (C# "vNextNext"):

public class Foo
{
    public virtual object Baz()
    {
        return "fizz";
    }
}

public class Bar : Foo
{
    public override string Baz()
    {
        return string.Concat(base.Baz(), "buzz");
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        Bar bar = new Bar();
        string result = bar.Baz();
        Console.WriteLine(result);
    }
}

Result (C# "vNow"):

public class Foo
{
    public virtual object Baz()
    {
        return "fizz";
    }
}

public class Bar : Foo
{
    [return: System.Runtime.CompilerServices.ReturnType(typeof(string))]
    public override object Baz()
    {
        return string.Concat(base.Baz(), "buzz");
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        Bar bar = new Bar();
        string result = (string)bar.Baz();
        Console.WriteLine(result);
    }
}
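The sketch below makes the attribute approach concrete using a hypothetical CovariantReturnAttribute. The ReturnType attribute used in the example above does not exist in the BCL either. The sketch shows three things:

- The compiled signature keeps object.
- The intended type is only recoverable by reading the attribute off the return parameter.
- The call site is where an attribute-aware compiler would insert the cast.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute; nothing like it ships in the BCL.
[AttributeUsage(AttributeTargets.ReturnValue)]
public sealed class CovariantReturnAttribute : Attribute
{
    public CovariantReturnAttribute(Type returnType) { ReturnType = returnType; }
    public Type ReturnType { get; }
}

public class Foo
{
    public virtual object Baz() => "fizz";
}

public class Bar : Foo
{
    // Metadata still says object; the attribute records the narrower type.
    [return: CovariantReturn(typeof(string))]
    public override object Baz() => string.Concat(base.Baz(), "buzz");
}

public static class CovariantReturnDemo
{
    // What an attribute-aware compiler would consult: the declared return type,
    // refined by the attribute when present.
    public static Type EffectiveReturnType(MethodInfo method)
    {
        var attr = method.ReturnParameter.GetCustomAttribute<CovariantReturnAttribute>();
        return attr?.ReturnType ?? method.ReturnType;
    }

    // The cast such a compiler would silently insert at the call site.
    public static string CallBaz(Bar bar) => (string)bar.Baz();
}
```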

HaloFour commented 9 years ago

@sharwell Why the need for an attribute? That would require an addition to the CLR for something that I imagine won't be implemented for C# 6.0 "vNow"/"vNext" anyway.

Why not just solve it "more correctly" in C# 7.0 "vNextNext" by having the compiler emit both the overriding method and the shadowing method with the same name and parameters? Then you don't have a type-erasure problem and any existing compiler would already support binding to the correct overload. That would allow the language support for this feature to be supported on literally any framework version and even by older versions of the compilers.

Bar would be emitted as:

public class Bar : Foo
{
    public override object Baz()
    {
        return this.Baz();
    }

    public new string Baz() // not legal C#, perfectly legal in IL
    {
        return string.Concat(base.Baz(), "buzz");
    }
}

The compiler would then bind calls to Bar.Baz to the shadowed method and calls to Foo.Baz to the virtual method, which is the current behavior of the compiler and eliminates the need for any casting.

sharwell commented 9 years ago

@HaloFour A custom attribute would allow the compilers for multiple languages that compile to IL to provide and consume this feature to any other language capable of expressing this concept. A prime example is the ParamArrayAttribute. The proposal you describe has additional problems:

1. The shadow method would introduce a new frame in the stack trace that would be avoided in my proposal. The method would also need to pass through the parameters, which doubles the expense of this part of the call.
2. What would the VB compiler do if the user tries to override this method (since there are now two methods that differ only by return type)? What would older versions of the C# compiler do?
3. The method with a covariant return type is marked as override, but the MethodInfo for the compiled method indicates it is a new slot. The MethodInfo.GetBaseDefinition would not provide the expected result.
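Point 3 can be checked with ordinary C# `new virtual` hiding, which is roughly what the shadowing strategy exposes publicly. Reflection reports an override's base definition as the base method. A new slot reports itself. The class names below are hypothetical.

```csharp
using System;
using System.Reflection;

public class Foo
{
    public virtual object M() => "fizz";
}

// Ordinary override: reuses Foo's vtable slot.
public class Overriding : Foo
{
    public override object M() => "override";
}

// Roughly what the shadowing strategy exposes: a new slot with a narrower return type.
public class Shadowing : Foo
{
    public new virtual string M() => "shadow";
}

public static class BaseDefinitionDemo
{
    // DeclaredOnly avoids ambiguity with the hidden Foo.M on Shadowing.
    public static MethodInfo DeclaredM(Type type) =>
        type.GetMethod("M", BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly);

    public static Type BaseDefinitionOwner(Type type) =>
        DeclaredM(type).GetBaseDefinition().DeclaringType;
}
```

In the real shadowing strategy, only the hidden bridge method would report Foo.M as its base definition. The public string-returning method would report itself, as Shadowing.M does here.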

HaloFour commented 9 years ago

1. It would be tiny and very easily inlined. This also only applies to virtual calls through the base type.
2. The VB and C# compilers already support this; they had to for normal method shadowing to work. I did test and confirm this in C# 5.0 using an assembly written in IL.
3. That is already the case with shadows. Using an attribute would cause the same problem, as the return type would not be what was expected.

Furthermore, using shadowing would allow a covariant return to be a value type when the base return type is object and direct callers would avoid that box. With type erasure that is unavoidable.

Lastly, this compiler candy would work regardless of which framework version is being targeted given there is no dependency on new classes. Projects could target 2.0 (or even 1.0) or CoreCLR or Mono.

sharwell commented 9 years ago

@HaloFour So we are clear, your points are presented well even if I do not draw the same conclusion at this time. I appreciate that you are challenging me to really think about the position I've taken and whether or not it would work in the long run.

Several features of C# and other .NET languages are supported by custom attributes, and this list grows over time. Another prime example is ExtensionAttribute, which is used to define extension methods. Users working with older versions of .NET can still use the extension method functionality by providing their own definition of ExtensionAttribute in the absence of a framework-provided attribute.

After working through the following example, I believe concerns 1 and 2 would be handled equally well with your proposal. Consider three classes:

- Class A is the base class, and defines a virtual method Foo that returns object.
- Class B extends A, and defines an override of Foo that returns string. In your example, this would result in a new virtual method Foo that returns string, as well as a compiler-generated "anonymous" shadow method that matches the signature of A.Foo, and simply delegates the call to B.Foo.
  - The call overhead as well as the stack frame issue can be resolved by using the tail. prefix for the call instruction, at least in all cases where boxing is not required.
  - The compiler-generated shadow method can be marked sealed to ensure types derived from B honor the covariant return type contract even when consumed from a language that does not "understand" covariant return types.
- Class C extends B. If C wants to override method Foo from a base type, it must override the virtual method B.Foo which returns string.

Do you have an example where the .NET framework uses shadowing when overriding a method defined in a base type?

HaloFour commented 9 years ago

@sharwell Thanks, although I'm really just arguing the point for @gafter since this is the implementation as he described it.

I have no examples of where .NET does this currently. I'm not aware of any language other than IL that permits this to be expressed.

I do think that it's proper if class C wants to override Foo that the return value would have to be string (or another covariant return type in other cases).

Playing with it a little more I am running into issues with ambiguity. C# can handle a single level of inheritance just fine. B can override A and provide an overriding version of Foo that returns object as well as a shadowing version returning IEnumerable<int> and C# overload resolution handles that just fine when calling B.Foo. However, if you add in C which does the same and defines a new Foo returning IList<int> then the C# compiler will fail when attempting to call C.Foo.

VB.NET is even worse in that it cannot resolve B.Foo.

If those issues cannot be resolved in a manner that would be backward compatible that would effectively limit the functionality to projects using the post-Roslyn compilers where presumably the overload resolution would be modified to handle these situations.

I'm going to play with this a little more. I'd like to hear from @gafter regarding this problem.

Update: It seems that the Common Language Runtime spec does permit an override to have a different name and potentially different visibility. I'm going to see if I can take my sample assembly, mess with it to explicitly override using private/renamed methods and see if the compilers can consume them.

Update 2: Through explicit use of the .override clause in IL I was able to resolve all of the overload resolution issues. I was able to override using a method with a different visibility and name.

HaloFour commented 9 years ago

@sharwell To ping the thread, using explicit overrides to hide the base member through changing name/visibility fixes the overload resolution problems I was experiencing with C# and VB.NET. You effectively end up with the following:

public class Foo {
    public virtual object M() {
        return "fizz";
    }
}

public class Bar : Foo {
    // sorta equivalent to explicit implementation except for overriding, which is legal in IL
    private sealed override object Foo.M() {
        return this.M();
    }

    public new virtual string M() {
        return string.Concat(base.M(), "buzz");
    }
}

Compilers looking at Bar only see one M method that returns string.

sharwell commented 9 years ago

Any issue with using tail.call instead of just call in Bar."Foo.M"?

Certain other downsides remain:

Calling ((Foo)obj).M() will result in one additional call for each method in the inheritance chain that introduces a covariant return type.
There is no way using reflection to identify that string Bar.M() overrides object Foo.M().

sharwell commented 9 years ago

Summary so far

Shadow method

Advantages:

Seamless compatibility with languages that do not support covariant return types, including enforcement of the rule that a derived type cannot widen the set of types returned by a base class' definition of a method
Compiler-only feature (no dependency on changes in the BCL)
Ability to call methods which return a covariant value type without boxing an intermediate result

Disadvantages:

Additional call overhead when calling the base method
Might be difficult to use reflection to determine if a method overrides a method from a base type

Attribute applied to return type

Advantages:

Compatible with languages that do not support covariant return types
Call overhead limited to a dynamic downcast (and potentially unboxing a value type), regardless of the number of covariant restrictions that exist in the inheritance hierarchy
Less reliance on the JIT to inline calls for efficiency

Disadvantages:

Enforcement that a derived type not widen the set of types returned by a base definition of a method requires the compiler to understand a new attribute
Requires a new attribute to be defined in the BCL
Covariant methods which return a value type produce an intermediate value that needs to be unboxed

HaloFour commented 9 years ago

@sharwell Looks like tail. would work just fine.

1. If the compiler emitted a third private non-virtual method which contained the actual method logic, that would probably be a little more efficient, especially if the compiler emitted a tail. call instead of callvirt to avoid the null check:

public class Bar : Foo {
    // sorta equivalent to explicit implementation except for overriding, which is legal in IL
    private sealed override object Foo.M() {
        return M_helper();
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private string M_helper() {
        return string.Concat(base.M(), "buzz");
    }

    public new virtual string M() {
        return M_helper();
    }
}

Update: Realized that this doesn't make sense. If the overriding method called some non-virtual private method, then calling the virtual method on the base class wouldn't reach an override of the virtual shadow method in a further derived class.

2. This is true, and I don't know that there is anything that can be done to directly address that.

I will be honest that part of the reason that I don't like the attribute direction is that I've been doing a lot of Java lately and the concept of type-erasure just makes my skin crawl. :smile:

HaloFour commented 9 years ago

@sharwell Well you're definitely right about the base method overhead. Put together a test going three levels of overrides deep using the explicit overrides and shadowing strategy as well as standard overrides.

I have three classes, cleverly named C1, C2 and C3, where C3 inherits from C2 and so on. C1 defines four methods, M1 through M4, each of which has the return type of object. Method M1 and method M3 return string and method M2 and method M4 return int. In class C2 the method M1 has a return type of IConvertible and the method M2 has a return type of IEquatable<int>. In class C3 the method M1 has a return type of string and the method M2 has a return type of int. Each of the overrides simply returns a string or an int and makes no attempt to call the base method or transform the value.

In the test I create an instance of C3 and assign that to variables of each C1, C2 and C3. I then call each of the four methods in a hard loop.

| Method | Return type | Time |
|---|---|---|
| `C1::M1()` | `object` | 05.5152478 |
| `C1::M2()` | `object` | 03.4010111 |
| `C1::M3()` | `object` | 00.3548070 |
| `C1::M4()` | `object` | 00.8707667 |
| `C2::M1()` | `IConvertible` | 03.0672045 |
| `C2::M2()` | `IEquatable<int>` | 00.8000051 |
| `C2::M3()` | `object` | 01.0113014 |
| `C2::M4()` | `object` | 01.9512956 |
| `C3::M1()` | `string` | 00.3536757 |
| `C3::M2()` | `int` | 00.3025481 |
| `C3::M3()` | `object` | 00.4555915 |
| `C3::M4()` | `object` | 00.8899974 |

Here is the IL for these methods: Covariant Returns IL

And here is the C# for the test project: Covariant Returns Test Project

I don't doubt that this test could be done a little better and perhaps there are better strategies for supporting "true" covariant returns from the compiler. Despite the overhead I still prefer a method which results in a method signature that includes the covariant type.

gafter commented 9 years ago

When implementing multiple levels of overriding of covariant-returning functions, it is not necessary for the compiler to generate a long chain of method invocations. The compiler can implement this with at most a single intermediate method at runtime. That is the way the Java compiler does it.

jnm2 commented 9 years ago

If you're planning on shadowing, one extremely important gotcha to add to the list: shadowing methods can't inherit method attributes from the base method. I suppose the workaround would be for the compiler to copy all the attributes from the base method to the shadowing method. If this isn't dealt with, changing the return type of your C# override will unknowingly erase all of the method's attributes, including any parameter or return value attributes.

IMO shadowing is dirtier from a conceptual standpoint and a reflection standpoint.
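The attribute-inheritance gotcha can be reproduced with plain C# hiding. An attribute marked Inherited = true is found through an override with `inherit: true`, but not through a `new` slot. AuditedAttribute and the class names are hypothetical.

```csharp
using System;
using System.Reflection;

// Inherited = true is the default; spelled out here for clarity.
[AttributeUsage(AttributeTargets.Method, Inherited = true)]
public sealed class AuditedAttribute : Attribute { }

public class Foo
{
    [Audited]
    public virtual object M() => "fizz";
}

public class Overriding : Foo
{
    public override object M() => "override";
}

public class Shadowing : Foo
{
    public new virtual string M() => "shadow";
}

public static class AttributeInheritanceDemo
{
    public static bool IsAudited(Type type)
    {
        MethodInfo m = type.GetMethod(
            "M", BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly);
        // inherit: true walks the override chain, which a new slot does not belong to.
        return Attribute.IsDefined(m, typeof(AuditedAttribute), inherit: true);
    }
}
```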

gafter commented 9 years ago

@jnm2 Yes, you should expect the compiler to produce code that has the appropriate semantics.

dgrunwald commented 9 years ago

What about using the function to construct a delegate? new Func<string>(bar.M)

This would work fine with the shadowing solution, but how would the reflection solution handle this case?
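The difference can be seen in today's C# by comparing two method groups: one whose static return type is string, and one that still returns object, as the attribute approach would leave it in metadata. ShadowBar and ErasedBar are hypothetical stand-ins for the two compiled outputs. The lambda in FromErased is an assumption about what such a compiler would have to synthesize.

```csharp
using System;

public class Foo
{
    public virtual object M() => "fizz";
}

// Shadowing-style: a string-returning M is visible on the static type.
public class ShadowBar : Foo
{
    public new string M() => "shadow";
}

// Attribute-style: the compiled method still returns object.
public class ErasedBar : Foo
{
    public override object M() => "erased";
}

public static class DelegateDemo
{
    // Binds directly to the string-returning method group.
    public static Func<string> FromShadow(ShadowBar bar) => new Func<string>(bar.M);

    // new Func<string>(bar.M) does not compile here because the method group
    // returns object, so a wrapper that casts is needed.
    public static Func<string> FromErased(ErasedBar bar) => () => (string)bar.M();
}
```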

kbirger commented 9 years ago

Wow. Fantastic posts from @sharwell and @HaloFour.

If you guys have any of your working samples around still, I'd love to see and try to understand a little bit better.

In general, I see more value in the attribute approach as it would scale better with multiple levels if the inlining doesn't work out as guaranteed, but one thing that I wonder about is the casting issue. Would it work with a direct cast, or would it have to be a full conversion cast?

andersborum commented 9 years ago

Definitely second this feature - it's a feature we've been asking for years. Although one should favor composition over inheritance, this is a feature that one expects to be available in a modern version of the language.

And no, the current workaround with shadowing the inherited member is not something the team should be proud of pushing to the developers.

HaloFour commented 9 years ago

@andersborum

The shadowing solution is only to decrease what needs to be done to make this happen. Supporting it "properly" requires changing the CLR since the CLR considers the return type to be a part of the signature of the method and permits overloading based on different return types. Modifying those semantics could be considered a breaking change. Even if that's not a concern any language changes that depend on runtime changes could be considered immediate points against it.

To note, I also would prefer it to be done correctly. I'm not a fan of compiler candy that cannot be enforced by the runtime. Java can keep its erasure garbage.

andersborum commented 9 years ago

@HaloFour

All valid points. However, in this case I am all for making a breaking change to the CLR, favoring correctness over compiler candy. I'm not sure about the impact of making a breaking change of this kind, but it feels like a huge oversight that they released the variance story without this feature.

I remember speaking with Mads Torgersen and Eric Lippert at PDC 2008 in the evening during the "ask the experts" session, sitting at a table and proposing various ways to implement this feature (among others, a kind of "safe" covariant return type design). Eric in particular tried to convince us that unless they could justify "designing, implementing, documenting and supporting" the feature, it was a no-go.

Given the amount of questions on Stack Overflow on this specific (missing) feature, I think Eric (and the rest of the team) should seriously reconsider what language features provide value to developers. I'd personally trade most of what's in C# 6.0 for covariant return types, and I speak on behalf of many developers.

It was a huge let-down in my opinion, and I hope to see the team take community feedback seriously.

HaloFour commented 9 years ago

@andersborum I'd have to agree with Eric. Any and every feature needs to demonstrate its value beyond the cost to implement. I believe the other comment was (paraphrasing) "every feature needs 100 points to get implemented, and every feature starts with -100 points and has to make up the difference."

As for listening to feedback, most of the active conversations here seem to mirror the highly voted issues on UserVoice. Covariant return types does not have a lot of votes (29 at the moment, I just added mine).

Anytime I find myself looking at the C# 6.0 list and thinking that it looks a little skimpy I have to remind myself of the massive amount of effort that went into rewriting the compilers.

gafter commented 9 years ago

@andersborum The question is not whether it would be better to have CLR support for covariant returns or not - it would definitely be better to have CLR support. The question is whether we would rather have this feature implemented in the compiler (without CLR support), or not have the feature at all. Those are the two most realistic options for C# 7. Getting CLR spec and implementation changes (in all widely-used CLR implementations) is much harder than getting C# compiler changes.

Daniel-Svensson commented 9 years ago

+1 for this feature

With the good support for covariance/contravariance of generics, I tend to forget that it is not available for normal methods, and on a number of occasions I have spent some frustrated time searching for how to get it to work, only to realise that it doesn't.
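For contrast, here is the generic variance that already works: a more derived type argument flows where a less derived one is expected, through variant interfaces and delegates. VarianceDemo is a hypothetical name. The commented-out lines show the method-level equivalent that the compilers of the time rejected.

```csharp
using System;
using System.Collections.Generic;

public static class VarianceDemo
{
    // Generic interface covariance (IEnumerable<out T>): no copy or cast needed.
    public static IEnumerable<object> AsObjects(List<string> strings) => strings;

    // Delegate covariance (Func<out TResult>).
    public static Func<object> AsObjectFactory(Func<string> factory) => factory;
}

// The method-level equivalent was rejected:
// class Base { public virtual object M() => ""; }
// class Derived : Base { public override string M() => ""; } // error: return type must be object
```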

While CLR support seems to be required for a correct implementation, it makes sense to investigate what can be done when compiling for the current CLR version(s).

Attribute inheritance should be added to @sharwell's nice comparison between shadowing and attributes. As @jnm2 pointed out, this can partly be solved by the compiler. But that would introduce another source of breaking changes, since an update to library A would affect overriding methods differently depending on whether they use covariance or not, which would be far from obvious.

Shadowing

Doesn't work as expected with attribute inheritance, not even with the workaround proposed by @jnm2
- Without any kind of CLR support, changing attributes on the base class will affect derived classes differently depending on whether covariance is used or not.
- This means that changing attributes in any way in non-sealed classes must be treated as a breaking change.

Attribute

Works as expected with attribute inheritance
Without CLR support, additional completely unnecessary casts will be required at each invocation (PERF)

Daniel-Svensson commented 9 years ago

Update: When writing this I had the impression that this proposal included out parameters when referring to return values. I am not sure whether that was actually the case. I have made some slight modifications, but left most as is.

After some more thought on this:

When implementing/considering covariance, please don't completely forget/discard contravariance for input parameters; there are definitely cases for variance in both directions, even if covariance (and specifically covariance of the return type only) would cover the majority of the cases where I have missed it.
One problem which might arise from introducing covariance, if we support return values via out parameters, is that it might not be clear which method one is trying to override, such as when the base class has two different virtual methods which differ only by an out parameter type, say IA and IB.

If a derived class then contains an override returning class C : IA, IB, then it must be obvious whether it overrides one of them or both. One approach would be to allow co/contravariant overrides to specify the exact signature of the overridden method. Ex:
```
// These could just as well be classes, e.g. where IB derives from IA
interface IA {}
interface IB {}
class C : IA, IB {}

class Base
{
    virtual bool TryGet(out IA value);
    virtual bool TryGet(out IB value);
}
class Derived : Base
{
  // Overrides both by default?
  // Or should it be possible to pick a specific one, just as with interfaces?
  // (proposed syntax, not legal C#)
  override bool TryGet(out C value);
}
```
Alternative A: Allow a way to specify the complete signature of the method to override (after the new signature, or maybe in the metadata). This way it could be optional to specify which method was overridden.
```
class CSharpCompilation : Compilation
{
   override CSharpCompilation WithOptions(Options options)
      KEYWORD? Compilation WithOptions(Options options)
  {
    .....
  }
}
```
Alternative B: Original signature before the new signature
```
 class CSharpCompilation : Compilation
 {
     override Compilation WithOptions(Options options)
         with/as CSharpCompilation WithOptions(Options options)
    {
      .....
    }
 }
```

svick commented 9 years ago

@Daniel-Svensson

where the base class have two different virtual methods which differ only by the return type

While such a type would be a valid .NET class, I don't know of any language that would let you write it (apart from IL), and I don't think you can expect a language like C# to handle that situation, return type covariance or not.

HaloFour commented 9 years ago

@Daniel-Svensson

Contravariance would only make sense in the case of in parameters. Since you can already overload based on parameter type you can accomplish this right now. Covariant returns are only necessary as a language feature since C# doesn't permit overloading by return type, which is something that CIL permits.
This isn't a problem since it's not possible for a C# class to implement two methods of the same signature from two separate interfaces separately without using explicit implementation. As such, you either have to have a single covariant method which implements both methods, or you need to explicitly implement and manually handle one of the implemented methods differently. Covariance aside this isn't any different than the situation today.
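The two options HaloFour describes look like this in ordinary C#, with covariance set aside. In the first class, one public method implements both interface methods. In the second, one interface is implemented explicitly so that it can behave differently. All names are hypothetical.

```csharp
public interface IA { object Get(); }
public interface IB { object Get(); }

// A single public method implements both IA.Get and IB.Get.
public class Both : IA, IB
{
    public object Get() => "shared";
}

// Explicitly implementing IB.Get lets it be handled separately.
public class Split : IA, IB
{
    public object Get() => "from IA";
    object IB.Get() => "from IB";
}
```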

Daniel-Svensson commented 9 years ago

Thanks @svick for pointing that out.

I originally had a more complex example with out parameters but simplified it by switching to the original example. I think you are correct: if we limit covariance to just method return types (and maybe property getters) and disallow it for return values via out parameters, then this seems to be a non-issue, so it would not be necessary to introduce a way to explicitly say that only a specific method is overridden.

@HaloFour

This isn't a problem since it's not possible for a C# class to implement two methods of the same signature from two separate interfaces separately without using explicit implementation. As such, you either have to have a single covariant method which implements both methods, or you need to explicitly implement and manually handle one of the implemented methods differently.

Yes, providing a way to explicitly say which method to override, just as you can with interface methods, was what I intended to say. I will update my wording.

gafter commented 9 years ago

@ldematte asked

I finally had some time to review them (you guys write a lot, awesome!) and I like #357 very much! It is quite well defined, specified and contained. I suppose it will be implemented with the hide(new)/override + bridge, plus attribute copy, plus the "at most a single intermediate method at runtime" optimization (not with the custom attribute). How should I start? From the spec and tests, maybe? (sorry, very first time contributing here...)

Start by writing up a description of what you would want the compiler to do, with some examples (source and generated code), and write some unit tests that will verify that the compiler does that. Include some "negative" tests for situations that should not work. Also write up a more detailed description of the language spec... for example, precisely what kinds of changed return type would be allowed and not allowed? You will probably want to refer to some subset of the conversions. I expect, for example, that user-defined conversions will not be involved. Also document how the compiler will identify which methods need to be overridden, and which bridge methods need to be generated, especially in multi-project examples where source may not be available for the methods being overridden. How will the presence of the bridge methods in the member list affect overload resolution and method lookup?
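One possible reading of "some subset of the conversions" is sketched below using reflection. This is an assumption, not a decided rule. An overriding return type is accepted if it is identical to the overridden one, or if both are reference types and an implicit reference conversion exists. Type.IsAssignableFrom covers base classes, interfaces, and generic variance, and it ignores user-defined conversions. CovariantReturnRules and IsAllowedOverrideReturn are hypothetical names.

```csharp
using System;

public static class CovariantReturnRules
{
    // Hypothetical rule: identity, or an implicit reference conversion from the
    // overriding return type to the overridden one. Boxing (value types) and
    // user-defined conversions are deliberately excluded.
    public static bool IsAllowedOverrideReturn(Type overridden, Type overriding)
    {
        if (overridden == overriding)
        {
            return true;
        }

        if (overridden.IsValueType || overriding.IsValueType)
        {
            return false;
        }

        return overridden.IsAssignableFrom(overriding);
    }
}
```

Excluding value types is a design choice in this sketch. The shadowing strategy discussed earlier could, in principle, allow `int` to replace `object` without boxing for direct callers.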

Once we make sure we're on the same page about that, write up the proposed implementation strategy. It is fine if you're prototyping to help drive your understanding of this and the stuff for the previous paragraph, but at this stage we need to make sure we're on the same page about an appropriate implementation strategy. You should also consider how this will interact with type modifiers that appear in metadata. Currently when you override or implement a method that has type modifiers in it, the compiler modifies your method signature to add those modifiers. Will that interact with covariant returns in the implementation?

You should do as much of your development "in the open" as possible, so that we can give you feedback before you go too far with a strategy; you don't want to find out when you're "done" that we prefer you do things a completely different way. I think the best way to do that is to check in your working state to a github branch frequently.

You're welcome to contact me by private email using myfirstname.mylastname@microsoft.com (substitute my actual names in there) if there is anything I can do to help get you organized.

ldematte commented 8 years ago

Ok, I have done some investigation and I think I have a good grasp on how to proceed. I have taken as an example the (already existing) synthesized explicit interface implementation (class SynthesizedExplicitImplementationForwardingMethod), looked at how it works and where it is used in the pipeline, and I think I should use the same pattern, if not re-using (generalizing it a bit) the same code.

Regarding all the write-up (what the compiler should do, examples, a more detailed spec...), where and how should I write it? Do you have an example to point me at? Should I write it here, or is there a place where this documentation should go (or should I just start writing emails to you)?

gafter commented 8 years ago

That sounds like a promising approach.

This is as good a place as any to put docs for now. Please use whatever form is most natural to you. We just want to make sure we're on the same page.

ldematte commented 8 years ago

Elaborating on what @HaloFour and @gafter commented about avoiding chains of method calls: I took @HaloFour's IL example and added another method to C3.

Basically, you now have:

class C1 {
    public virtual object M1() { ... }
}
class C2 : C1 {
    // this is actually "M1_hidden overrides M1"
    private override sealed object C1.M1() { return this.M1(); }
    public new virtual IEnumerable<int> M1() { ... }
}
class C3 : C2 {
    // this is actually "M1_hidden_2 overrides M1"
    private override sealed object C1.M1() { return this.M1(); }
    // this is actually "M1_hidden overrides M1"
    private override sealed IEnumerable<int> C2.M1() { return this.M1(); }
    public new virtual IList<int> M1() { ... }
}

In IL:

 .method private hidebysig virtual final 
      instance object  M1_hidden_2() cil managed
 {
 .override N.C1::M1
 // Code size       10 (0xa)
 .maxstack  1
 .locals init (object V_0)
 IL_0000:  nop
 IL_0001:  ldarg.0
 IL_0002:  tail.
 IL_0004:  callvirt   instance string N.C3::M1()
 IL_0009:  ret
 } // end of method C3::M1_hidden_2

It assembles, and csc sees it correctly; even VS sees it correctly as M1() with three overrides (differing only in the return type). It also runs with no problem, and the method is called correctly, cutting through the hierarchy as expected.

With C3 x = new C3();, instead of going through the whole chain ((C1)x).M1() -> obj C2.M1() -> IEnum<> C3.M1() -> IList<> C3.M1(), the call goes ((C1)x).M1() -> obj C3.M1() -> IList<> C3.M1().

But I have some concerns. The first is the mix of override and final: it works, but I have vented my concerns here. Is it really allowed, or am I just lucky?

Second: I am introducing an extra method for each combination of covariant return types. This could lead to a proliferation of methods; is this acceptable?

gafter commented 8 years ago

@ldematte This is exactly the implementation technique I was hoping you would use. The combination of sealed and override is supposed to work exactly the way you are using it. Yes, it is definitely acceptable to introduce all those "extra" methods. You have one declared method per vtable slot (counting vtable slots in each class separately), so the proliferation of method (bodies) is no different than the proliferation of vtable slots. When two method bodies are identical they are shared in the generated IL, so there really isn't as much overhead as it would appear.

Happy Thanksgiving!

gafter commented 8 years ago

/cc @dotnet/roslyn-compiler for those of you who want to follow this conversation.

ldematte commented 8 years ago

Happy Thanksgiving!

ldematte commented 8 years ago

So, I finally have some time to think about it, do some research and, as proposed by @gafter, write some sort of spec: where I expect to modify the compiler and what I expect it to do.

Support for covariant return types in derived classes.

Note: I am referring to version 5.0 of the C# specification, as it is the last one publicly available on the Microsoft website.

The proposal is to relax the constraint defined in 10.6.3 (virtual methods) and 10.6.4 (override methods), which says that an override method can override an inherited virtual method only if it has the same signature. For the return type, the "same signature" constraint is relaxed using a definition similar to the one in 15.2 (delegate compatibility):

An identity or implicit reference conversion exists from the return type of M to the return type of D (here M is the overriding method and D is the overridden method).

An implicit reference conversion covers all the inheritance-related conversions. It seems OK to me, even if it may be advisable to further restrict this rule by explicitly listing the conversions we want to allow and support. Suggestions are welcome. This proposal covers only overridden inherited virtual methods, not "new" or overridden abstract methods.

Examples:

class A
{
    public virtual object F() { return null; }
    public virtual object G() { return null; }
}
class B: A
{
    new private object F() { return "Foo"; }            // Ok, hides A.F within body of B
    new private string G() { return "Bar"; }            // Error, new continues to follow exact signature match rules
    new public virtual string G() { return "Bar"; }     // Error, new continues to follow exact signature match rules
    new public virtual object G() { return "Bar"; }     // Ok, hides A.G
}
class C: B
{
    public override string F() { return "Foo"; }    // Ok, overrides A.F
    public override string G() { return base.G(); } // Ok, overrides B.G
}

Method F in B hides the virtual F method inherited from A. Hiding continues to follow the previous rule (exact signature match).

An abstract method declaration is permitted to override a virtual method. In the example

using System;
class A
{
    public virtual object F() {
        Console.WriteLine("A.F");
        return null;
    }

    public virtual object G() {
        Console.WriteLine("A.G");
        return null;
    }
}
abstract class B: A
{
    public abstract override object F(); // Ok
    public abstract override string G(); // Error: no covariant return here
}
class C: B
{
    public override object F() {    // Ok
        Console.WriteLine("C.F");
        return null;
    }

    public override string G() {    // Ok, covariant signature applies
        Console.WriteLine("C.G");
        return "C.G";
    }
}

class A declares a virtual method, class B overrides this method with an abstract method, and class C overrides the abstract method to provide its own implementation. B.G still needs to follow the previous rule (exact signature match), but C.G is an override of a virtual method, and we can therefore apply the new rule, and allow for a covariant return type.

This feature must not break existing code or existing languages (including older versions of C#). The idea is to make the compiler emit code for two methods: one matching the exact signature of the inherited virtual method, and another matching the declared covariant method signature. Let's call the three methods involved A_M (for the base class, inherited method, aka base method), B_M' (for the hidden override method with the exact signature, aka shadow method) and B_M (for the override method with a covariant return type, aka shadowing method or covariant (override) method):

- B_M' will be sealed and private, and it will override A_M explicitly, using a different name. Overriding a method under a different name and changing its accessibility is not possible in C#, but it definitely is in IL.
- B_M will be a new method. It will hide A_M in B and in any class derived from B, acting as a new "start point" for inheritance, so it will also be virtual.
- B_M' will just be a stub that calls B_M and returns its result.

Example:

public class C1
{
    public virtual object M1()
    {
        return "Fizz";
    }
}

public class C2 : C1
{
    // sort of equivalent to explicit implementation except for overriding, which is legal in IL:
    // .method private hidebysig virtual final instance object  M1_hidden() cil managed
    // {
    //   .override N.C1::M1
    private sealed override object C1.M1()
    {
        return this.M1();
    }

    // hide/shadow, new "start point" for inheritance
    // not legal C#, perfectly legal in IL
    // .method public hidebysig newslot virtual instance class [mscorlib]System.String M1() cil managed
    public new virtual string M1()
    {
        return string.Concat(base.M1(), "Buzz");
    }
}
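Since C# cannot declare two methods named M1 that differ only in return type, the closest legal-C# approximation of this shadow/shadowing pair renames the shadowing method. The sketch below uses a hypothetical name, M1Typed, and a public shadow (C# cannot reduce the accessibility of an override); it only illustrates the dispatch shape under those assumptions and is not what the compiler would emit, since the proposed IL keeps the name M1 for the shadowing method and makes the shadow private.

```csharp
using System;

public class C1
{
    public virtual object M1()
    {
        return "Fizz";
    }
}

public class C2 : C1
{
    // Stands in for the shadow (B_M'): seals the base slot and only forwards to the covariant method.
    // It must stay public here because C# does not allow an override to lower accessibility.
    public sealed override object M1()
    {
        return M1Typed();
    }

    // Stands in for the shadowing method (B_M): a new virtual slot with the narrower return type.
    // M1Typed is a hypothetical name; the proposal would keep the name M1 in IL.
    public virtual string M1Typed()
    {
        // base.M1() is a non-virtual call to C1.M1, not to the forwarding stub above.
        return string.Concat(base.M1(), "Buzz");
    }
}
```

Calling M1 through a C1 reference and calling M1Typed on a C2 reference both yield "FizzBuzz"; only the latter is statically typed as string.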

The compiler will determine whether an override is valid using the modified rules above, and it will be changed as follows:

1. During checking, relax the return type check (no ERR_CantChangeReturnTypeOnOverride for implicit reference conversions).
2. Modify the behaviour of CSharpOverrideComparer and CheckOverrideMember in SourceMemberContainerTypeSymbol accordingly:
   - if the method matches exactly, compilation proceeds as usual;
   - if the method uses a different, covariant return type, mark it.
3. Check/modify OverriddenMethod and GetLeastOverriddenMethod in MethodSymbol.
4. During compilation of an override method (MethodCompiler.CompileNamedType, where MethodSymbol.IsOverride), find whether there is a covariant override anywhere in the inheritance chain:
   - if the current override is a covariant override, emit both a "new virtual override" (aka covariant override) method and a "private sealed override" (aka shadow). For example:
     - .method private hidebysig virtual final instance class [mscorlib]System.IComparable M1_hidden() cil managed .override N.C2::M1
     - .method public hidebysig newslot virtual instance class [mscorlib]System.String M1() cil managed
   - The compiler-generated shadow method will be marked sealed to ensure types derived from the class which introduces the covariant override honour the covariant return type contract even when consumed from a language that does not "understand" covariant return types.
   - The "new virtual override" (aka shadowing method) won't automatically inherit method attributes from the base method, so the compiler will copy all the attributes from the base method to the shadowing method.
5. For each covariant override in the inheritance chain, (also) emit a stub/bridge with the original (non-covariant) return type: .method private hidebysig virtual final instance object M1_hidden_2() cil managed .override N.C1::M1
   The stub will call the covariant method (this.M()). The call will use the tail. prefix for the call instruction, at least in all cases where boxing is not required, to reduce the call overhead.

The optimization in step 5 prevents the aforementioned problem of one additional call for each method in the inheritance chain that introduces a covariant return type. For example, suppose you have:

class A {
    public virtual object M() { return 0; }
}

class B: A {

    //.method private hidebysig virtual final instance object  A_M() cil managed
    //.override A::M
    private sealed override object A.M() { return this.M(); }

    //.method public hidebysig newslot virtual instance class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> M() cil managed
    public virtual new IEnumerable<int> M() {
        yield return (int)base.M();
    }
}

class C: B {
    private sealed override IEnumerable<int> B.M() { return this.M(); }

    public virtual new IList<int> M() {
        return new List<int>(base.M());
    }
}

This code A x = new C(); ((A)x).M(); will produce the following chain of method calls:

A.M() -(via virtual dispatch)-> object B.M() -(bridge, will call)->
IEnumerable<> B.M() -(via virtual dispatch)-> IEnumerable<> C.M() -(bridge, will call)->
IList<> C.M()

To avoid this chain of calls, for every inherited virtual method that is overridden at some point with a bridge, i.e. with a covariant return type (the shadowing method), the compiler will introduce a bridge to the original, virtual, base (non-covariant) method.

This means that the compiler, while processing each override method T M() in class C, will:

- for each class B in the inheritance chain, check whether M is a covariant override;
- if such an override is found (i.e. T M() is a shadowing method in B), find the corresponding shadow override T' B.M();
- through the usual mechanism, identify the base class/base method for the shadow override T' B.M(); let it be the override of T' A.M() in base class A;
- inject in C an additional bridge (shadow override method), which will override T' A.M(), as: //.method private hidebysig virtual final instance object C_A_M() cil managed //.override A::M private sealed override object A.M() { return this.M(); }
- every time such a method is found, repeat this process for class A, until a base method with no covariant override is found (virtual, not new, no shadow method for the same signature M()).

This way, the bridge will allow a virtual dispatch directly to C:

A x = new C(); ((A)x).M() will produce the following, reduced, call chain:

A.M() -(via virtual dispatch)-> object C.M() -(bridge, will call)->
IList<> C.M()

The original bridge will continue to exist; a call to ((B)x).M(); will produce the following call chain:

B.M() -(via virtual dispatch)-> IEnumerable<> C.M() -(bridge, will call)->
IList<> C.M()

This way, when implementing multiple levels of overriding of covariant-returning methods, the compiler does not need to generate a long chain of method invocations. It can implement this with at most a single intermediate method at runtime, which is likely to be inlined (it is a single call instruction). That is the way the Java compiler does it.
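To make the saving concrete, here is a toy model in C# (not Roslyn code, and not how the CLR actually lays out vtables). Each slot is named after the class that introduced it (A.M, B.M, C.M) and maps to the body that fills it in an instance of C; a body either does real work or is a bridge that makes a virtual call through another slot. Body and DispatchModel are made-up names for this sketch.

```csharp
using System.Collections.Generic;

// A method body in the toy model: ForwardsTo == null means real work,
// otherwise the body is a bridge that calls the named slot on the same object.
public sealed class Body
{
    public Body(string name, string forwardsTo) { Name = name; ForwardsTo = forwardsTo; }
    public string Name { get; }
    public string ForwardsTo { get; }
}

public static class DispatchModel
{
    // Slot -> body for an instance of C in the A/B/C example above.
    public static Dictionary<string, Body> VTableOfC(bool directBridges)
    {
        var vtable = new Dictionary<string, Body>
        {
            // C's own bridge for the slot B introduced, and C's real covariant method.
            ["B.M"] = new Body("IEnumerable<int> C.M (bridge)", "C.M"),
            ["C.M"] = new Body("IList<int> C.M", null),
        };
        // Without the extra bridge, A's slot is still filled by B's sealed shadow, which calls B's slot.
        vtable["A.M"] = directBridges
            ? new Body("object C.M (bridge)", "C.M")
            : new Body("object B.M (bridge)", "B.M");
        return vtable;
    }

    // Invokes a slot and records every body that runs, following bridges through the vtable.
    public static List<string> Invoke(Dictionary<string, Body> vtable, string slot)
    {
        var trace = new List<string>();
        for (Body body = vtable[slot]; ; body = vtable[body.ForwardsTo])
        {
            trace.Add(body.Name);
            if (body.ForwardsTo == null) return trace;
        }
    }
}
```

Invoking slot A.M without the extra bridge runs three bodies (two bridges plus IList<> C.M); with the bridge it runs two, and invoking slot B.M runs two either way.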

Notice that this is not strictly necessary for cases in which the override in the current class (C) is not covariant:

class A {
    public virtual object M() { return 0; }
}

class B: A {

    //.method private hidebysig virtual final instance object  A_M() cil managed
    //.override A::M
    private sealed override object A.M() { return this.M(); }

    //.method public hidebysig newslot virtual instance class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> M() cil managed
    public virtual new IEnumerable<int> M() {
        yield return (int)base.M();
    }
}

class C: B {
    public override IEnumerable<int> M() {
        yield return 1;
    }
}

In this case, A x = new C(); ((A)x).M() will produce the following invocation chain:

A.M() -(via virtual dispatch)-> object B.M() -(bridge, will call)->
IEnumerable<> B.M() -(via virtual dispatch)-> IEnumerable<> C.M()

Still a single invocation at runtime, but this invocation is another virtual dispatch. By introducing the bridge anyway, regardless of the return type, we can save the second virtual dispatch.
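The bridge-injection steps listed earlier, including the option of adding the bridge even when C's override is not covariant, can be sketched as a small planner. This is a hypothetical model under simplifying assumptions (a single method name M, a linear inheritance chain, no interfaces); MDecl and BridgePlanner are illustrative names, not Roslyn symbols. Only the root declaration and covariant declarations introduce a vtable slot, and the class being compiled needs a private sealed bridge for every older slot it does not fill directly.

```csharp
using System.Collections.Generic;

// One declaration of M in a class of the chain. Covariant == true means the class declares M
// with a narrower return type, so it introduces a new virtual slot (plus a shadow for the old one).
public sealed class MDecl
{
    public MDecl(string cls, string returnType, bool covariant)
    {
        Class = cls; ReturnType = returnType; Covariant = covariant;
    }
    public string Class { get; }
    public string ReturnType { get; }
    public bool Covariant { get; }
}

public static class BridgePlanner
{
    // chain[0] is the class that first declares virtual M; the last entry is the class being compiled.
    // Returns the slots that the class being compiled should fill with a bridge to its own M.
    public static List<string> BridgesFor(IReadOnlyList<MDecl> chain)
    {
        MDecl current = chain[chain.Count - 1];
        var bridges = new List<string>();
        bool skippedNearestSlot = false;
        for (int i = chain.Count - 2; i >= 0; i--)
        {
            // Plain overrides reuse an existing slot; only the root and covariant declarations add one.
            bool introducesSlot = i == 0 || chain[i].Covariant;
            if (!introducesSlot) continue;
            // A plain (non-covariant) override fills the nearest slot directly, so it needs no bridge there.
            if (!current.Covariant && !skippedNearestSlot)
            {
                skippedNearestSlot = true;
                continue;
            }
            bridges.Add(chain[i].ReturnType + " " + chain[i].Class + ".M");
        }
        return bridges;
    }
}
```

For the first A/B/C example (covariant C) the planner returns the B.M and A.M slots; for the example just above (plain override in C) it returns only A.M, which is the bridge that saves the second virtual dispatch.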

Notice that this will need some additional work for multi-project scenarios, where source may not be available for the methods being overridden: as-is, the Symbol/Binder API will correctly identify the most derived (covariant) override as the base method for the override chain. However, if we want to generate the additional bridges, the Symbol/Binder API will need to identify the points where the override chain is "broken" by a covariant override (by a shadow method and shadowing method pair), so that the method generator can generate bridges for each of these points (**).

Experiments with a hand-written IL DLL and version 5 of the C# compiler (VS 2013) show that the presence of the bridge methods in the member list does not have a visible effect on overload resolution and method lookup: even IntelliSense is able to show the right methods.

The last objection raised in this thread is:

There is no way using reflection to identify that string Bar.M() overrides object Foo.M().

This is the same problem the Symbol/Binder API has to deal with for (**); maybe the code to identify a chain of inherited virtual methods with covariant return types could be added to Microsoft.CSharp.dll as an extension method for System.Reflection.MethodInfo? (Just an idea; I don't know what the policy is in cases like this.)

ldematte commented 8 years ago

Very long, I know... please be patient and take your time to comment. In particular, am I missing any important point, or am I on the right track?

If everything sounds good, I would like to start playing around with the compiler code, maybe starting from the tests. Christmas holidays are near, and I will finally have some time to spend on fun stuff :)

gafter commented 8 years ago

Overall this looks awesome. A couple of minor comments:

This proposal covers only overridden inherited virtual methods, not "new" or overridden abstract methods.

I would expect new and overridden abstract methods to be supported scenarios for the new covariant rules.

In your first example, I can't make sense of what you intend to be allowed in class C because it is extending a class with errors.

You don't need to generate a different name in IL for the sealed private override of the base class method (your B_M'), as they have a different "signature" from the CLR's point of view. I think the reflection experience at runtime will be better if they have the same name.

Anyway it sounds like you're definitely on the right track!

ldematte commented 8 years ago

I was very undecided whether new or abstract should be supported; I was thinking that maybe they were out of scope. No problem, I will rewrite the first few examples to include new and abstract, and how they can be supported (and fix the example with C at the same time).

You are right, I may not need to generate a different name for B_M' (let me try it, just to be sure), but I do need a different name for the override to the "real" base method (number 5. in the list, or M1_hidden_2() in the first example, C_A_M() in the second one), otherwise I will get a TypeLoadException (It seems that the CLR sees a chain A::M -> B::M -> C::M, where B::M is sealed, even if the chain is really A::M -> B::M plus A::M -> C::M). Changing name fixes this. I am pretty sure of this (it is the reason why I used different names for the shadows), but let me check this again: I will run my experiments (and upload the relevant IL and C# as gists), just to verify it.

ldematte commented 8 years ago

OK, I re-ran my experiments on overriding, and this is what I found:

- you never need to generate a different name for the IL to compile (assemble?): any combination I tried makes ilasm happy
- I have not tested with PEVerify (will do)
- for the basic scenario (classes A and B, B:A, covariant override in B) you may not need a new name, but you cannot change accessibility for the shadow method. Inside B:
  - .method private hidebysig virtual final instance object M() cil managed .override N.A::M (line 71 in the attached example) will generate a System.TypeLoadException at runtime (Message: An unhandled exception of type 'System.TypeLoadException' occurred in mscorlib.dll Additional information: Derived method 'M' in type 'N.B' from assembly 'Project3, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' cannot reduce access.)
  - .method public hidebysig virtual final instance object M() works
  - .method private hidebysig virtual final instance object B_A_M() works too
- for the advanced scenario (classes A, B and C, C:B and B:A, covariant override in both B and C, with a bridge directly from C::M to A::M) you need a new name for both bridge (shadow) B::M and bridge C::M (lines 71 and 116):
  - If you just change the name for C::M, you will run into a System.TypeLoadException (Message: An unhandled exception of type 'System.TypeLoadException' occurred in mscorlib.dll Additional information: Declaration referenced in a method implementation cannot be a final method. Type: 'N.C'. Assembly: 'Project3, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null'.)
  - If you just change the name for B::M, you will run into the same exception
  - If you change the name of both bridges (say, B_A_M() for B::M() and C_A_M() for C::M()), it will run as expected.

At this point, I would say it is better to generate new names in all cases: it lets us make the bridges private, and it is also better for extensibility (if B is compiled in an external assembly, when I implement C I cannot change the name of B::M's bridge anymore, so I would not be able to generate a bridge from A::M to C::M).

Links: https://gist.github.com/ldematte/aa4b101c8026ef24233f

Am I missing something? Based on the specs alone I would have agreed with @gafter, but the current .NET runtime seems to perform checks based on method names. Or do you spot a problem with my IL?

ldematte commented 8 years ago

Code to test it is trivial:

  using System;
  using N;

  static class Program {
     static void Main() {
        A x = new C();
        var y = x.M();

        Console.WriteLine(y);
        Console.ReadKey();

     }
  }

Compile IL with ilasm Project3.il /dll /pdb, CS file with csc example.cs /r:Project3.dll

PEVerify behaves like the runtime, even if it is less informative: [token 0x02000003] Type load failed. [token 0x02000004] Type load failed.

gafter commented 8 years ago

You've convinced me that the bridge methods need different names, and that it is useful to make them private.

ldematte commented 8 years ago

Revised new and abstract rules (it also fixes the problem with the first example). Comments are welcome: I am especially doubtful about A.G (and A.I). I don't see the point in the first case, and in the second case I don't know if we should enforce the type used in the new private method (but why, since it is hidden?) or not, like for A.I

This proposal covers overridden inherited virtual methods, "new" and overridden abstract methods.

Examples:

using System;

class A
{
    public virtual object E() { return "A.E"; }
    public virtual object F() { return "A.F"; }
    public virtual object G() { return "A.G"; }
    public virtual object H() { return "A.H"; }
    public virtual object I() { return "A.I"; }
}
class B: A
{
    new public virtual object E() { return "B.E"; }      // Ok, hides A.E
    new private object F() { return "B.F"; }             // Ok, hides A.F within body of B
    new private string G() { return "B.G"; }             // Ok, hides A.G within body of B
    new public virtual IComparable H() { return "B.H"; } // Ok, hides A.H with a covariant return type
    new private string I() { return "B.I"; }             // Ok, hides A.I within body of B
}
class C: B
{
    public override object E() { return "C.E"; }    // Ok, overrides B.E
    public override string F() { return "C.F"; }    // Ok, overrides A.F with a covariant return type
    public override string G() { return "C.G"; }    // Ok, overrides A.G with a covariant return type
    public override string H() { return "C.H"; }    // Ok, overrides B.H with a covariant return type
    public override object I() { return "C.I"; }    // Ok, overrides A.I
}

class Test {
    static void Main() {
        C c = new C();
        A a = c;
        B b = c;
        Console.WriteLine(a.E()); 
        Console.WriteLine(b.E()); 
        Console.WriteLine(c.E());

        Console.WriteLine(a.F()); 
        Console.WriteLine(b.F()); 
        Console.WriteLine(c.F());

        Console.WriteLine(a.G()); 
        Console.WriteLine(b.G()); 
        Console.WriteLine(c.G());

        Console.WriteLine(a.H()); 
        Console.WriteLine(b.H()); 
        Console.WriteLine(c.H());

        Console.WriteLine(a.I()); 
        Console.WriteLine(b.I()); 
        Console.WriteLine(c.I());           
    }
}

The expected output is:

A.E
C.E
C.E
C.F
C.F
C.F
C.G
C.G
C.G
A.H
C.H
C.H
C.I
C.I
C.I

Method E follows the existing rules (obviously) in A through C; method F has a covariant return type in C but continues to follow the expected behaviour (hiding in B is only internal); method G behaves like F, but it allows the hidden method in B to have a covariant return type; method H, like E, follows the intuitive behaviour, allowing covariant return types on both the new virtual method in B and its override in C. Finally, I shows how the new private method in B does not restrict the return type of the override in C.

An abstract method declaration is permitted to override a virtual method. In the example

using System;
class A
{
    public virtual object F() {
        Console.WriteLine("A.F");
        return null;
    }

    public virtual object G() {
        Console.WriteLine("A.G");
        return null;
    }
}
abstract class B: A
{
    public abstract override object F();      // Ok
    public abstract override IComparable G(); // Ok, covariant return type
}
class C: B
{
    public override string F() {              // Ok, covariant return type
        Console.WriteLine("C.F");
        return "C.F";
    }

    public override string G() {              // Ok, another covariant return type
        Console.WriteLine("C.G");
        return "C.G";
    }
}

class A declares a virtual method, class B overrides this method with an abstract method, and class C overrides the abstract method to provide its own implementation. B.G asks its implementors to implement a stricter return type; also, C.G is an override of a virtual method, and we can therefore apply the new rule, and allow for a covariant return type.

paulomorgado commented 8 years ago

How would this behave if the overridden methods are from interface implementations?

Given:

public interface I
{
    object M();
}

public abstract class A : I
{
    public abstract object M();
}

public class B : A
{
    public override string M() => "B.M";
}

what would happen here?

I o = new B();
var x = o.M(); // ???

ldematte commented 8 years ago

TL;DR: if there is no IL override (.override), we just emit two methods, one with the original signature and another with the covariant return type.

@paulomorgado, thanks for bringing it up; I was thinking about interfaces and abstract classes as well. Let's make two examples:

  public interface I {
     object F();
  }

  public class C : I 
  {
     public string F() {
        return "C.F";
     }
  }

  public abstract class A : I {
     public abstract object F();
  }

  public class B : A {
     public override string F() {
        return "B.F";
     }
  }

The behaviour will need to be "as you expect", i.e.

C c = new C();
string y = c.F(); // Ok, compiles
I o = c;
var x = o.F();

Or

B b = new B();
string y = b.F(); // Ok, compiles
I o = b; 
var x = o.F();

Compile-time (inferred) type of x will be object, and at runtime it will contain "B.F" (or "C.F").

This case is actually simpler: in the first case, it is just a shortcut for a case already dealt with in the compiler (explicit interface implementation). In C#:

  public class C2 : I {         

     public virtual string F() {
        return "C.F";
     }

     object I.F() {
        return this.F();
     }
  }

In IL:

.method public hidebysig newslot virtual instance string F() cil managed
.method private hidebysig newslot virtual final instance object N.I.F() cil managed { .override N.I::F

Very similar to what we are doing for the shadow/shadowing method pair.
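The explicit-implementation form of the first case already compiles with current C#. A minimal runnable check, assuming the interface I from the example; D2 is a hypothetical subclass added only to show that the bridge dispatches virtually to an override of the string-returning F:

```csharp
public interface I
{
    object F();
}

public class C2 : I
{
    // Public method with the narrower return type.
    public virtual string F()
    {
        return "C.F";
    }

    // Explicit implementation acting as the bridge for I.F().
    object I.F()
    {
        // Virtual call, so an override of F in a subclass is honoured.
        return this.F();
    }
}

// Hypothetical subclass: overrides the covariant method only.
public class D2 : C2
{
    public override string F()
    {
        return "D.F";
    }
}
```

Through a C2 reference, F() is statically typed as string; through an I reference it is typed as object but still returns the same string, and for a D2 instance the bridge reaches D2.F.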

The second case is also quite simple: this time we rely only (and heavily) on the CLR overload definition pointed out by @gafter, where you can have methods with the same name but different return types. When you implement (not explicitly) an interface or abstract method, in IL you do not override, it is just: .method public hidebysig newslot virtual final instance object F() cil managed (for public object F()) or .method public hidebysig newslot virtual instance object F() cil managed (for public virtual object F())

The IL code for B: A will therefore contain two methods:

.method public hidebysig virtual final instance object F() cil managed
.method public hidebysig newslot virtual instance string F() cil managed

Or, using C#-pseudo keywords

 public class B : A
 {
    public sealed /*override*/ object F() { return this.F(); }
    new public virtual string F() => "B.F";
 }

Notice that override is commented, since there is no real override (not as IL/the CLR sees it) here. Link to the example (hand-modded) IL: https://gist.github.com/ldematte/39b9f3119a92dc058799

The IL compiles and peverifies without problems, and it is usable from C#. As always, any comment is welcome! (I am not sure, for example, that the first use case of the first example is necessary, since it is already covered by explicit interface implementation. Thoughts?)

gafter commented 8 years ago

This all looks good to me.

ldematte commented 8 years ago

Great, so as a next step I will start to add unit tests for all the discussed examples, plus some negative ones. Have a nice weekend!

MichalStrehovsky commented 8 years ago

Is supporting this for properties still on the table? The original proposal has it, but it didn't get much attention in the implementation plan.

I can see how this can get awkward for setters, but this can be really useful for get-only overrides.
