dotnet / csharplang

The official repo for the design of the C# programming language

feature request: "this" / "self" in static methods to access current type aka late static binding #2841

Closed procodix closed 4 years ago

procodix commented 5 years ago

Couldn't find an open issue regarding this, but the feature is too important to not be reopened at least every month until implementation ;-) I am sure this has been discussed previously.

I know that when you derive a class containing static methods, calls to the child's static methods are rerouted to the parent class' implementation. But this behaviour as standard is just so wrong in many ways. It should at most be optional.

Factory methods are not inheritable:

public class A {
    public static A Create() {
        return new A();
    }
}

public class B : A {}

B.Create() creates an "A" which is completely wrong from an inheritance perspective and the opposite of what a non-static method would do.
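The behaviour described here can be checked in a few lines. The following is a minimal sketch, not part of the original post; the `Demo` class and its `Main` method are illustrative names only.

```csharp
using System;

public class A
{
    public static A Create()
    {
        return new A();
    }
}

public class B : A { }

public static class Demo
{
    public static void Main()
    {
        // B.Create() binds to A.Create(), so the runtime type is A, not B.
        A created = B.Create();
        Console.WriteLine(created.GetType().Name); // prints "A"
        Console.WriteLine(created is B);           // prints "False"
    }
}
```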

Instead there should be a "this" or "self" keyword referencing the static type at call time - not the declaring class.

public class A {
    public static self Create() {
        return new self();
    }
}

public class B : A {}

The worst possible naive implementation in the compiler would be to copy the method over to all derived classes, replacing A with B, but there is a smarter way for sure.

Trivia: The PHP folks needed several years to understand the concept of late static binding. Until then, only self:: existed, which really worked like an alias for __CLASS__, the declaring class. Only after lots of pressure did they come up with static::, which resolves to the currently called class and behaves correctly as described above.

CyrusNajmabadi commented 5 years ago

B.Create() creates an "A" which is completely wrong from an inheritance perspective and the opposite of what a non-static method would do.

How would you envision this working? When B.Create is called, it is statically dispatching into A.Create. So from the perspective of A there is no way to know what value should actually be instantiated.

procodix commented 5 years ago

Why not? The code clearly says B.Create()

So I would expect the compiler to see B as current static type, calling its method B.Create(), which it doesn't implement on its own but inherits from A. Meaning that every occurrence of "self" in the implementation gets assumed as a B type.

BTW, self as return type implies covariance :-)

CyrusNajmabadi commented 5 years ago

Why not? The code clearly says B.Create()

Because that's what's in the IL. There is no 'Create' method in 'B', and A.Create has no clue that B even exists (it might be in a different assembly altogether). As A.Create is a static method, it is passed in no information about how it was called.

So I would expect the compiler to see B as current static type

There's no concept of 'current static type'. If you look at the IL actually generated here for B.Create you'll see that B is never even referenced.

calling its method B.Create()

To clarify that: "its" implies that B has a Create method. It does not. A has the Create method, and the language just lets you find it by saying B. But what it finds is really A.Create, which you'll see is what it emits an invocation of.

Meaning, that every occurrence of "self" in the implementation gets assumed as a B type.

How do you envision this working?
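One way to see that B has no Create method of its own, without reading IL, is reflection. This is a hedged sketch rather than anything from the thread: it relies on `Type.GetMethod` with `BindingFlags.DeclaredOnly` and `BindingFlags.FlattenHierarchy`, and the `Demo` class is an illustrative name.

```csharp
using System;
using System.Reflection;

public class A
{
    public static A Create() => new A();
}

public class B : A { }

public static class Demo
{
    public static void Main()
    {
        // Looking only at members that B itself declares finds nothing.
        var declaredOnB = typeof(B).GetMethod(
            "Create",
            BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly);
        Console.WriteLine(declaredOnB is null); // prints "True"

        // Walking up the hierarchy finds the method, and it belongs to A.
        var found = typeof(B).GetMethod(
            "Create",
            BindingFlags.Public | BindingFlags.Static | BindingFlags.FlattenHierarchy);
        Console.WriteLine(found?.DeclaringType?.Name); // prints "A"
    }
}
```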

procodix commented 5 years ago

So in the compiled assembly we only get an A.Create() method. Fine.

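My simplest (and naive) approach would be the following procedure during compilation:

As B does not have a Create() method, copy and paste it from A to B. Replace "self" placeholder with B and emit the thing as B.Create() IL.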

If I had wanted A.Create() I would have written A.Create(), but I wrote B.Create() and that is in the source - thus should be interpreted as outlined. B.Create() calling some A.Create() is meaningless.

procodix commented 5 years ago

To clarify that: "its" implies that B has a Create method. It does not. A has the Create method, and the language just lets you find it by saying B. But what it finds is really A.Create, which you'll see is what it emits an invocation of.

Just 2 cents to this "finding" routine. It's useless, because it tries to solve a problem that is none. If I need A.Create() I write A.Create().

CyrusNajmabadi commented 5 years ago

As B does not have a Create() method, copy and paste it from A to B. Replace "self" placeholder with B and emit the thing as B.Create() IL.

When compiling B (or someone calling B.Create), the compiler may not have any access to the IL of 'A'. For example, "ref assemblies" are a thing that exists, where the body of A.Create is not included in A.dll. Instead, only the signatures are in there, and the compiler only uses the ref assembly to make sure that what B.dll is calling exists.

CyrusNajmabadi commented 5 years ago

Just 2 cents to this "finding" routine. It's useless, because it tries to solve a problem that is none. If I need A.Create() I write A.Create().

We cannot relitigate the past. This has been the specified behavior for C# since 1.0.

CyrusNajmabadi commented 5 years ago

If I had wanted A.Create() I would have written A.Create(), but I wrote B.Create() and that is in the source - thus should be interpreted as outlined.

Changing the interpretation of existing code can lead to breaking changes. Something that will not generally happen.

B.Create() calling some A.Create() is meaningless.

It's had meaning for nearly 20 years now :)

procodix commented 5 years ago

The feature request allows for clear, intuitive static method inheritance. Something C# is missing. I think I elaborated on your question of how I would envision this to work. There is no point striking out the current mechanics of IL, because they are, at least here, well, insufficient. So let's work on my request's metadata point by point.

cannot relitigate the past.

That's what a feature request is good for. C# is a product of the past, it shouldn't be a prisoner of it.

This has been the specified behavior for C# since 1.0. [...] It's had meaning for nearly 20 years now

If the founding fathers had been perfect we wouldn't have version 8.0 now ;-) But I think you agree with my last point. The current behaviour that reroutes B.Create() to A.Create() obviously reveals "some" functionality, which however is completely redundant and therefore unnecessary. One could say more politely: it has little use ;-)

can lead to breaking changes.

That's completely preventable. A new keyword "self" completely separates classic functionality from the modern one. Therefore no breaking change occurs whatsoever. If a method is implemented without "self" inside, everything stays as it is now - for the old folks. However, a use of "self" lets the suggested logic kick in.

Something that will not generally happen.

Fortunately nothing I requested ;-)

As A.Create is a static method, it is passed in no information about how it was called.

That's why I suggested that the compiler copies the method over to B and then calls B.Create(). Granted, that workflow is not the most efficient one, but it would get the job done. Rocket scientists will come up with a better idea when covariance gets implemented.

Haven't seen a reason why this shouldn't work, hm?

CyrusNajmabadi commented 5 years ago

That's what a feature request is good for.

Feature requests that don't change the meaning of existing code: great. Feature requests that change the meaning of existing code: nearly insurmountable.

Just letting you know this :)

C# is a product of the past, it shouldn't be a prisoner of it.

You're welcome to want that. I'm just explaining that that will make the chance of your proposal happening orders of magnitude less likely. To give you an idea, C# has only ever had about 1 real breaking change in behavior at this level.

CyrusNajmabadi commented 5 years ago

There is no point striking out the current mechanics of IL, because they are, at least here, well, insufficient.

Right. But that's why i'm asking you to explain how it would work. Because any proposal suggesting this sort of change will need to explain how it can actually work in the existing ecosystem.

So, for example, a proposal that says "the compiler needs to be able to access the IL to copy it over to the new dll" won't really work, since in the existing ecosystem there exist tons of DLLs that do not ship with IL to copy :)

theunrepentantgeek commented 5 years ago

Delphi's Object Pascal language had something like this - it didn't have static methods, it had class methods that could be called without an instance of the class. They could be virtual (and thus be overridden by subclasses).

You could even pass them around - I have vague memories of methods accepting the TComponent class that used the class reference to create instances (factory pattern).

If something like this feature were to progress, it would have to leave all the existing static declarations untouched; breaking tens of thousands of existing projects just won't fly.

But a new method type - perhaps (re)using the Delphi keyword class - might just work.

The crux will be the cost/benefit ratio - remembering that every language feature starts, not at zero, but at -100 points.

AartBluestoke commented 5 years ago

Could this work as generics do at an IL level - a constrained parameter passed in?

That would make this compile to something like what static T Create<T>(...) where T : typeof(caller), A would be?

So B.Create() is a call to A.Create<B>(...).

The use of the 'self' keyword would limit compile-time access via source code to this function; not sure how the interop would end up working...
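For comparison, C# as it stands can already thread the derived type through a generic parameter when the class author writes it out by hand. The sketch below uses the self-referencing generic base class pattern (sometimes called the curiously recurring template pattern); it is not the compiler-synthesized form suggested above, and `TSelf` and `Demo` are illustrative names. Note that B and C no longer share a single non-generic base type.

```csharp
using System;

// The base class takes the derived type as a type argument.
public class A<TSelf> where TSelf : A<TSelf>, new()
{
    // new TSelf() is allowed because of the new() constraint.
    public static TSelf Create() => new TSelf();
}

public class B : A<B> { }
public class C : A<C> { }

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(B.Create().GetType().Name); // prints "B"
        Console.WriteLine(C.Create().GetType().Name); // prints "C"
    }
}
```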

ufcpp commented 5 years ago

https://github.com/dotnet/csharplang/issues/252 ?

clystian commented 5 years ago

@ufcpp can you comment about why #252 is related to this, some example of code maybe?

procodix commented 5 years ago

#252 sounds similar but does not focus on static methods.

procodix commented 5 years ago

Could this work as generics do at an IL level - a constrained parameter passed in?

1) This whole thing can be circumvented if I use instance members. They inherit properly. But sometimes you don't want to instantiate an object, because you just need some information on the class - not the object. Chainable factory methods as in the example are a second reason.

2) Generics shouldn't be used for this for at least two reasons:

TL;DR: yes, but they make it way too complicated. The feature request is plain and simple.

procodix commented 5 years ago

Thanks, but let me repeat clearly: There is no breaking change and there is no language change requested here. We are talking about a new keyword with some compiler magic taking place. C# had plenty of changes like this ("await").

No need to wake a sleeping giant. We are NOT talking about redefining the + operator to multiply from now on.

You're welcome to want that. I'm just explaining that that will make the chance of your proposal happening orders of magnitude less likely. To give you an idea, C# has only ever had about 1 real breaking change in behavior at this level.

huoyaoyuan commented 5 years ago

We are talking about a new keyword with some compiler magic taking place.

How would the compiler implement it? The compiler even has no knowledge about what A.Create() does. So it cannot decide what B.Create() does when applying any magic.

a static method should inherit from its single defined parent.

No, static methods are never inherited. They just have accessibility associated with the class, like nested types.

If you want static methods to have some dynamic/polymorphic semantics, it will fall into type classes (#110).

CyrusNajmabadi commented 5 years ago

Thanks, but let me repeat clearly: There is no breaking change

Then how does your feature work? If i have static method A.Create() how can it end up dynamically doing things differently depending on who called it? Literally by what mechanism would that work?

and there is no language change requested here.

If there's no language change requested... then what are you asking for? :)

CyrusNajmabadi commented 5 years ago

We are talking about a new keyword with some compiler magic taking place.

That is the very definition of a language change :)

C# had plenty of changes like this ("await").

'await' does not change how the method executes depending on how the caller calls it. Furthermore, the callee doesn't affect the caller either with async/await.

Here's a good way to tell:

When you have an async Task method, it gets compiled into the dll as a method that just returns Task. There is nothing in the dll that indicates that either async or await was used. If someone calls this, they'll have no idea how it was implemented. The callee can change their impl at any point.

--

That's not how your feature here works. You're requiring that the caller have to know how the callee was implemented in order to figure out what to do. Namely the caller (B.Create()) would have to know about A.Create() so it could somehow "copy the IL" and make a suitable version of it that then instantiated a new B instead of a new A.

CyrusNajmabadi commented 5 years ago

@procodix before you proceed, let's start simple:

Say you have this code:

public class A {
    public static self Create() {
        return new self();
    }
}

How would you propose this actually be encoded in IL? Feel free to actually just synthesize what you'd want the C# compiler to emit for something like this. If you don't want to write IL, just write the equivalent C# code this would transform into.

Note: i don't even care about the caller right now of B.Create (though you can add that information if you want). I literally only care how A.Create will actually be compiled.

canton7 commented 5 years ago

I still don't understand the premise. Given self, what would you actually be able to do with it? The only thing you know at compile-time is that it's derived from A. You can't do new self() because it might not have a parameterless constructor (meaning that your original example is already impossible, regardless of any other considerations). You can't call any static methods which aren't already on A.

Maybe I've missed something, but I can't find any other examples of usage.

Could you give some examples of what this would be used for which actually make sense at a language level, regardless of backwards compatibility and implementation?

procodix commented 5 years ago

That is the very definition of a language change :)

Call it as you want, but this has happened more than once to the language. It's not Halley's Comet. You can't have it both ways ;-)

So, for example, a proposal that says "the compiler needs to be able to access the IL to copy it over to the new dll" won't really work, since in the existing ecosystem there exist tons of DLLs that do not ship with IL to copy :)

Unless someone builds a time machine, the self keyword and the connected logic is not present in any DLL. And it will never have to be. It's a compiler feature that works during compilation, not during execution. So foreign DLLs will behave completely deterministically.

Namely the caller (B.Create()) would have to know about A.Create()

During compilation the caller is known, because it's written down there. So what?

CyrusNajmabadi commented 5 years ago

During compilation the caller is known, because it's written down there. So what?

The caller is known. The callee is not. All you know is that there is a static, parameter-less Create method in A.

--

But, again, can you just answer: https://github.com/dotnet/csharplang/issues/2841#issuecomment-536901880

What would you actually compile this into?

CyrusNajmabadi commented 5 years ago

Call it as you want, but this has happened more than once to the language

When has it happened before? You mentioned async/await but they definitely did not do anything akin to what you're describing here. Like i explained in https://github.com/dotnet/csharplang/issues/2841#issuecomment-536901266 all that stuff is completely transparent to the caller.

procodix commented 5 years ago

public class A {
    public static self Create() {
        return new self();
    }
}

becomes

public class A {
    public static A Create() {
        return new A();
    }
}

so nothing changes.

However, when class B : A {...} gets declared, the following method gets injected:

public class B {
    public static B Create() { // thus covariance during inheritance is important!
        return new B();
    }
}

It works the same way as with instance methods.

procodix commented 5 years ago

The caller is known. The callee is not. All you know is that there is a static, parameter-less Create method in A.

Sorry, I meant the callee. The parser reads "B".Create(), so "B" is the callee. I don't understand your point. Somewhere your "find" procedure has to start to search for a method to call. Where does it start? When it reads "B.Create()" it knows that the method should be called on B. As it does not exist, the search starts up the ancestors and reveals that parent A has a .Create() method to which the call is exchanged. Skip the exchange. Call .Create() on B.

canton7 commented 5 years ago

public class A {
    public static self Create() {
        return new self();
    }
}

What if you then have:

class B : A
{
    public B(string something) { }
}

How would B.Create() work then?

procodix commented 5 years ago

Could you give some examples of what this would be used for which actually make sense at a language level, regardless of backwards compatibility and implementation?

There are plenty of use cases. Factory methods for example. A needs some methods to create an instance from itself. B&C&D inherit from A. I don't want to copy the factory methods over. With inheriting the static members, I can simply reuse them. That's the point of OOP.

Another usage is storing information about a class, not an object, inside the class. You could use attributes, but they are readonly. Take for example an ORM class. It has a method that returns its database table name:

namespace Test {
    class A {
        public static string Table() {
            return self.FullName.Replace('.', '_'); // returns "Test_A"
        }

        public void Save() {
            Database.SaveToTable(self.Table());
        }
    }

    class B : A {

    }
}

B.Table(); // returns "Test_B"
new B().Save(); // writes to the correct table

canton7 commented 5 years ago

In your last example, where is FullName declared?

As I said earlier, factory methods won't work without some guarantee that all subclasses of A will have parameterless constructors. Given that they need to have parameterless constructors (and ensuring that has its own can of worms), it's not clear what your factory will do.

procodix commented 5 years ago

@canton7: How would B.Create() work then?

B.Create() calls A.Create()'s implementation, thus executing:

public static self Create() {
    return new self();
}

This calls the C# default constructor and returns an empty B().

It should be allowed to override B.Create() as other languages do it by coding:

class B : A {
    public static self Create() { // self is assumed as current Type = B()
        self instance = super.Create(); // calling inherited but invisible B.Create() which calls A.Create() and returns a B().
        instance.SetSomeProperty(true);
        return instance; // returns B() or child, if inherited later
    }
}

canton7 commented 5 years ago

This calls the C# default constructor and returns an empty B().

Please read the bit I wrote before I said "How would B.Create() work then?". There, I declared B as having a non-parameterless constructor.

procodix commented 5 years ago

In your last example, where is FullName declared?

This comes from self being a Type instance.

canton7 commented 5 years ago

In your last example, where is FullName declared?

This comes from self being a Type instance.

If self is a Type instance, you can't do new self().

procodix commented 5 years ago

There, I declared B as having a non-parameterless constructor.

B inherits from A, so A would require a non-parameterless constructor as well. Then A.Create() would have required to call this non-parameterless constructor:

public class A {
    public A(string aParameter) { ... }

    public static A Create() {
        return new A("a string parameter");
    }
}

canton7 commented 5 years ago

B inherits from A, so A would require a non-parameterless constructor as well.

That isn't how C# works. This is valid C#:

class A { }
class B : A
{
    public B(string s) { }
}

This is also valid C#:

class A
{
    public A(string s) { }
}
class B : A
{
    public B() : base("Foo") { }
}

And so is this:

class A { }
class B : A
{
    private B() { }
}

A derived class's constructor can have more parameters than its parent's constructor, or fewer parameters, or it can stop things constructing it at all (by making its constructor private).

procodix commented 5 years ago

If self is a Type instance, you can't do new self().

Yes, this is syntactic sugar. Most times self would be best represented as a Type in return or parameter statements, as well as when reading out the Type's name as demonstrated.

Placed after new it should obviously lead to IL code creating an instance of that type, similar to what Activator.CreateInstance(self.FullName) would do.
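What an Activator-based `new self()` would mean for the classes discussed earlier can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: it uses the existing `Activator.CreateInstance(Type)` overload, and `Demo` is an illustrative name. The point is that a missing parameterless constructor only surfaces at runtime.

```csharp
using System;

public class A { }

public class B : A
{
    public B(string something) { }
}

public static class Demo
{
    public static void Main()
    {
        // A has an implicit public parameterless constructor, so this works.
        object a = Activator.CreateInstance(typeof(A));
        Console.WriteLine(a.GetType().Name); // prints "A"

        try
        {
            // B only declares B(string): this compiles but fails at runtime.
            Activator.CreateInstance(typeof(B));
        }
        catch (MissingMethodException)
        {
            Console.WriteLine("B has no parameterless constructor");
        }
    }
}
```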

procodix commented 5 years ago

@canton7 finding the right constructor has some logic now. Classes have empty constructors or by overloading parametrized ones. The compiler can resolve this at compile time. When I call B.Create() and that leads to A.Create() which hosts some instructions that can not (or no longer) be executed on B, this means a syntax error.

Here is an example of how this is communicated today:

class Uri2 : Uri {
    public Uri2() {}
}

leads to the error: 'Uri' does not contain a constructor that takes 0 arguments (CS1729)

procodix commented 5 years ago

@canton7 my Uri example would get resolved by adding:

class Uri2 : Uri {
    public Uri2() : base("http://localhost") {}
}

to satisfy the requirements. That's what A & B programmers would have to do as well.

canton7 commented 5 years ago

Let's say you have:

class A
{
    public static self Create() => new self();
}
class B : A
{
    public B(string s) { }
}

So your proposal is, if you call B.Create(), the compiler would peer inside the implementation of A.Create() (which might not be available at compile-time), determine that it tries to (effectively) call new B(), and would raise an error on B.Create()?

procodix commented 5 years ago

Just wanted to emphasize, this is not theory. It works very well in PHP and is very powerful in the use cases demonstrated above. There is static::, which always represents the current class's name as a string. For example, this is pretty neat & reusable code for a trait - something like the new interfaces with default implementations which get injected into classes:

trait InstanceTrait {
    public static function Instance() {
        return new static; // creates instance of whatever class this is called from
    }

    public function ToString() {
        return 'Instance of : ' . static::class; // even works in instances to print the current class name
    }
}

canton7 commented 5 years ago

PHP can do many things that C# cannot. C# can do many things that PHP cannot.

PHP is late-bound, C# is early-bound. C# gives many more compile-time guarantees than PHP does, but the cost is less flexibility at compile-time.

You're trying to take something which works in PHP because PHP is late-bound, and apply it to C#, which needs to give many more compile-time guarantees. There are problems there, and we're asking how your proposal addresses them.

canton7 commented 5 years ago

Also, it really does sound like generics already meet your use-case:

class Factory
{
    public static T Create<T>() where T : A, new() => new T();
}

Factory.Create<A>();
Factory.Create<B>();

If you really want your nice syntax, then:

class A
{
    public static A Create() => Factory.Create<A>();
}
class B : A
{
    // "new" hides the inherited A.Create; B must derive from A to satisfy the constraint on T
    public static new B Create() => Factory.Create<B>();
}

You can also do typeof(T) to get the Type instance.

Given that the language already appears to support what you want, with only a couple of additional lines of boilerplate, I think you're going to have a really hard time convincing the LDM to support this particular proposal.

procodix commented 5 years ago

@canton7

So your proposal is, if you call B.Create(), the compiler would peer inside the implementation of A.Create() (which might not be available at compile-time), determine that it tries to (effectively) call new B(), and would raise an error on B.Create()?

Exactly. Since this is clearly a compile time error.

BTW. You are absolutely right about the different worlds PHP and C# live in. Mentioning it didn't mean to ignore the conceptual differences, it's just proof-of-concept that there is a viable path to getting to an executable method in all cases when static:: is used. The implementation detail in PHP was named "Late static binding", but in PHP everything is late, because it is interpreted; most code is even included at runtime. They may call it "Late static binding" because they carry the latest class name on which a static function was called around throughout the stack. But that leads to confusion when a static method jumps into a dynamic one of another class. Which class does static:: represent then? So the naming is more an explanation for the bad implementation.

C# however due to strong typing could realize it at compile time as pointed out earlier. If a method is missing, simply throw a compile error.

Swift does it as well with "self" and "Self", the latter being for static:: See here: https://kirilltitov.com/en/blog/2017/capitalized-self-in-swift

However, all questions asked so far put the spotlight on the central instruction: What happens at B.Create()? And that was my initial point: It's written in the source code, so it should be called.

If it does not exist => error

If it exists in parent class => copy it over, do the type mangling and call it.

canton7 commented 5 years ago

So your proposal is, if you call B.Create(), the compiler would peer inside the implementation of A.Create() (which might not be available at compile-time), determine that it tries to (effectively) call new B(), and would raise an error on B.Create()?

Exactly. Since this is clearly a compile time error.

The problem here is that this is simply not possible. As @CyrusNajmabadi said earlier, A might be in a reference assembly, which means that its implementation simply isn't available to the compiler.

If it exists in parent class => copy it over, do the type mangling and call it.

Again, if A is in a reference assembly, there's nowhere to copy it from!

Also, simply copying A.Create's implementation into B would mean that if B doesn't have a parameterless constructor (or a constructor which matches A's constructor), you would get a compiler error at the point that you try to compile B. This is different to the behaviour you just said you wanted, which is that the compile-time error happens at the point that you call B.Create().

It's really hard to take a proposal seriously when the proposer keeps contradicting themselves on what they're proposing.

procodix commented 5 years ago

So the only problem arises, when the code is in two different assemblies?

canton7 commented 5 years ago

There are plenty of other problems, but let's take one at a time.

procodix commented 5 years ago

Ok, I am not too familiar with the IL specifics. But we could try another approach:

What about the call to B.Create() would instead

That would prevent the type mangling at compile time / at IL level. It would require the compiler to emit additional Opcodes after the return jump and stack pop to cast the returned "self" object to the provided classname "B".

Something like this:

public class B : A {
    // automatically generated method:
    public static B Create() {
        return (B) A.Create();
    }
}

canton7 commented 5 years ago

You cannot take an instance of A and then cast it to B. That's not possible.

For an intuitive idea why: B might have fields / virtual methods which A does not have. When you do new A(), you allocate enough space for A's fields, but not B's. If you were then able to cast that instance of A to B, there would be nowhere for B's fields to be stored.
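The failure is easy to reproduce. In this small sketch (the `Extra` field and the `Demo` class are illustrative names, not from the thread), the downcast compiles but the runtime rejects it, because the object really is an A:

```csharp
using System;

public class A
{
    public static A Create() => new A();
}

public class B : A
{
    public int Extra; // a field that an A instance has no storage for
}

public static class Demo
{
    public static void Main()
    {
        try
        {
            B b = (B)A.Create(); // compiles, but the object is an A
            Console.WriteLine(b.Extra);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("cannot cast an A instance to B");
        }
    }
}
```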