dotnet / csharplang

The official repo for the design of the C# programming language

Discussion: Stack stored classes, Heap stored "value classes" #460

Closed lachbaer closed 7 years ago

lachbaer commented 7 years ago

For heap stored "value classes" please see https://github.com/dotnet/csharplang/issues/460#issuecomment-296102185

Question

Why is it not possible to put classes on the stack? Value-type structs do not offer the comfort of classes when it comes to inheritance.

In C++ this is possible by omitting the new keyword.

I can think of one reason: accidentally omitting new, and thereby creating a stack-stored instance without any further notice, could lead to unwanted behaviour. Also, when the (first) CLR was created, strictly distinguishing between stack-stored struct and heap-stored, garbage-collected class might have been easier.

Motivation

There is ongoing development on (non-)nullable reference types, among other things. It seems that the borders between value types (struct) and reference types (class) are partly blurring nowadays.

I recently had scenarios where putting a class type on the stack to accelerate processing would be quite useful.

Possible syntax

To create a class instance on the stack the following statement could be used, where new is replaced by struct. ClassBar# marks innerBar as being not a pointer.

ClassBar# innerBar = struct ClassBar();

To (shallow) copy a heap-stored instance to the stack, a struct() operator could be introduced:

var refPoint = new ReferencePoint(0.0, 0.0);
var pointOnStack = struct(refPoint);

Implementation possibilities

Alternatives

Maybe the practical performance impact is no longer that heavy nowadays, so that leaving classes on the managed heap is actually no issue any more.

lachbaer commented 7 years ago

And I do not see how emoji reactions and unsubstantiated statements like the one above answer the question or lead to a serious discussion! 😞

So I ask a further question, and I would be glad to get real answers to it!

How high is the cost of creating a class object on the managed heap compared to creating an identical struct instance on the stack?

HaloFour commented 7 years ago

This would most probably require a CLR change

This would certainly require a CLR change. Even C++/CLI and MC++ are unable to allocate .NET classes on the stack.

For reference:

https://github.com/dotnet/roslyn/issues/2104
https://github.com/dotnet/coreclr/issues/1784
http://xoofx.com/blog/2015/10/08/stackalloc-for-class-with-roslyn-and-coreclr/

lachbaer commented 7 years ago

@HaloFour Thanks for the valuable links 😃

On my processor, with an optimized build, the runtime ratio between allocating a class (without any field initializations) and allocating an identical struct is between 10:1 and 20:1.

lachbaer commented 7 years ago

ClassBar firstBar = new ClassBar();
ClassBar #secondBar = new ClassBar();

When the compiler reaches the second new ClassBar() it must "look back" to find out in what kind of memory it should create the object. Also, you can accidentally omit or delete the # and get a different result than intended. Therefore I think struct ClassBar() is better in terms of clarity (also for the compiler) and stability.

I don't see your point on nullables. Nullable<T> is a value type on the stack and can be null. # marked variables must never and cannot be null.
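For reference, the way `Nullable<T>` behaves today can be checked with a small sketch. The class name `NullableDemo` is made up for illustration; only `Nullable<T>` itself comes from the framework.

```cs
using System;

public static class NullableDemo
{
    // Nullable<int> is itself a struct, so it is a value type.
    public static bool IsValueType() => typeof(int?).IsValueType;

    // "null" for a Nullable<T> only means HasValue == false; no reference is involved.
    public static bool HasValueWhenNull()
    {
        int? maybe = null;
        return maybe.HasValue;
    }

    // Boxing an empty Nullable<T> yields a real null reference.
    public static bool BoxesToNull()
    {
        int? maybe = null;
        object boxed = maybe;
        return boxed == null;
    }
}
```

So a nullable value type is "null" only in the sense of carrying a flag, which is different from a reference that points nowhere.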

HaloFour commented 7 years ago

Stack allocation is certainly much faster. But it comes with a lot of limitations, especially if the C# compiler is to attempt to enforce safe memory management. I think that the cases in which this feature could be applied would be incredibly limited as a result. You couldn't take an arbitrary class and just "pin" it to the stack. The class would likely have to meet a very strict set of guidelines that would have to be enforced by the CLR. For example, the class might have to be sealed, because inheritance would prevent the compiler/JIT from ensuring that enough space is allocated on the stack.

Then you have the entire problem of escape analysis. That reference can't go anywhere since it points to a portion of memory that is unmanaged. I'm not entirely sure how that can be adequately handled since effectively calling any instance method on the reference is a form of escaping since those methods could save this literally anywhere. The blog post above gets into this a little bit as this was already brought up as one of the major hurdles.

gafter commented 7 years ago

@lachbaer Objects on the heap are freed by the GC, which ensures that the object survives until after the point when its last reference is no longer reachable. You have not explained to us how you would accomplish that for objects that are on the stack, or what the behavior should be when there are dangling references. So we cannot answer how expensive it would be.

lachbaer commented 7 years ago

@gafter Objects on the stack should behave like structs, in a way.

What I mean is, that once the stack is cleared, the object is immediately finalized.

To ensure that no dangling references occur, the stack object gets a new type name suffixed with a #. The variable is handled like a struct from that point on, i.e. there can be no further references to it. The instance on the stack is a shallow copy of the heap object (= struct(objVariable)) or a newly created object (= struct Object()).

Also, an "object# obj = struct Object();" is by default converted to a sealed type. It must not be cast to any other type in the hierarchy. The moment that it is cast to (object) or an explicit interface type it is "boxed" again, like any value type, and by that regains all its powers.

The object on the stack is not a reference, it is a copy. It's like int? a = b with int b, where int? is a new type, but now with the type modifier #. Boxing, by assigning the stack object to an object obj, again makes a copy of the stack object on the heap and keeps the stack object on the stack until it goes out of scope.
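The copy-on-boxing behaviour referred to here is how value types already work today. A minimal sketch, using a hypothetical `Point2D` struct and `BoxingDemo` helper that are not part of the proposal:

```cs
using System;

public struct Point2D
{
    public int X;
    public int Y;
}

public static class BoxingDemo
{
    // Returns the X stored in the boxed copy after the original local was changed.
    public static int BoxedCopyIsIndependent()
    {
        Point2D local = new Point2D { X = 1, Y = 2 };
        object boxed = local;   // boxing: copies the value into a new heap object
        local.X = 99;           // mutates only the local, not the boxed copy
        return ((Point2D)boxed).X;
    }
}
```

`BoxingDemo.BoxedCopyIsIndependent()` returns 1, because the boxed object holds its own copy of the fields.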

A valid issue arises when the finalizer frees resources that are pointed to by reference fields of the object. There can be dependencies between the objects that could lead to unpredictable behaviour.

HaloFour commented 7 years ago

@lachbaer

The variable is handled like a struct from that point on, i.e. there can be no further references to it.

You have to be able to take references to the copy on the stack in order for instance methods to work.

lachbaer commented 7 years ago

@HaloFour It seems that I lack some knowledge here. What is the difference between how instance methods of structs work and how those of reference types work? You can use this in value type "instance" methods. ❓

HaloFour commented 7 years ago

@lachbaer

In the case of struct instance methods the compiler pushes a ref to the struct onto the stack. That's why struct instance methods can be self-mutating.
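A small sketch of that behaviour with today's structs; `Counter` and `StructThisDemo` are illustrative names only:

```cs
using System;

public struct Counter
{
    public int Value;

    // For struct methods 'this' is passed by reference, so this mutates the caller's storage.
    public void Increment() => Value++;
}

public static class StructThisDemo
{
    // The method call changes the local variable itself.
    public static int MutateInPlace()
    {
        Counter c = new Counter();
        c.Increment();
        c.Increment();
        return c.Value;
    }

    // Assignment copies the whole value, so mutating the copy leaves the original alone.
    public static int MutateCopy()
    {
        Counter original = new Counter();
        Counter copy = original;
        copy.Increment();
        return original.Value;
    }
}
```

`MutateInPlace()` returns 2 and `MutateCopy()` returns 0: the method sees the variable it was called on, not a copy of it.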

lachbaer commented 7 years ago

@HaloFour But that means that the ref to the instance itself lies deeper (physically) in the stack and can only exist while the instance itself exists. The ref will always be purged before the actual object. So this is no technical issue and you just got confused by the statement "there can be no further references to it"? Or am I overlooking something?

mikedn commented 7 years ago

To ensure that no dangling references occur, the stack object gets a new type name suffixed with a #

So you're suggesting that the C# compiler should create a struct type and copy the reference type's fields and methods?

lachbaer commented 7 years ago

@mikedn I think that there must be a CLR change for this and that the CLR does this in the background. The compiler creates the type only as a TypeSymbol.

svick commented 7 years ago

@lachbaer What would happen in the following code?

class C
{
    private static C storage;

    private int state;

    public int M()
    {
        storage = this;
        storage.state = 42;
        return this.state;
    }
}

…

C c1 = new C();
Console.WriteLine(c1.M());

C# c2 = struct C();
Console.WriteLine(c2.M());

As far as I can tell, there are two options:

  1. C#.M is the same method as C.M. When you call it, a reference to the C stored on the stack is passed in, resulting in memory corruption down the line.
  2. C#.M is a copy of C.M, but with semantics changed to be safe (e.g. by copying the contents of this when passed elsewhere). I don't see how this could be done without changing the semantics so much that a huge number of methods (like C.M) become broken.

What am I missing? Is there some third option? Or do you think one of the two options above is acceptable?
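For comparison, this is what the first half of the example does with an ordinary heap-allocated class today. The sketch renames the class to `Escaping` and makes the members public so the result can be inspected; `EscapeDemo` is a made-up helper.

```cs
using System;

public class Escaping
{
    // A static field outlives every stack frame.
    public static Escaping Storage;

    public int State;

    public int M()
    {
        Storage = this;       // 'this' escapes into static storage
        Storage.State = 42;
        return this.State;
    }
}

public static class EscapeDemo
{
    // True when M() returned 42 and the static field still refers to the same object.
    public static bool ReferenceEscapes()
    {
        var c1 = new Escaping();
        int result = c1.M();
        return result == 42 && ReferenceEquals(Escaping.Storage, c1);
    }
}
```

After `M()` returns, `Escaping.Storage` still refers to the object. That is harmless on the GC heap, but it is exactly the reference that would dangle if the object had lived in a stack frame.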

HaloFour commented 7 years ago

What's more the IL for both the struct and the class implementation of such behavior is different, both in the implementation and the consumption. It wouldn't be possible to treat a class like a struct with the same IL generated for that class.

dstarkowski commented 7 years ago

Objects on the stack should behave like structs, in a way

Is it only in terms of memory, or are classes allocated on the stack passed to a method by value as well? Can a parameter of type ClassBar# be passed to a method that accepts ClassBar?

If not, you won't be able to use it with any existing code. Otherwise you need to get a reference, which you stated shouldn't be possible.

lachbaer commented 7 years ago

@HaloFour

It wouldn't be possible to treat a class like a struct with the same IL generated for that class

Can you submit an example? You are probably right, but I haven't seen solvable differences in the IL. A CLR change must be done for this anyhow.

@svick

What would happen in the following code?

I admit I did not think of self-references. 😞 That will crash whichever way I look at it. But I wouldn't be me if I didn't have a solution in mind 😀 :

The purpose of this construct in the first place is to allow a relevant performance boost for lightweight classes, which are classes and not structs for some reason. Most of the classes I can think of are part of my own projects, so I have access to them, or will probably be lightweight framework classes that could be converted easily.

If backward compatibility is no issue - because until now we dealt with the current situation somehow - a new "kind" of class can be introduced to the language and CLR, namely class#.

public class# Person { ... }
public class# Student : Person { ... }

Some guarantees must be made about these classes, two being:

  1. Inheritance can only be made on stackable classes, i.e. other class#'es
  2. this is only allowed on other T# declarations.

This means that not all classes can be put on the stack per se, but we would have the opportunity to create new classes for which this feature would actually (and only) make sense. (Besides, we get the benefit of an implicit "Dispose" through a deterministically called finalizer.)

@dstarkowski

Can a parameter of type ClassBar# be passed to a method that accepts ClassBar

Yes, it will be shallow-copied to the heap, losing its performance benefit. The method then operates on a copy of the object! But there is a solution to that, too:

void RenameStudent(Student stud) { stud.Name += " the douche"; }
Student# student = new Student#("Eric");
RenameStudent(student ref);

After RenameStudent exits the heaped student is copied back to the stacked student. That is indicated by appending the ref keyword to the argument.

mikedn commented 7 years ago

a new "kind" of class can be introduced to the language and CLR, namely class#

Not "can", "must". There is no other way to ensure that a reference type allocated on stack doesn't use this in an unsafe manner.

Yes, it will be shallow-copied to the heap, losing its performance benefit. The method then operates on a copy of the object! But there is a solution to that, too:

So you're proposing that significant changes are made to the runtime and the language in the name of performance but it all falls flat on its face as soon as the usage becomes slightly more complicated. It's probably cheaper and more effective to let the runtime do escape analysis and stack allocate reference types that do not escape the stack.

lachbaer commented 7 years ago

When looking at all the issues I have posted upon recently - of course including this one 😁 - it kind of seems to me that we/I treat the symptom, not the cause.

With CLR changes being unavoidable anyway, at least thinking about a completely new native type that is not backwards compatible is acceptable. I'll call it cluct or strass 😆 No, seriously, this time I go with struct# to facilitate keyword reuse.

It will bring several things together

That struct# would basically keep quite a lot of the classical struct in essence, but loosen many if not all of the constraints that made me and other participants propose this or that approach to solve a current issue.

Addendum: to facilitate some of the characteristics of that new type, an additional, specialized heap with less CLR overhead can be created in memory. The stack then simply stores the pointer.

HaloFour commented 7 years ago

Sounds like a massive amount of work for very little real benefit.

lachbaer commented 7 years ago

@HaloFour The primary benefit lies in performance. In very many cases you operate on midweight objects that are already too heavy for being structs but don't make much or any use of comprehensive class features. Nevertheless you finally decide on classes.

The CLR was initially designed to support the broad OOP features of classes, but the cost of that managed behaviour is massive. Performance measurements underline that, and for some this is the argument against going with .NET, Java or the like.

With this, admittedly big, addition to the CLR, many often-used (custom) classes would get nearly the performance of structs while still maintaining the most-used comforts of classes. The performance jump would be groundbreaking, a Pro for .NET going into the next decade(s), and also a possible edge for .NET Core over the JVM.

dstarkowski commented 7 years ago

@lachbaer

In very many cases you operate on midweight objects that are already too heavy for being structs

What makes an object too big to be a struct, but not too big to be a stack-allocated class?

mikedn commented 7 years ago

The primary benefit lies in performance

And as already stated there are alternatives to this proposal that are likely easier to implement.

HaloFour commented 7 years ago

@lachbaer

That's why we have ref locals and returns now. The size of the struct no longer matters, you don't have to copy it around to work with it.
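A short sketch of ref returns and ref locals (C# 7.0). The 64-byte `BigValue` struct and the `RefReturnDemo` helper are invented for illustration:

```cs
using System;

public struct BigValue
{
    public long A, B, C, D, E, F, G, H; // 64 bytes of payload
}

public static class RefReturnDemo
{
    // Returns a reference to the array element instead of a copy of it.
    public static ref BigValue At(BigValue[] items, int index) => ref items[index];

    public static long UpdateInPlace()
    {
        var items = new BigValue[4];
        ref BigValue slot = ref At(items, 2); // ref local: aliases items[2]
        slot.A = 7;                           // writes straight into the array
        return items[2].A;
    }

    public static long UpdateCopy()
    {
        var items = new BigValue[4];
        BigValue copy = At(items, 2); // without 'ref' the value is copied
        copy.A = 7;
        return items[2].A;
    }
}
```

`UpdateInPlace()` returns 7 while `UpdateCopy()` returns 0, so with `ref` the large struct is worked on where it lives and never copied.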

fanoI commented 7 years ago

Whenever I read this genre of issue (stackalloc, destructible types, this "strass"), I keep thinking that there is a fundamental fallacy: C# is NOT C/C++ and I don't want to have to preoccupy myself with memory management!

This should be the job of the JIT / AOT compiler: do proper escape analysis and then allocate objects on the stack only when it is safe to do so. The objection that this is impossible does not make sense: Java has had this for years, and the JVM and the CLR are really very similar!

Escape analysis will give all of this for free:

  1. Classes are allocated on the stack whenever possible, and on the heap only when strictly needed.
  2. If a class has been "deboxed" to a struct, its finalizer could be called directly when the stack is unwound. The user would still write "using var f = new FileStream()", which is converted to a try / finally with an explicit "f.Dispose()"; once escape analysis has realized that 'f' indeed does not escape, it can make the whole try / finally disappear and call f.Dispose(), or more correctly ~f(), when 'f' goes out of scope.
  3. A lot of performance "concerns" about some parts of .NET (Linq, Enumerable, ...) will "magically" go away: in 99% of the cases the objects will be allocated on the stack!
  4. Classes that contain other classes will be placed on the stack without the need for a reference (pointer) to the other class: no need for indirection if it does not escape!
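For point 2, this is roughly how a `using` statement is lowered today: to a try / finally with no catch. The sketch uses a hypothetical `Tracked` type that records the order of events instead of a real `FileStream`.

```cs
using System;
using System.Collections.Generic;

public sealed class Tracked : IDisposable
{
    private readonly List<string> log;
    public Tracked(List<string> log) { this.log = log; log.Add("created"); }
    public void Dispose() => log.Add("disposed");
}

public static class UsingDemo
{
    public static List<string> WithUsing()
    {
        var log = new List<string>();
        using (var t = new Tracked(log))
        {
            log.Add("body");
        }
        return log;
    }

    // Roughly what the compiler generates for the using statement above.
    public static List<string> WithTryFinally()
    {
        var log = new List<string>();
        Tracked t = new Tracked(log);
        try
        {
            log.Add("body");
        }
        finally
        {
            if (t != null) t.Dispose();
        }
        return log;
    }
}
```

Both methods record "created", "body", "disposed" in that order. Whether a JIT could then remove the try / finally after escape analysis, as suggested above, is speculation in this thread and not something the sketch shows.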

In the end the boundary between ValueTypes (technically a "hack") and Objects will indeed become totally nonexistent. A hypothetical new .NET could drop this distinction and be a pure object-oriented language (as Java wanted to be, before they felt the necessity of "primitive types" and in some way broke it), in which Integer is an Object but 99% of the time is allocated on the stack as if it were a "native" int.

jnm2 commented 7 years ago

cluct or strass

Please stop with the portmanteaus, lol...

mikedn commented 7 years ago

In the end the boundary between ValueTypes (technically a "hack") and Objects will indeed become totally nonexistent

Not really. For one thing, escape analysis is limited in what it can do (e.g. it's very difficult to allocate objects that are returned on the stack). And more importantly, being a value type and being allocated on the stack are independent things.

a hypothetical new .NET could drop this distinction and be a pure object-oriented language

Value types have nothing to do with C# being or not being a "pure" OOP language. Not to mention that the idea of a "pure" OOP language is archaic.

lachbaer commented 7 years ago

@mikedn

And as already stated there are alternatives to this proposal that are likely easier to implement.

Can you link them (again), please? 😃

mikedn commented 7 years ago

@lachbaer Quoting myself from couple of posts above:

It's probably cheaper and more effective to let the runtime do escape analysis and stack allocate reference types that do not escape the stack.

lachbaer commented 7 years ago

@mikedn Ah, you meant @HaloFour's links? dotnet/roslyn#2104 dotnet/coreclr#1784 http://xoofx.com/blog/2015/10/08/stackalloc-for-class-with-roslyn-and-coreclr/

mikedn commented 7 years ago

Those are good too. coreclr#1784 is a request to add escape analysis to the JIT.

lachbaer commented 7 years ago

This gets me confused a bit 😕 It seems as if the boundaries between classes and structs will perhaps vanish with one of the next CLR updates.

Does that mean that the only real argument left for deciding between struct and class is whether my object shall be by value or by reference for the sake of "copyness"?

mikedn commented 7 years ago

It seems as if the boundaries between classes and structs will perhaps vanish with one of the next CLR updates.

Nope, that's not true.

What escape analysis does is allocate reference types on the stack when the compiler discovers that reference(s) do not escape the stack. That is:

As an example - it should be possible to allocate the List<int> object on the stack in the below example:

void foo() {
    var list = new List<int>();
    list.Add(42);
    PrintListCount(list);
}
void PrintListCount(List<int> list) {
    Console.WriteLine(list.Count);
}

This is, of course, subject to JIT compiler's escape analysis capabilities which may be rather limited. But Java does some of this so we know that it's possible, to an extent.

Does that mean that the only real argument left for deciding between struct and class is whether my object shall be by value or by reference for the sake of "copyness"?

Not really. For example neither your proposal nor escape analysis can deal with returned objects, at least not in a reasonable manner. Because of that people will likely still use struct enumerators like List<T>.Enumerator.
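The struct enumerator mentioned here can be observed directly. A minimal sketch, where `EnumeratorDemo` is a made-up helper and `List<T>.Enumerator` is the real framework type:

```cs
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // List<T>.GetEnumerator() returns the struct List<T>.Enumerator, not an interface.
    public static bool EnumeratorIsValueType() =>
        typeof(List<int>.Enumerator).IsValueType;

    public static int Sum(List<int> list)
    {
        int total = 0;
        // foreach binds to the struct enumerator here, so no enumerator object is allocated.
        foreach (int x in list) total += x;
        return total;
    }
}
```

The enumerator is returned from `GetEnumerator()` by value, which is the "returned object" case that neither escape analysis nor stack-stored classes would cover easily.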

fanoI commented 7 years ago

Well, but escape analysis could in some cases be helped by another compiler optimization: https://en.wikipedia.org/wiki/Return_value_optimization

For example this List does escape and so should be allocated on the heap:

List<int> foo() {
    var list = new List<int>();
    list.Add(42);
    return list;
}

void test() {
    List<int> l = foo();
    PrintListCount(l);
}

But after RVO not anymore as it becomes:

void foo(ref List<int> list) {
    list = new List<int>();
    list.Add(42);
    //return list; ==>elided!
}

void test() {
    List<int> l = null;
    foo(ref l);
    PrintListCount(l);
}

Usually optimizations are chained for better results, so I think that in reality the occasions on which objects can be allocated on the stack should be more numerous than expected.

mikedn commented 7 years ago

But after RVO not anymore as it becomes:

Today ref List<int> is a reference to a reference so the code you show doesn't actually allow you to allocate the list on the stack. The allocation still needs to be done inside foo and storing the resulting reference in list means escaping it so no stack allocation is actually possible.
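The "reference to a reference" point can be seen in a few lines. `RefToReferenceDemo` and `Fill` are illustrative names, not code from the thread:

```cs
using System;
using System.Collections.Generic;

public static class RefToReferenceDemo
{
    // 'ref List<int>' lets the callee replace the caller's reference itself.
    public static void Fill(ref List<int> list)
    {
        list = new List<int>(); // the allocation still happens here, on the heap
        list.Add(42);
    }

    public static bool CallerSeesNewObject()
    {
        List<int> before = new List<int>();
        List<int> l = before;
        Fill(ref l);
        // 'l' now points at a different object than 'before'.
        return !ReferenceEquals(l, before) && l.Count == 1 && before.Count == 0;
    }
}
```

The `ref` parameter aliases the caller's variable, not the list's storage, so the new list is created inside `Fill` and handed out through that variable.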

What you probably have in mind is that List<int> behaves like a struct in this case and that the ref points to the actual storage, storage that's on the caller's frame. There are all sorts of problems with this, for example foo needs to behave differently depending on whether or not the caller lets the list escape. This "behaves differently" isn't likely doable, so you'll have to generate 2 different versions of foo, not pretty.

whoisj commented 7 years ago

This gets me confused a bit 😕 It seems as if the boundaries between classes and structs will perhaps vanish with one of the next CLR updates.

Does that mean that the only real argument left for deciding between struct and class is whether my object shall be by value or by reference for the sake of "copyness"?

That sounds great to me - well there's the fact that struct cannot have empty c'tors. 😛

Seriously, if classes can be stack allocated then we need a borrow operator, which means there would be a lot of existing API that would be unusable by stack allocated objects.

lachbaer commented 7 years ago

In my opinion, a standard programmer who has just basic knowledge of the way the CLR/JIT/AOT behaves should not be forced to choose between returning by return value and calling by reference. Also, in my eyes this already is - to cite @fanoI - having "to preoccupy myself with memory management". You're going to choose one pattern over the other. Besides, every (beginner's) programming book states that using a ref for only one return value is something that shouldn't be done. And with ref returns and ValueTuples at hand now, that is emphasized even more and holds true for multiple return values as well.

But the motive for this topic is to allow "standard" programmers to easily choose between OOP power by using class and performance power by using class# with restricted OOP possibilities that nevertheless offer more programming comfort than struct does today.

This is no proposal and I don't actually care about how this could be achieved, nor whether there is a real need for a change.

The latest comments just tell me that the CLR team is putting effort into making classes more efficient, closer to structs. Well, then there is actually no need to put classes on the stack anymore.

The initiative for this discussion comes from #99. It turns out that initializing structs and giving them a custom default is actually not doable, unless a solution is found. My first thought on this was "what if classes can be put on the stack with the same performance?", hence this discussion.

Now, however, it seems to me that the cat is chasing its tail.

  1. struct does not have the slightest comfort of class
  2. class is not really predictable concerning its performance
  3. how can we offer the programmer a combination of both?
  4. (wtf is struct still good for besides its non-nullness and copy-by-value characteristic?)

Any ideas about how or if?

whoisj commented 7 years ago

@lachbaer you keep asking this question: "wtf is struct still good for?"

imo there should be no difference between class and struct beside the defaults of by-value vs by-reference.

jnm2 commented 7 years ago

@lachbaer Struct is a bag of values contained by some class or variable with no concept of unique identity. Class has concept of unique identity.

lachbaer commented 7 years ago

@jnm2 Isn't that a theoretical POV? What is the practical benefit that cannot easily be achieved in a different way? The equality compare operator is not defined by default, which avoids implying a comparison of unique identity for non-obvious structs. But there must be more?

agocke commented 7 years ago

@lachbaer

The equality compare operator is not defined by default

object.Equals?

lachbaer commented 7 years ago

@agocke operator ==

agocke commented 7 years ago

@lachbaer You mean for structs? Sure, but aren't you talking about classes? == is defined for object.

I think what @jnm2 is pointing out is that you need to provide some way for classes on the stack to provide reference equality, since that is defined in the language for classes.
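The difference in default equality can be sketched as follows. `PersonClass`, `PersonStruct` and `EqualityDemo` are hypothetical names used only for this illustration:

```cs
using System;

public class PersonClass { public string Name; }
public struct PersonStruct { public string Name; }

public static class EqualityDemo
{
    // For classes, == defaults to reference equality (identity).
    public static bool ClassEqualsByReference()
    {
        var a = new PersonClass { Name = "Ann" };
        var b = new PersonClass { Name = "Ann" };
        var alias = a;
        return (a == alias) && !(a == b);
    }

    // For structs there is no built-in ==, but ValueType.Equals compares field by field.
    public static bool StructEqualsByValue()
    {
        var a = new PersonStruct { Name = "Ann" };
        var b = new PersonStruct { Name = "Ann" };
        // 'a == b' would not compile unless PersonStruct defines operator ==.
        return a.Equals(b);
    }
}
```

Both methods return true: two distinct class instances with equal contents are not `==`, while two struct values with equal fields are `Equals`.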

lachbaer commented 7 years ago

@agocke

Sure, but aren't you talking about objects

No, currently not 😁 The initial motivation was to have (part of) class power with struct performance. The discussion is now whether class can be as performant as struct and what struct is good for then. The thought of having (nearly) class-like value types currently won't let go of me 😉

mikedn commented 7 years ago

wtf is struct still good for besides its non-nullness and copy-by-value characteristic

A value type is useful exactly because it's a "value".

In turn that implies that it cannot be null and that it is copied by value. It also happens to be the case that local variables of value type are stored on the stack but ultimately that is just an implementation detail. They could be stored on the heap as well but that wouldn't make much sense.

But what being a "value" really means is that an array of Complex numbers is really an array of numbers and not an array of references. Neither escape analysis nor your "stack stored classes" are a substitute for value types.
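A sketch of that difference, using made-up `ComplexValue` (struct) and `ComplexRef` (class) types rather than the framework's `Complex`:

```cs
using System;

public struct ComplexValue { public double Re, Im; }
public class ComplexRef { public double Re, Im; }

public static class ArrayLayoutDemo
{
    // Elements of a struct array are the values themselves and are usable immediately.
    public static double StructArrayElement()
    {
        var values = new ComplexValue[3];
        values[1].Re = 1.5;  // writes directly into the array's storage
        return values[1].Re;
    }

    // Elements of a class array are references, all null until objects are allocated.
    public static bool ClassArrayStartsNull()
    {
        var refs = new ComplexRef[3];
        return refs[0] == null && refs[1] == null && refs[2] == null;
    }
}
```

The struct array needs one allocation for all three numbers, whereas the class array would need three further objects before any element could be used.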

lachbaer commented 7 years ago

As the purposes and benefits of struct are now named I'd like to throw in another question, based upon...

Heap stored advanced value types / Value Classes

(This is only a proof of concept class)

struct# Person {
    public int Age;
    public string Name = "";
    public string Address ="";
    public static int PersonCounter { get; private set; } = 0;
    public Person() => ++PersonCounter;
    public ~Person() => --PersonCounter;
    public virtual Person# GetCopy() => this; // Polymorphy!
}

struct# Student : Person {
    public int Semester = 1;
    public static int StudentCounter { get; private set; } = 0;
    public Student() => ++StudentCounter;
    public ~Student() => --StudentCounter;
}

Characteristics

Person# sibling = laura; // copies laura!

Errors
```cs
/* logical errors */
teacher = null;    // struct# is not nullable
if (teacher == laura) { }    // no default equality comparer

/* syntax errors */
Student eric;    // struct# is its own type, the `#` is mandatory
Person bobby = new Student#();    // `#` belongs on Person
```

Polymorphy

Person# alice = new Student();
Person# allison = alice.GetCopy();
Student# alicia = (Student#)allison;
// Student.StudentCounter now is 3

Summary

This concept is not that different from actual classes! It just takes away the reference characteristics and adds characteristics of struct that were stated previously as being the main purpose for structs. The character # was chosen, because it looks like a "box" (with sharp edges). Classical "boxing" puts value types on the heap.

Before thinking of actual implementation possibilities the question:

Would this make practical sense?

Or is it only a theoretical concept with no use?

dstarkowski commented 7 years ago

@lachbaer

In my opinion, a standard programmer who has just basic knowledge of the way the CLR/JIT/AOT behaves should not be forced to choose between returning by return value and calling by reference.

You say that a standard programmer shouldn't be forced to choose between struct and class. But you also suggest that the very same standard programmer should be able to choose between struct, class, heap-allocated struct# and stack-allocated class#?

lachbaer commented 7 years ago

@dstarkowski

You say that a standard programmer shouldn't be forced to choose between struct and class

No, maybe I expressed myself poorly. That statement relates to choosing between call-by-reference parameters and return values.

The choice between class, struct and struct# does not depend on performance or memory representation, but on what you want to achieve semantically: copy-by-reference, copy-by-value, (non-)nullable, derivable, etc.

HaloFour commented 7 years ago

Person# sibling = laura; // copies laura!

laura is a Student#, which has fields in addition to those of Person#. Copying it into the space allocated for a Person# would not be possible.

teacher = null; // struct# is not nullable

If it's heap allocated that means that it's fully GC managed. It also means that null does exist since the only thing you have in the stack itself is a pointer or reference, which can (and will) be null.

Student#[] students = new Student#[1];
Student# student0 = students[0]; // null

At best the C# compiler could try to hide null and force you to initialize where possible, but it can't prevent zero-initialization from occurring. Zero-initialization of a reference/pointer is null.
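Today's structs already show how zero-initialization bypasses any intended initial state. A minimal sketch with a hypothetical `Labeled` struct whose constructor tries to guarantee a non-null `Name`:

```cs
using System;

public struct Labeled
{
    public string Name;

    // The constructor normalizes null to an empty string.
    public Labeled(string name) { Name = name ?? ""; }
}

public static class ZeroInitDemo
{
    // Array elements are zero-initialized; no constructor runs for them.
    public static bool ArrayElementHasNullField()
    {
        var items = new Labeled[1];
        return items[0].Name == null;
    }

    // default(T) is the same all-zero value.
    public static bool DefaultHasNullField() => default(Labeled).Name == null;

    // Only an explicit constructor call produces the intended non-null state.
    public static bool ConstructedHasValue() => new Labeled(null).Name == "";
}
```

All three methods return true. For a heap-allocated struct# the zeroed slot would be the reference itself, which is the null that the comment above describes.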

lachbaer commented 7 years ago

@HaloFour I think you didn't get the concept yet.

Think of it as struct# being actually a class, being always stored on the heap in a class-like representation. There is no fixed space allocation, like for a classical struct. That is why polymorphism is possible.

But the compiler (or CLR or whatsoever) ensures several constraints, like e.g. that the instance always exists - that goes a bit further than non-nullable reference types - and that an assignment to another variable is always a copy, as it is for struct now.

This will give you a mixture. "Heap-stored Class-like Value-Type Structs" in a way. Something that is not possible nowadays.