Closed lachbaer closed 7 years ago
And I do not see how emoticon reactions and unsubstantiated statements like the one above answer the question or lead to a serious discussion! 😞
I ask a further question, and I would be glad to get real answers on that!
How high are the costs of creating a class object on the managed heap, instead of creating an identical struct instance on the stack?
This would most probably require a CLR change
This would certainly require a CLR change. Even C++/CLI and MC++ are unable to allocate .NET classes on the stack.
For reference:
https://github.com/dotnet/roslyn/issues/2104 https://github.com/dotnet/coreclr/issues/1784 http://xoofx.com/blog/2015/10/08/stackalloc-for-class-with-roslyn-and-coreclr/
@HaloFour Thanks for the valuable links 😃
On my processor the runtime difference of an optimized compilation between allocating a class (without any field initializations) and an identical struct is between 10:1 and 20:1.
```cs
ClassBar firstBar = new ClassBar();
ClassBar #secondBar = new ClassBar();
```
When the compiler reaches the second `new ClassBar()`, it must "look back" to find out in which kind of memory it should create the object. Also, you can accidentally omit or delete the `#` and get a different result than intended.
Therefore I think `struct ClassBar()` is better in terms of clarity (also for the compiler) and stability.
I don't see your point on nullables. `Nullable<T>` is a value type on the stack and can be `null`. `#`-marked variables must never be, and cannot be, null.
Stack allocation is certainly much faster. But it comes with a lot of limitations, especially if the C# compiler is to attempt to enforce safe memory management. I think that the cases in which this feature could be applied would be incredibly limited as a result. You couldn't take an arbitrary class and just "pin" it to the stack. The class would likely have to meet a very strict set of guidelines that would have to be enforced by the CLR. For example the class might have to be `sealed` to prevent inheritance, since inheritance would prevent the compiler/JIT from ensuring that enough space is allocated on the stack.
Then you have the entire problem of escape analysis. That reference can't go anywhere since it points to a portion of memory that is unmanaged. I'm not entirely sure how that can be adequately handled since effectively calling any instance method on the reference is a form of escaping since those methods could save `this` literally anywhere. The blog post above gets into this a little bit as this was already brought up as one of the major hurdles.
@lachbaer Objects on the heap are freed by the GC, which ensures that the object survives until after the point when its last reference is no longer reachable. You have not explained to us how you would accomplish that for objects that are on the stack, or what the behavior should be when there are dangling references. So we cannot answer how expensive it would be.
@gafter
Objects on the stack should behave like `struct`s, in a way.
What I mean is that once the stack is cleared, the object is immediately finalized.
To ensure that no dangling references occur, the stack object gets a new type name suffixed with a `#`. The variable is handled like a `struct` from that point on, i.e. there can be no further references to it. The instance on the stack is a shallow copy of the heap object (`= struct(objVariable)`) or a newly created object (`= struct Object()`).
Also, an `object# obj = struct Object();` is by default converted to a `sealed` type. It must not be cast to any other type in the hierarchy. The moment that it is cast to `(object)` or an explicit interface type it is "boxed" again, like any value type, and by that regains all its powers.
The object on the stack is not a reference, it is a copy. It's like `int? a = b` with `int b`, where `int?` is a new type, but now with the type modifier `#`. Boxing by assigning the stack object to an `object obj` again makes a copy of the stack object on the heap and keeps the stack object on the stack until it goes out of scope.
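For comparison, existing C# boxing already behaves this way for value types. A minimal sketch, using a hypothetical `Point` struct that is not part of the proposal: assigning a value to an `object` variable copies it into a new heap object, and later changes to the local do not reach that copy.

```cs
using System;

// Hypothetical value type, for illustration only.
public struct Point
{
    public int X;
}

public static class BoxingDemo
{
    public static (int boxedX, int localX) Run()
    {
        var p = new Point { X = 1 };

        object boxed = p;            // boxing: copies p into a new heap object
        p.X = 2;                     // changes only the local value

        var unboxed = (Point)boxed;  // unboxing copies the value back out
        return (unboxed.X, p.X);
    }
}
```

Calling `BoxingDemo.Run()` returns `(1, 2)`: the boxed copy keeps the value it had at the moment of boxing.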
A valid issue is when the finalizer frees resources that are pointed to by reference fields of the object. There can be dependencies between the objects that could lead to unpredictable behaviour.
@lachbaer
The variable is handled like a struct from that point on, i.e. there can be no further references to it.
You have to be able to take references to the copy on the stack in order for instance methods to work.
@HaloFour Seems that I have a lack of knowledge here. What is the difference in how instance methods of structs work compared with reference types? You can use `this` in value type "instance" methods. ❓
@lachbaer
In the case of struct instance methods the compiler pushes a `ref` to the struct onto the stack. That's why struct instance methods can be self-mutating.
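A minimal sketch of that behaviour, using a hypothetical `Counter` struct (not from the thread): the instance method receives `this` as a reference to the storage it is called on, so it changes that storage in place, while plain assignment copies the value.

```cs
using System;

// Hypothetical type, for illustration only.
public struct Counter
{
    public int Value;

    // `this` is passed by reference, so the caller's storage is modified.
    public void Increment() => Value++;
}

public static class StructThisDemo
{
    public static (int original, int copy) Run()
    {
        var a = new Counter();  // local of value type
        a.Increment();          // mutates `a` in place: a.Value is 1

        var b = a;              // copies the value, no shared storage
        b.Increment();          // only `b` changes: b.Value is 2

        return (a.Value, b.Value);
    }
}
```

Calling `StructThisDemo.Run()` returns `(1, 2)`.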
@HaloFour But that means that the `ref` to itself lies deeper (physically) in the stack and can only exist while the instance itself exists. The `ref` will always be purged before the actual object. Then this is no technical issue and you just got confused by the statement "there can be no further references to it"? Or am I overlooking something?
To ensure that no dangling references occur, the stack object gets a new type name suffixed with a #
So you're suggesting that the C# compiler should create a struct type and copy the reference type's fields and methods?
@mikedn I think that there must be a CLR change for this and that the CLR does this in the background. The compiler creates the type only as a TypeSymbol.
@lachbaer What would happen in the following code?
```cs
class C
{
    private static C storage;
    private int state;
    public int M()
    {
        storage = this;
        storage.state = 42;
        return this.state;
    }
}
…
C c1 = new C();
Console.WriteLine(c1.M());
C# c2 = struct C();
Console.WriteLine(c2.M());
```
As far as I can tell, there are two options:
1. `C#.M` is the same method as `C.M`. When you call it, a reference to the `C` stored on the stack is passed in, resulting in memory corruption down the line.
2. `C#.M` is a copy of `C.M`, but with semantics changed to be safe (e.g. by copying the contents of `this` when passed elsewhere). I don't see how this could be done without changing the semantics so much that a huge number of methods (like `C.M`) become broken.

What am I missing? Is there some third option? Or do you think one of the two options above is acceptable?
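The effect of the second option can be illustrated with today's C#. In this sketch, `ClassC` mirrors the class from the question and `StructC` is a hypothetical struct with the same method body (an assumption for illustration, not the proposed `C#` type). Because `storage = this` copies a struct instead of storing a reference, the same method body returns a different result.

```cs
using System;

public class ClassC
{
    private static ClassC storage;
    private int state;

    public int M()
    {
        storage = this;       // stores a reference to this object
        storage.state = 42;   // same object, so this.state changes too
        return this.state;
    }
}

// Hypothetical struct twin of ClassC, for illustration only.
public struct StructC
{
    private static StructC storage;
    private int state;

    public int M()
    {
        storage = this;       // copies the current value
        storage.state = 42;   // modifies the copy only
        return this.state;
    }
}

public static class EscapingThisDemo
{
    public static (int fromClass, int fromStruct) Run()
    {
        var c = new ClassC();
        var s = new StructC();
        return (c.M(), s.M());
    }
}
```

Calling `EscapingThisDemo.Run()` returns `(42, 0)`, which is the kind of silent semantic change the second option would introduce.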
What's more, the IL for the `struct` and the `class` implementation of such behavior is different, both in the implementation and the consumption. It wouldn't be possible to treat a class like a struct with the same IL generated for that class.
Objects on the stack should behave like structs, in a way
Is it only in terms of memory or are classes allocated on the stack passed to a method by value as well? Can a parameter of type `ClassBar#` be passed to a method that accepts `ClassBar`?
If not, you won't be able to use it with any existing code. Otherwise you need to get a reference, which you stated shouldn't be possible.
@HaloFour
It wouldn't be possible to treat a class like a struct with the same IL generated for that class
Can you submit an example? You are probably right, but I haven't seen solvable differences in the IL. A CLR change must be done for this anyhow.
@svick
What would happen in the following code?
I admit I did not think of self-references. 😞 That will crash whichever way I think about it. But I wouldn't be me if I didn't have a solution in mind 😀:
The purpose of this construct in the first place is to allow a relevant performance boost for lightweight classes, which are classes and not structs for some reason. Most of those classes I can think of are part of my projects, so I have access to them, or will probably be lightweight classes of the framework that could be repurposed easily.
If backward compatibility is no issue - because until now we dealt with the current situation somehow - a new "kind" of `class` can be introduced to the language and CLR, namely `class#`.
```cs
public class# Person { ... }
public class# Student : Person { ... }
```
Some assurances must be made about these classes, two being:
- … `class#`'es
- `this` is only allowed on other `T#` declarations.

This means that not all classes per se can be put on the stack, but we would have the opportunity to create new classes for which this feature would actually (and only) make sense. (Besides, we get the benefit of an implicit "Dispose" through a deterministically called finalizer.)
@dstarkowski
Can a parameter of type `ClassBar#` be passed to a method that accepts `ClassBar`
Yes, it will be shallow copied on the heap, losing its performance benefit. The method then operates on a copy of the object! But there is a solution to that, too:
```cs
void RenameStudent(Student stud) { stud.Name += " the douche"; }
Student# student = new Student#("Eric");
RenameStudent(student ref);
```
After `RenameStudent` exits, the heaped `student` is copied back to the stacked student. That is indicated by appending the `ref` keyword to the argument.
a new "kind" of class can be introduced to the language and CLR, namely class#
Not "can", "must". There is no other way to ensure that a reference type allocated on the stack doesn't use `this` in an unsafe manner.
Yes, it will be shallow copied on the heap, losing its performance benefit. The method then operates on a copy of the object! But there is a solution to that, too:
So you're proposing that significant changes are made to the runtime and the language in the name of performance but it all falls flat on its face as soon as the usage becomes slightly more complicated. It's probably cheaper and more effective to let the runtime do escape analysis and stack allocate reference types that do not escape the stack.
When looking at all the issues I have posted upon recently - of course including this one 😁 - it kind of seems to me that we/I treat the symptom, not the cause.
With CLR changes being unavoidable, at least thinking about a completely new native type that is not backwards compatible is acceptable. I'll call it `cluct` or `strass` 😆 No, seriously, this time I go with `struct#` to facilitate keyword reuse.
It will bring several things together:
- … `object` conversions)
- `struct?` (implies `struct#?`)

That `struct#` has basically quite a lot of the classical `struct` in essence, but loosens many if not all constraints that made me and other participants propose this or that approach to solve a current issue.
Addendum: to facilitate some of the characteristics of that new type, an additional, specialized heap with less CLR overhead can be created in memory. The stack then simply stores the pointer.
Sounds like a massive amount of work for very little real benefit.
@HaloFour The primary benefit lies in performance. In very many cases you operate on midweight objects that are already too heavy for being structs but don't make much or any use of comprehensive class features. Nevertheless you finally decide for `class`es.
The CLR was initially designed to support the broad OOP features of classes, but the cost of that managed behaviour is massive. Performance measurements underline that, and for some this is the argument against going with .NET, Java or the like.
With this, admittedly big, addition to the CLR many often used (custom) classes nearly get the performance of `struct`s while still maintaining the most used comfort of `class`es. The performance jump will be groundbreaking, a point in favour of .NET for the next decade(s). And also a possible advantage over the JVM for .NET Core.
@lachbaer
In very many cases you operate on midweight objects that are already too heavy for being structs
What makes an object too big to be a struct, but not too big to be a stack-allocated class?
The primary benefit lies in performance
And as already stated there are alternatives to this proposal that are likely easier to implement.
@lachbaer
That's why we have `ref` locals and returns now. The size of the struct no longer matters, you don't have to copy it around to work with it.
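A minimal sketch of `ref` returns and `ref` locals, assuming a hypothetical 128-byte `Matrix4` struct and a static `Pool` array (neither is from the thread): the accessor hands out a reference to the array element, so the struct is modified in place rather than copied.

```cs
using System;

// Hypothetical large struct (16 doubles = 128 bytes), for illustration only.
public struct Matrix4
{
    public double M11, M12, M13, M14;
    public double M21, M22, M23, M24;
    public double M31, M32, M33, M34;
    public double M41, M42, M43, M44;
}

public static class RefReturnDemo
{
    private static readonly Matrix4[] Pool = new Matrix4[8];

    // Returns a reference to the array element instead of a copy.
    public static ref Matrix4 At(int index) => ref Pool[index];

    public static double Run()
    {
        ref Matrix4 m = ref At(3);  // ref local: an alias of Pool[3]
        m.M11 = 42.0;               // writes through to the array element

        Matrix4 copy = At(3);       // plain local: this one is a copy
        copy.M11 = 0.0;             // does not affect Pool[3]

        return Pool[3].M11;
    }
}
```

Calling `RefReturnDemo.Run()` returns `42`, since only the write through the `ref` local reaches the array.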
I keep thinking, when I read this genre of issues (stackalloc, destructible types, this "strass"), that there is a fundamental fallacy: C# is NOT C/C++ and I don't want to have to preoccupy of the memory management!
This should be the work of the JIT / AOT compiler, doing correct escape analysis and then allocating objects on the stack only when it is safe to do so; the objection that this is impossible does not make sense: Java has had this for years, and the JVM and the CLR are really quite similar!
Stack analysis will give this all for free:
In the end the boundary between ValueTypes (technically an "hack") and Objects will become indeed totally nonexistent; an hypothetical new .Net could not have this distinction and be a pure object oriented language (as Java wanted to be, but then they felt the necessity of "primitive types" and in some way broke it) in which Integer is an Object but 99% of the time is allocated on the stack as if it were a "native" int.
cluct or strass
Please stop with the portmanteaus, lol...
In the end the boundary between ValueTypes (technically an "hack") and Objects will become indeed totally nonexistent
Not really. For one thing escape analysis is limited in what it can do (e.g. it's very difficult to allocate on the stack objects that are returned). And more importantly, being a value type and being allocated on the stack are independent things.
an hypothetical new .Net could not have this distinction and be a pure object oriented language
Value types have nothing to do with C# being or not being a "pure" OOP language. Not to mention that the idea of a "pure" OOP language is archaic.
@mikedn
And as already stated there are alternatives to this proposal that are likely easier to implement.
Can you link them (again), please? 😃
@lachbaer Quoting myself from couple of posts above:
It's probably cheaper and more effective to let the runtime do escape analysis and stack allocate reference types that do not escape the stack.
@mikedn Ah, you meant @HaloFour's links? dotnet/roslyn#2104 dotnet/coreclr#1784 http://xoofx.com/blog/2015/10/08/stackalloc-for-class-with-roslyn-and-coreclr/
Those are good too. coreclr#1784 is a request to add "escape analysis" to the JIT.
This gets me confused a bit 😕 It seems as if the boundaries between classes and structs will perhaps vanish with one of the next CLR updates.
Does that mean that the only left real argument for deciding between `struct` and `class` is whether my object shall be by value or by reference for the sake of "copyness"?
It seems as if the boundaries between classes and structs will perhaps vanish with one of the next CLR updates.
Nope, that's not true.
What escape analysis does is allocate reference types on the stack when the compiler discovers that reference(s) do not escape the stack. That is:
As an example - it should be possible to allocate the `List<int>` object on the stack in the below example:
```cs
void foo() {
    var list = new List<int>();
    list.Add(42);
    PrintListCount(list);
}
void PrintListCount(List<int> list) {
    Console.WriteLine(list.Count);
}
```
This is, of course, subject to JIT compiler's escape analysis capabilities which may be rather limited. But Java does some of this so we know that it's possible, to an extent.
Does that mean that the only left real argument for deciding between struct and class is whether my object shall be by value or by reference for the sake of "copyness"?
Not really. For example neither your proposal nor escape analysis can deal with returned objects, at least not in a reasonable manner. Because of that people will likely still use struct enumerators like `List<T>.Enumerator`.
Well, but escape analysis could in some cases be helped by another compiler optimization: https://en.wikipedia.org/wiki/Return_value_optimization
For example, this list does escape and so should be allocated on the heap:
```cs
List<int> foo() {
    var list = new List<int>();
    list.Add(42);
    return list;
}
void test() {
    List<int> l = foo();
    PrintListCount(l);
}
```
But after RVO not anymore as it becomes:
```cs
void foo(ref List<int> list) {
    list = new List<int>();
    list.Add(42);
    //return list; ==>elided!
}
void test() {
    List<int> l = null;
    foo(ref l);
    PrintListCount(l);
}
```
Usually optimizations are chained for better results, so I think that in reality the occasions on which objects can be allocated on the stack should be more numerous than expected.
But after RVO not anymore as it becomes:
Today `ref List<int>` is a reference to a reference so the code you show doesn't actually allow you to allocate the list on the stack. The allocation still needs to be done inside `foo` and storing the resulting reference in `list` means escaping it so no stack allocation is actually possible.
What you probably have in mind is that `List<int>` behaves like a struct in this case and that the `ref` points to the actual storage, storage that's on the caller's frame. There are all sorts of problems with this, for example `foo` needs to behave differently depending on whether or not the caller escapes the list. This "behaves differently" isn't likely doable so you'll have to generate 2 different versions of `foo`, not pretty.
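A minimal sketch of the "reference to a reference" point in today's C#; the names `Fill` and `RefToReferenceDemo` are made up for illustration. The `ref` parameter aliases the caller's variable, but what gets stored in it is still a reference to a separately allocated `List<int>` object.

```cs
using System;
using System.Collections.Generic;

public static class RefToReferenceDemo
{
    // `list` aliases the caller's variable; the List<int> object itself
    // is still created by `new` and reached through a reference.
    public static void Fill(ref List<int> list)
    {
        list = new List<int>();
        list.Add(42);
    }

    public static bool Run()
    {
        List<int> first = null;
        Fill(ref first);            // writes a reference into `first`

        List<int> second = first;   // copies the reference, not the list
        second.Add(7);

        // Both variables refer to the same object, which now has 2 items.
        return ReferenceEquals(first, second) && first.Count == 2;
    }
}
```

`RefToReferenceDemo.Run()` returns `true`: nothing about the `ref` parameter places the list's storage in the caller's frame.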
This gets me confused a bit 😕 It seems as if the boundaries between classes and structs will perhaps vanish with one of the next CLR updates.
Does that mean that the only left real argument for deciding between struct and class is whether my object shall be by value or by reference for the sake of "copyness"?
That sounds great to me - well there's the fact that `struct` cannot have empty c'tors. 😛
Seriously, if classes can be stack allocated then we need a borrow operator, which means there would be a lot of existing API that would be unusable by stack allocated objects.
In my opinion a standard programmer who has just basic knowledge of the way the CLR/JIT/AOT behaves should not be forced to choose between returning by return value or calling by reference. Also, in my eyes this already is - to cite @fanoI - "to preoccupy of the memory management". You're gonna choose one pattern over the other. Besides, every (beginners') programming book states that using a `ref` for only one return value is something that shouldn't be done. And with `ref`-returns and `ValueTuple`s in our hands now, that is even more emphasized, and true for multiple return values as well.
But the motive for this topic is to allow "standard" programmers to easily choose between OOP power by using `class` and performance power by using `class#` with restricted OOP possibilities that nevertheless offer more programming comfort than `struct` does today.
This is no proposal and I don't actually care about how this could be achieved, nor whether there is a real need for a change.
The latest comments just tell me that the CLR team is putting effort into making classes more efficient compared to structs. Well, then there is actually no need to put classes on the stack anymore.
The initiative for this discussion comes from #99. It turns out that initializing `struct`s and giving them a custom `default` is actually not doable, unless a solution is found. My first thought on this was "what if classes can be put on the stack with the same performance?", hence this discussion.
Now however, to me, the cat is chasing its tail.
- `struct` does not have the slightest comfort of `class`
- `class` is not really predictable concerning its performance
- … (wtf is `struct` still good for besides its non-nullness and copy-by-value characteristic?)

Any ideas about how or if?
@lachbaer you keep asking this question: "wtf is struct still good for?"
imo there should be no difference between `class` and `struct` besides the defaults of by-value vs by-reference.
@lachbaer Struct is a bag of values contained by some class or variable with no concept of unique identity. Class has concept of unique identity.
@jnm2 Isn't that a theoretical POV? What is the practical benefit that cannot be solved easily in a different way? The equality compare operator is not defined by default, which avoids implying a comparison of unique identity for non-obvious structs. But there must be more?
@lachbaer
The equality compare operator is not defined by default
`object.Equals`?
@agocke operator ==
@lachbaer You mean for structs? Sure, but aren't you talking about classes? `==` is defined for `object`.
I think what @jnm2 is pointing out is that you need to provide some way for classes on the stack to provide reference equality, since that is defined in the language for classes.
@agocke
Sure, but aren't you talking about objects
No, currently not 😁 The initial motivation was to have (part of) `class` power with `struct` performance. The discussion is now whether `class` can be as performant as `struct` and what `struct` is good for then. The thought of having (nearly) class-like value types currently doesn't let go of me 😉
wtf is struct still good for besides its non-nullness and copy-by-value characteristic
A value type is useful exactly because it's a "value".
In turn that implies that it cannot be null and that it is copied by value. It also happens to be the case that local variables of value type are stored on the stack but ultimately that is just an implementation detail. They could be stored on the heap as well but that wouldn't make much sense.
But what being a "value" really means is that an array of `Complex` numbers is really an array of numbers and not an array of references. Neither escape analysis nor your "stack stored classes" are a substitute for value types.
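A minimal sketch of that difference, with hypothetical `ComplexValue` and `ComplexObject` types standing in for a struct and a class version of a complex number: the struct array holds the numbers themselves, while the class array holds references that start out as `null`.

```cs
using System;

// Hypothetical types, for illustration only.
public struct ComplexValue { public double Re, Im; }
public class ComplexObject { public double Re, Im; }

public static class ArrayLayoutDemo
{
    public static (bool valuesReady, bool referencesNull, bool copied) Run()
    {
        // Array of values: three zero-initialized ComplexValue instances
        // stored inline in the array, usable immediately.
        var values = new ComplexValue[3];
        values[0].Re = 1.0;

        // Array of references: three null slots; each object would need
        // its own allocation before it can be used.
        var references = new ComplexObject[3];

        // Reading an element of a value array into a local copies it.
        ComplexValue copy = values[0];
        copy.Re = 2.0;

        return (values[0].Re == 1.0, references[0] == null, values[0].Re != copy.Re);
    }
}
```

`ArrayLayoutDemo.Run()` returns `(true, true, true)`.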
As the purposes and benefits of `struct` are now named I'd like to throw in another question, based upon...
(This is only a proof of concept class)
```cs
struct# Person {
    public int Age;
    public string Name = "";
    public string Address = "";
    public static int PersonCounter { get; private set; } = 0;
    public Person() => ++PersonCounter;
    public ~Person() => --PersonCounter;
    public virtual Person# GetCopy() => this; // Polymorphism!
}
struct# Student : Person {
    public int Semester = 1;
    public static int StudentCounter { get; private set; } = 0;
    public Student() => ++StudentCounter;
    public ~Student() => --StudentCounter;
}
```
- … `class` equivalent definition - actually they are classes
- … `struct#` and interfaces only
- `this` behaves like in `struct`
- … `operator ==` defined

Construction
```cs
Student# peter;
Student# laura = new Student() { Name = "Laura", Semester = 3 };
Person# teacher = new Student();
Person# sibling = laura;  // copies laura!
```
Errors
```cs
/* logical errors */
teacher = null;             // struct# is not nullable
if (teacher == laura) { }   // no default equality comparer
/* syntax errors */
Student eric;                   // struct# is its own type, the `#` is mandatory
Person bobby = new Student#();  // `#` belongs on Person
```
Polymorphism
```cs
Person# alice = new Student();
Person# allison = alice.GetCopy();
Student# alicia = (Student#)allison;
// Student.StudentCounter now is 3
```
This concept is not that different from actual classes! It just takes away the reference characteristics and adds characteristics of `struct` that were stated previously as being the main purpose for structs.
The character `#` was chosen because it looks like a "box" (with sharp edges). Classical "boxing" puts value types on the heap.
Before thinking of actual implementation possibilities the question:
Or is it only a theoretical concept with no use?
@lachbaer
In my opinion a standard programmer who has just basic knowledge of the way the CLR/JIT/AOT behaves should not be forced to choose between returning by return value or calling by reference.
You say that a standard programmer shouldn't be forced to choose between `struct` and `class`. But you also suggest that the very same standard programmer should be able to choose from `struct`, `class`, heap allocated `struct#` and stack allocated `class#`?
@dstarkowski
You say that a standard programmer shouldn't be forced to choose between struct and class
No, maybe I expressed myself unclearly. That statement is related to choosing between call-by-reference parameters or return values.
The choice between `class`, `struct` and `struct#` is not about performance or memory representation, but about what you want to achieve with it semantically: copy-by-reference, copy-by-value, (non-)nullable, derivable, etc.
Person# sibling = laura; // copies laura!
`laura` is a `#Student` which has additional fields to `#Person`. Copying into the space allocated for a `#Person` would not be possible.
teacher = null; // struct# is not nullable
If it's heap allocated that means that it's fully GC managed. It also means that `null` does exist since the only thing you have in the stack itself is a pointer or reference, which can (and will) be `null`.
```cs
Student#[] students = new Student#[1];
Student# student0 = students[0]; // null
```
At best the C# compiler could try to hide `null` and force you to initialize where possible, but it can't prevent zero-initialization from occurring. Zero-initialization of a reference/pointer is `null`.
@HaloFour I think you didn't get the concept yet.
Think of it as `struct#` being actually a `class`, being always stored on the heap in a `class`-like representation. There is no fixed space allocation, like for a classical `struct`. That is why polymorphism is possible.
But the compiler (or CLR or whatsoever) ensures several constraints, like e.g. that the instance always exists - that goes a bit further than non-nullable reference types - and that an assignment to another variable is always a copy, as it is for `struct` now.
This will give you a mixture. "Heap-stored Class-like Value-Type Structs" in a way. Something that is not possible nowadays.
For heap stored "value classes" please see https://github.com/dotnet/csharplang/issues/460#issuecomment-296102185
Question
Why is it not possible to put classes on the stack? Value type `struct`s do not offer the comfort of classes when it comes to inheritance.
In C++ this is possible by omitting the `new` keyword.
I can think of the reason that accidentally omitting `new` and creating a stack stored instance without any further notice can lead to unwanted behaviour. Also, when creating the (first) CLR, strictly distinguishing between stack stored `struct` and heap stored and garbage collected `class` might have been easier.
Motivation
There is ongoing development with (non-)nullable reference types, et al. It seems that the borders between value types (`struct`) and reference types (`class`) are partly blurring nowadays.
I recently had scenarios where putting a `class` type on the stack to accelerate processing would be quite useful.
Possible syntax
To create a class instance on the stack the following statement could be used, where `new` is replaced by `struct`.
`ClassBar#` marks innerBar as being not a pointer.
To (flat) copy a heap stored instance to the stack, a `struct()` operator can be introduced.
Implementation possibilities
Alternatives
Maybe the practical performance impacts nowadays aren't as heavy, so that leaving classes on the managed heap is actually no issue any more.