Closed jonathanvdc closed 7 years ago
Weird. IIRC, newobj
can be used to allocate a struct and push it on the stack but that's not the usual way that is done and I don't believe that I've ever seen the C# compiler do that. The portion of the spec that you quoted seems to fit with the behavior that the compiler is exhibiting, namely that it loads the address of a local onto the stack and then invokes the constructer directly. The verbiage is a little confusing since it mentions allocation, but that wouldn't be a separate step with newobj
.
Assuming the above behavior is deemed inappropriate what would the compiler be expected to emit and how would the compiler detect this kind of situation?
The spec specifically states that (emphasis mine)
An instance of type
T
is created by allocating a temporary local variable
And clearly that's not happening: the last new
expression loads the address of an existing, non-temporary local variable onto the stack. As far as I can tell, this is purely an optimization: the code generator will only replace newobj
/stloc
with ldloca
/call
if both of the following are true:
this
pointer doesn't alias with any other reachable pointer (but fails to). try
block, or it is defined immediately inside the innermost try
block. This requirement makes sure that a local does not end up in a half-constructed state because an exception was thrown in the constructor.I would expect the constructor call to be compiled as if one of the above requirements had not been satisfied. That is, I expect the Main
method above to produce the same results as
public static void Main(string[] Args)
{
Point pt1 = default(Point), pt2 = default(Point);
try
{
pt1 = new Point(1, 2);
pt2 = new Point(ref pt1);
pt1 = new Point(ref pt1);
}
catch (Exception)
{ }
Console.WriteLine(pt1);
Console.WriteLine(pt2);
}
Which outputs
(1, 2)
(1, 2)
Wrapping code in a try
block shouldn't have this kind of observable difference if no exceptions are thrown, right?
Interestingly, mcs
still produces a program that prints
(0, 0)
(1, 2)
I think this may be dangerous if new Point
throws an exception, but that's kind of off-topic here.
The disassembly for the Main
method I just wrote and compiled with csc
is
.method public hidebysig static
void Main (
string[] Args
) cil managed
{
// Method begins at RVA 0x20e0
// Code size 75 (0x4b)
.maxstack 2
.locals init (
[0] valuetype Point,
[1] valuetype Point
)
IL_0000: nop // Do nothing (No operation)
IL_0001: ldloca.s 0 // Load address of local variable with index indx, short form
IL_0003: initobj Point // Initialize the value at address dest
IL_0009: ldloca.s 1 // Load address of local variable with index indx, short form
IL_000b: initobj Point // Initialize the value at address dest
IL_0011: nop // Do nothing (No operation)
IL_0012: ldc.i4.1 // Push 1 onto the stack as int32
IL_0013: ldc.i4.2 // Push 2 onto the stack as int32
IL_0014: newobj instance void Point::.ctor(int32, int32) // Allocate an uninitialized object or value type and call ctor
IL_0019: stloc.0 // Pop a value from stack into local variable 0
IL_001a: ldloca.s 0 // Load address of local variable with index indx, short form
IL_001c: newobj instance void Point::.ctor(valuetype Point&) // Allocate an uninitialized object or value type and call ctor
IL_0021: stloc.1 // Pop a value from stack into local variable 1
IL_0022: ldloca.s 0 // Load address of local variable with index indx, short form
IL_0024: newobj instance void Point::.ctor(valuetype Point&) // Allocate an uninitialized object or value type and call ctor
IL_0029: stloc.0 // Pop a value from stack into local variable 0
IL_002a: nop // Do nothing (No operation)
IL_002b: leave.s IL_0032 // Exit a protected region of code, short form
IL_002d: pop // Pop value from the stack
IL_002e: nop // Do nothing (No operation)
IL_002f: nop // Do nothing (No operation)
IL_0030: leave.s IL_0032 // Exit a protected region of code, short form
IL_0032: ldloc.0 // Load local variable 0 onto stack
IL_0033: box Point // Convert a boxable value to its boxed form
IL_0038: call void [mscorlib]System.Console::WriteLine(object) // Call method indicated on the stack with arguments
IL_003d: nop // Do nothing (No operation)
IL_003e: ldloc.1 // Load local variable 1 onto stack
IL_003f: box Point // Convert a boxable value to its boxed form
IL_0044: call void [mscorlib]System.Console::WriteLine(object) // Call method indicated on the stack with arguments
IL_0049: nop // Do nothing (No operation)
IL_004a: ret // Return from method, possibly with a value
Try IL_0011-IL_002d Catch class [mscorlib]System.Exception IL_002d-IL_0032
} // end of method Program::Main
So, to answer your question, I propose that the compiler emit the IL below for the final assignment to pt1
in my original Main
method.
newobj instance void Point::.ctor(valuetype Point&)
stloc.0
Detecting this sort of situation is hard – which is, I'm assuming, why PartialCtorResultCannotEscape
fails to return false
in this case. I can see three ways to get around this problem:
Always use newobj
/stloc
. This will always produce the result I'd expect, but it's not as efficient as a simple ldloca
/call
.
Iterate over all parameters to the constructor before trying to apply the optimization, and don't apply the optimization if one of them is a ref T
, where T
is the type that's being constructed.
However, I'm fairly sure this will still break unsafe
code. For example, the constructor below will still be invoked inappropriately if the third new-expression in my original program is changed to pt1 = new Point(&pt1);
.
public unsafe Point(Point* Other)
{
this = default(Point);
this.X = Other->X;
this.Y = Other->Y;
}
I suppose the code generator could also try to check if a T*
parameter is in the parameter list, but that will still break if someone uses pointer casts to obscure that the pointer actually points to a T
.
Analyze the constructor to find out if it loads values from a pointer/reference to T
, and only emit a call
if that's not the case. This is strictly speaking an inter-procedural analysis, and I'm not entirely sure how good Roslyn is at those right now. So this solution may be the hardest one to implement, but it will likely net you the best results.
Hi again. I have implemented option three in my own compiler, and I thought you might be interested in how I decide when to use ldloca
/call
instead of newobj
/stloc
. (If you're not, then that's fine too!)
Let me start with this little disclaimer: I think my approach will always produce correct results, but haven't proven that this is actually the case. If you spot any flaws in my reasoning, then please don't hesitate to point them out – in fact I'd be very grateful if you did.
Also, I'll pretend like this
is a pointer for the remainder of this comment. In C# it's not, but it is in IL, so I don't think that's all that unfair of me.
newobj
--> call
transformation safe?My interpretation of the spirit of the spec (yes, I know that's incredibly vague, but bear with here :) ) is that it wants to make sure that every new-object expression initializes a unique region of storage, i.e., when the constructor is run, there is no pointer other than this
that points to *this
. In other words: the this
pointer cannot alias any other pointers.
Now, that's a fairly strict constraint. And clearly using call
to initialize a struct
breaks that rule, because you need to have a pointer to the region of storage you'd like to initialize before you can pass it as an argument to the constructor. But what PartialCtorResultCannotEscape
does – that is, it checks if TargetIsNotOnHeap
– is actually pretty devious. It basically boils down to stating that: "if the constructor cannot access any pointers that point to *this
but are not the this
pointer itself, then it is as if a unique region of storage is initialized."
Checking TargetIsNotOnHeap
is admittedly a flawed way of verifying whether this property holds for a given new-object expression but I do believe that the general idea of performing the newobj
--> call
transformation if and only if no pointer that does not derive from this
may alias the this
pointer is a good starting point.
When I say that a pointer derives from another pointer, I mean that either that pointer, or the result of applying field access to a derived pointer. So &a.b.c
derives from both &a.b
and &a
.
To recap: suppose that the following statements are written in the context of a constructor:
// `this.x` (partially) aliases `this`, but is derived from `this`.
this.x = 1; // Does not preclude newobj --> call.
// `y` may point to anything, _including_ some part of `this`.
// It may (partially) alias `this`, and is not derived from `this`.
this.x = *y; // Precludes newobj --> call
I decided to invert the problem. Instead of checking if the newobj
--> call
transformation is safe at the call site, I considered it to be more appropriate to check if the transformation is safe at the constructor definition site.
Specifically, I walk through all statements and expressions in the constructor, and check if they may (partially) alias this
without being derived from this
. If that is the case, then I mark the constructor as ineligible for the newobj
--> call
transformation.
Function calls and new-object expressions in the constructor are a special case. Unfortunately, they're also pretty common: accessing a property amounts to a function call. So we can't just bail out if we encounter one.
First off, if I don't have a function/constructor's source code, then I do bail out, since it's the only sensible thing to do. I mark the constructor that calls it as ineligible for the newobj
--> call
transformation.
But if I do have the source code, then I try to find all pointers derived from the this
pointer as either the this
argument of the call, or as (pointer) argument to the call/object instantiation. If if find any, then I apply the analysis from the previous section to the callee, but substitute appropriate parameter(s) to said function for the this
pointer.
Full disclosure: this is a slight simplification. I actually use two sets of pointer expressions: a root set and an alias set. That allows me to check if the callee does not prohibit the newobj
--> call
expression even when I can't fully map my root set, i.e., the this
pointer, to function parameters. But I think the root and alias sets make understanding how my algorithm works much harder and don't add much conceptual value.
I think so. My results seem to indicate so.
You're welcome to browse my implementation of the analysis that I just tried to describe on my GitHub. (Sorry, no syntax highlighting.)
One nice byproduct of my algorithm is that it allows me to go beyond what C# compilers have done in the past:
var xs = new Vector2[1];
xs[0] = new Vector2(1, 2);
gets compiled to
IL_005b: ldc.i4.1
IL_005c: newarr Vector2
IL_0061: stloc.3
IL_0062: ldloc.3
IL_0063: ldc.i4.0
IL_0064: ldelema Vector2
IL_0069: ldc.r8 5.
IL_0072: ldc.r8 4.
IL_007b: call instance void valuetype Vector2::'.ctor'(float64, float64)
No newobj
. I'm pretty happy about that. :)
I think we should suppress the in-place optimization if the ctor takes anything by reference - ref/out/_arglist/TypedReference etc..
Otherwise the constructor can do indirect assignments that would observably break the create-then-assign semantics.
It does not seem to be worth doing alias tracking. Struct ctors taking refs are not common, perf penalty is not huge. Besides with addition of ref locals the tracking could be fairly hard to do. It woudl be easier to only allow the optimization when all parameters are by-value.
Right, but wouldn't that break unsafe code? Consider the following example.
using System;
public struct Point
{
public Point(int X, int Y)
{
this = default(Point);
this.X = X;
this.Y = Y;
}
public unsafe Point(int X)
{
this = default(Point);
this.X = X;
this.Y = Program.ptr->Y;
}
public int X { get; private set; }
public int Y { get; private set; }
public override string ToString()
{
return "(" + X + ", " + Y + ")";
}
}
public static unsafe class Program
{
public static Point* ptr;
public static void Main(string[] Args)
{
Point vec;
vec = new Point(1, 2);
ptr = &vec;
vec = new Point(1);
Console.WriteLine(vec.X);
Console.WriteLine(vec.Y);
}
}
It's basically a copy. I expect this to print 1
and then 2
, but it'll print 1
and then 0
if the second assignment is optimized to a call
.
But the parameter list for Point(int)
doesn't include any pointers at all, so there's just no way to tell if the newobj
--> call
optimization is legal just by looking at the parameter list.
@jonathanvdc there are many safeguards in the language that unsafe code can get around. It disrespects scoping and lifetimes of variables. I.E. - you can get a ptr to a local and use that outside of the scope of that local or even its method. The code might still work depending on how locals/stack are implemented, but it would be very implementation dependent.
So, no - Ability of pointers to expose optimizations and implementation details is not a big concern.
Thanks for your reply!
However, I don't quite see what pertinence variable lifetimes have here – my example program does not try to use any storage locations beyond their point of deallocation, regardless of the implementation.
I do think the notion of undefined behavior from C/C++ might be useful here; I would agree with you that it is not entirely unreasonable to optimize based on undefined behavior. For example, if I had used a pointer whose pointee had already been deallocated, then it wouldn't have been incorrect for the compiler to optimize my code to something that does an entirely different thing from what I intended it to – its behavior was undefined, after all.
On the other hand, there is absolutely nothing that leads me to believe that an externally-provided pointer might alias the this
pointer in a constructor, especially since the spec specifically states that the this
pointer always points to a temporary that was specifically allocated for the constructor call. So there cannot, by definition, be any pointers that (observably) alias this
when the constructor is called.
Additionally, I have no knowledge of any rule in C# that would make using a valid pointer in a constructor undefined behavior. In fact, I would even go as far as to say that I don't believe that my program contains any undefined behavior.
So the optimization you propose would – at least for my trivial example program – break the promise made by the spec that calling a constructor through a new-object expression will result in a unique this
pointer, even in the absence of undefined behavior. This is not an aggressive optimization or an implementation detail; it's an incorrect optimization which breaks both the spec and people's intuition of how the C# language works.
If you were to transliterate my example's source code to C++ and compile it using a relatively standards-compliant compiler such as gcc
or clang
, then that'd yield you an executable with the expected behavior. This situation is not unique to C#, but somehow the incorrect optimization seemingly is. Frankly, I think that's just embarrassing to all of us.
In conclusion: I implore you to either do the newobj
--> call
optimization properly or not do it at all. There's little point in piling yet another layer of heuristics on top of a fundamentally broken optimization technique; that's just a waste of time. If you don't want to go through the pain of analyzing constructor bodies – and believe me, I can sympathize with that – then please do consider just dropping the optimization altogether. Ask yourself this: does the additional copy that a newobj
opcode incurs over a call
opcode truly have an observable effect on an average program? The only use-case that I know of where the newobj
--> call
optimization might truly be useful is high-performance computing. You don't seem to be moving in that direction with the C# programming language – at least not for now. So who exactly stands to gain from having this broken optimization anyway? Surely trading correctness for performance – which is exactly what this optimization does – is a no-go, especially in C#?
@jonathanvdc - variable lifetimes is just an example of a language concept that can be completely disrespected in unsafe code. Such violations expose underlying implementation, can result in behavior sensitive to optimizations and can go against the intuition of how C# language works.
Preventing that, when achieved via unsafe code, is not a goal.
The part of the bug that can be reproduced without resorting to unsafe code/pointers/reflection (to the scope described in #16364) is actionable. The rest is not.
@VSadov Hello again! Thanks for your reply!
I'm not entirely convinced by your argument that it's okay for unsafe code to expose arbitrary implementation details. I'm not aware of any statements in the example program I provided that violate the rules of the C# language – if you've spotted any, please do point them out! Short of that, it seems like you're saying that all unsafe code is undefined behavior. I'm fairly sure that's not what the spec says.
I understand that it's a burden to implement and maintain the newobj
--> call
transformation in a way that does not break the spec for unsafe code. But it's not impossible: I already have a proof-of-concept implementation in my compiler.
Regardless, I think the real question we should be asking ourselves is: "what are the run-time performance implications of the newobj
--> call
optimization?"
To that end, I've measured the performance of a benchmark with and without the optimization. I took the fractal benchmark from the ecsc
test suite (which is actually based on Kevin Frei's fractal benchmark) and tweaked it to print the total time. I also upped its resolution so it'd take longer to compute a result. My modified version can be found here.
I included my input and output below. I used my own compiler in -O2
mode, which produces IL that is similar to csc /optimize+ /debug-
. The -fno-optimize-new-struct
turns off the newobj
--> call
optimization. (I didn't use csc
because it doesn't have a switch to turn off only the newobj
--> call
optimization and nothing else – sorry!)
$ mono --version
Mono JIT compiler version 4.8.0 (Stable 4.8.0.495/e4a3cf3 Wed Feb 22 18:30:58 UTC 2017)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
TLS: __thread
SIGSEGV: altstack
Notifications: epoll
Architecture: amd64
Disabled: none
Misc: softdebug
LLVM: supported, not enabled.
GC: sgen
$ ecsc Fractal.cs -platform clr -O2
$ for i in {1..10}; do mono bin/Fractal.exe; done
Mandelbrot:27337929 Julia:25753505 takes 798.667 ms
Mandelbrot:27337929 Julia:25753505 takes 793.24 ms
Mandelbrot:27337929 Julia:25753505 takes 796.536 ms
Mandelbrot:27337929 Julia:25753505 takes 792.53 ms
Mandelbrot:27337929 Julia:25753505 takes 794.116 ms
Mandelbrot:27337929 Julia:25753505 takes 798.246 ms
Mandelbrot:27337929 Julia:25753505 takes 797.948 ms
Mandelbrot:27337929 Julia:25753505 takes 796.128 ms
Mandelbrot:27337929 Julia:25753505 takes 792.201 ms
Mandelbrot:27337929 Julia:25753505 takes 792.807 ms
$ ecsc Fractal.cs -platform clr -O2 -fno-optimize-new-struct
$ for i in {1..10}; do mono bin/Fractal.exe; done
Mandelbrot:27337929 Julia:25753505 takes 781.789 ms
Mandelbrot:27337929 Julia:25753505 takes 784.175 ms
Mandelbrot:27337929 Julia:25753505 takes 780.436 ms
Mandelbrot:27337929 Julia:25753505 takes 776.888 ms
Mandelbrot:27337929 Julia:25753505 takes 775.225 ms
Mandelbrot:27337929 Julia:25753505 takes 777.721 ms
Mandelbrot:27337929 Julia:25753505 takes 781.725 ms
Mandelbrot:27337929 Julia:25753505 takes 777.09 ms
Mandelbrot:27337929 Julia:25753505 takes 783.697 ms
Mandelbrot:27337929 Julia:25753505 takes 778.738 ms
There's only a slight difference in runtimes, but I find it striking that the optimized version is actually slower. I don't really know why that is. Perhaps the JIT does a better job at optimizing the code without the newobj
--> call
transformation.
But my point is that this kind of evidence is not exactly in favor of the newobj
--> call
transformation. In fact, I've never seen any evidence that newobj
--> call
resulted in a significant performance improvement in any reasonably-sized program. And this relatively struct
-heavy benchmark shows that newobj
--> call
can actually diminish the runtime performance of a compiled program.
So here's my suggestion: just drop this optimization and emit newobj
whenever new T(...)
appears. Leave this optimization to the JIT. That'll solve all of your corner cases: always emitting newobj
will work for safe and unsafe code, it'll honor the spec and respect people's intuition of what new
does. I'm not aware of any evidence that this would degrade performance in real-life applications (I wouldn't suggest dropping the optimization otherwise) and you couldn't wish for a lower-maintenance solution. What's not to love?
@VSadov I moved this to 15.6 to evaluate if it is fixed by #20087. I am evaluating that now.
@jonathanvdc The original issue appears to be fixed in 15.6 by #20087. If there are other scenarios that you feel should be fixed, please file them as new issues describing exactly how to repro them.
Oh hi! This is my first issue on
dotnet/roslyn
, so I apologize if I made any rookie mistakes.Version Used:
csc
1.3.2 (from theMicrosoft.Net.Compilers
NuGet package)Steps to Reproduce:
csc InPlaceConstruction.cs
, tested with both/optimize+
and/optimize-
):Expected Behavior:
I expect this to print
I base this interpretation on the section on
new-object-expressions
in the (tentative) GitHub C# spec. Specifically, it states that:Actual Behavior:
The resulting program prints
I don't think that can be reconciled with what the spec says.
I have investigated the issue a little. The disassembly for
Main
isThe interesting bit here is that all instances are initialized by
call
instructions instead ofnewobj
/stloc
. This is a neat optimization for the first and second calls, but it produces unexpected behavior when the last constructor call is executed.What seems to happen is that
PartialCtorResultCannotEscape
inMicrosoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator
doesn't realize that thethis
pointer passed to thePoint(ref Point Other)
constructor aliases/may alias theOther
parameter. It then decides that it's okay to use acall
instruction to (re-)initializept1
, rather than going with the more conservativenewobj
/stloc
combo. And that sort of breaks my intuition/my reading of the spec.I would also like to note that a program compiled by
mcs
prints the same output here. I suspect that they're using the same heuristic asPartialCtorCannotEscape
, but I haven't looked at the relevant source code (yet).Is this something that you're interested in fixing? I'm asking because I'm writing a C# compiler myself, and I'm not sure which semantics I should implement. (For the moment, I'm sticking with the expected behavior.)
Anyway, thanks for reading through this issue! :)