dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.05k stars 4.69k forks source link

Reuse temporaries for every new by default #4275

Open atlaste opened 9 years ago

atlaste commented 9 years ago

The behavior of what went wrong is described in quite some detail (including a detailed analysis by Mark Segal) on http://stackoverflow.com/questions/30383178/weird-stackoverflow-in-c-sharp-when-allocating-reference-types .

I do believe this could seriously impact performance, specifically: I think this could hurt JIT register allocation.

JIT disables the optimizer when the method code size is too big. Limits are found on : https://github.com/dotnet/coreclr/blob/01a5e9b4580cf6ea21de672f627402c30658ef22/src/jit/compiler.h#L7131 . JIT allocates a new temp for every new by default: https://github.com/dotnet/coreclr/blob/01a5e9b4580cf6ea21de672f627402c30658ef22/src/jit/importer.cpp#L11108 . It would be nice to reuse the temp when the optimizer is disabled.

category:throughput theme:importer skill-level:expert cost:large

cmckinsey commented 9 years ago

Thanks for the note Stefan.

I briefly read over the original stack overflow post. The overall conclusion is correct, you can write stack frames that will overflow the stack due to too large of stack requirements for local variables, this is easier to do with value-types (i.e. structs). Recursion only makes this easier.

The threshold for this is highly situation dependent and the JIT cannot make guarantees. Some of the reasons that change this are the following factors:

Unfortunately, because of the MSIL semantics the default expansion of the newobj instruction must include a new temporary. Let me explain in some pseudo code example:

ldarg p newobj C::C() stobj S

Assume C is a value-type. Inside the constructor C::C() it may have access to an interior pointer which points to the local variable S. For example, in the above code p may have been assigned the address of S. Hence, the constructor may read the value of S and depend on the former value, construct the new object on the stack, and then overwrite S. These semantics must be preserved and without a more sophisticated alias analysis to recognize when S has been address-escaped we must use a new temporary by default.

Because of JIT compilation time sensitivity, we don't have an exhaustive escape analysis to recognize all such cases accurately. However, it's something we might work on in the future. Of course such an analysis/optimization certainly would not run in a non-optimizing scenario and so in those cases a program might still stack overflow.

Hope this explains some things. Thanks for the report, but we cannot reuse temporaries by default legally for the reasons above. Assuming this makes sense I'll close the issue.

mikedn commented 9 years ago

Assume C is a value-type.

What about reference types? The example uses a reference type, Tuple<,>.

Hence, the constructor may read the value of S and depend on the former value, construct the new object on the stack, and then overwrite S

I think that suggestion wasn't to completely eliminate those temporaries but to reuse a single temporary variable for all the "newobj" instructions in the method (with a matching type). At least for reference types this should work without trouble. For value types I suspect that there may be issues if someone tries to get the address of this in the constructor but again, the original example uses reference types.

cmckinsey commented 9 years ago

In terms of when it's legal to try and eliminate the final assignment, I don't think there's a different between value-types and reference types. The constructor could still hold a reference to the LHS of the final assignment.

Also, at least for reference types, we want to use new temporaries by default so we can avoid having to do an explicit lifetime separation pass (also sometimes called live-range renaming). Setting up the intermediate representation so that each independent lifetime has a unique temporary does good things for register allocation typically.

I went back and read the original post again, I may have misread the suggestion. If the suggestion is that when optimizations are disabled, to reuse the same temporary in each newobj expansion of the same type, such that the JIT can compile larger methods without the user having to split them up in extreme cases, then yes I think we could do that.

Given there's a work around and the inherently implementation dependent nature of stack overflow limits in general, this sounds lower priority to me. If I'm wrong and there's strong demand for this we can reconsider.

atlaste commented 9 years ago

Thanks for the detailed explanation. Much appreciated, and it answers my question.

Using the details you described, I've been thinking about the 'demand'.

The problems arrise when you're doing code generation. Generating a bit of code isn't much of a problem; generating a lot is. In my case I was generating sort-of code coverage initialization code as part of my test suite :-) - but similar things might happen if you're moving from a test to a production environment. In cases such as this, you get strange problems that start to emerge when you're moving to a 'real world' situation. I expect that this usually happens if a lot of initialization IL is generated based on a larger context, as was my case.

In such cases, it's fine if optimizations are disabled. However, it would be nice to reuse the temporary, which would mean that the code will still work, regardless of the size. In other words (to quote you): "if the optimization is disabled, reuse the same temporary in each newobj expansion of the same type".

To summarize: The main issue here is that the behavior suddenly deviates from (usually much smaller) lab/test cases, making the application 'framework' unpredictable. Performance is hardly an issue.

As for the workaround: the problem there is that you should first know where to look for, especially since these kinds of things are usually tried on a smaller scale.

AndyAyersMS commented 6 years ago

See also dotnet/coreclr#14103 (now #8980)