Closed myarichuk closed 4 years ago
It only crashes if it's run as a 64-bit process. It doesn't matter if it's .NET Core 2.0 or .NET Framework 4.7.1.
The exception originates in this JIT code: https://github.com/dotnet/coreclr/blob/15e1472aab7ed42dafde0331ed6cece465a0763d/src/jit/emit.cpp#L567-L582
I haven't checked yet the IL but it looks like a method contains a large number of locals (56129). Or the JIT generates a ton of locals when importing it...
Maybe related to dotnet/runtime#8420?
Maybe related to 12480?
Yup, it looks very similar. The IL method has only 4 locals but the IL code itself contains an unfortunate pattern - load a bunch of stuff onto the stack and then branch. The net result - an avalanche of spill temps.
Some numbers: lva count = 56129 bb count = 2194 il size = 68941 dynamic call sites = 858
I suppose it would be nice for the JIT to handle such a monster but the code is as horrible as it gets. Hopefully the sample code is just that, sample code, and not actual production code.
It is not a production code, but it is related to production code: The sample code in the gist came from an index compiled by RavenDB 4.0 - it is part of an issue investigation here - http://issues.hibernatingrhinos.com/issue/RavenDB-9040 (In RavenDB indexes are defined as LINQ queries and compiled at the server-side.)
Since RavenDB allows users to define indexes with the syntax of LINQ queries, it is definitely possible that users will write similar code to what can be seen in the repro gist.
(Index compiler code can be seen here: https://github.com/ravendb/ravendb/blob/v4.0/src/Raven.Server/Documents/Indexes/Static/IndexCompiler.cs)
@0xd4d you are right - since I tested that on AnyCPU, haven't noticed that it depends on 64/32 bits and not on runtime.
@mikedn the code is something that went through multiple translation stages to get where it did. The actual originating code isn't too hot either, though much simpler.
the code is something that went through multiple translation stages to get where it did. The actual originating code isn't too hot either, though much simpler.
Presumably moving doc.NumVals.ContainsKey("Value001") && doc.NumVals["Value001"] != null ? (doc.NumVals["Value001"]).Value1 : ((double?)null),
to a double? GetValue(dynamic values, string name)
isn't an option?
It only crashes if it's run as a 64-bit process. It doesn't matter if it's .NET Core 2.0 or .NET Framework 4.7.1.
Yep, works on x86 and requires over 200 kbytes of stack :)
The original code looked something like this:
select new
{
doc.Id,
Value001 = doc.NumVals.ContainsKey("Value001") && doc.NumVals["Value001"] != null ? doc.NumVals["Value001"].Value1 : (double?)null,
Value002 = doc.NumVals.ContainsKey("Value002") && doc.NumVals["Value002"] != null ? doc.NumVals["Value002"].Value1 : (double?)null,
}
And yes, (double?)doc.NumVals["Value002"]?.Value1
should actually work here. But that is code sent by customer.
Also it may be that disabling CSC's optimizer here (assuming it is on) will lessen CSC's appetite for pending things on the eval stack across branches.
How do I do that?
Also it may be that disabling CSC's optimizer here (assuming it is on) will lessen CSC's appetite for pending things on the eval stack across branches.
Tried, no effect. It's quite possible that the CSC optimizer isn't involved in this, that's just how CSC generates dynamic callsites.
Also related to https://github.com/dotnet/coreclr/issues/14103
I looked at fixing dotnet/runtime#8420 in the jit by re-using spill temps across multiple joins, but it quickly turns into solving a general pre-import dataflow problem over the entire flow graph where we'd first need to scan the IL to figure out the stack behavior of each block. Today the jit only "solves" a constrained version of this for spill cliques.
Wonder how hard it would be to fix this in the encoder, maybe via some kind of escape
bits for local addresses that index into a separate table with bigger entries.
Perhaps it would better if managed compilers refrain from generating this kind of IL. Issues like this and dotnet/runtime#9130 seem to indicate that it's just too much trouble to deal with this aspect of the IL stack. And it's quite possible that managed compilers think that they're doing the JIT a favor by keeping things on the stack instead of spilling to locals...
We now produce a somewhat clearer error message when the jit hits an internal limitation (see dotnet/coreclr#16320). So hopefully anyone who hits this realizes that there is something overly complex about their method and can find workarounds.
Moving this issue to future as we should still look into ways to reduce the number of cases where the jit runs into internal limitations.
This may have been fixed by dotnet/coreclr#17780.
@briansull can you take a look?
Fixed with dotnet/coreclr#17793
Hi, we are using .NET Core 2.1.2 and this issue still occurs. Failing test here: https://github.com/ravendb/ravendb/blob/v4.0/test/SlowTests/Issues/RavenDB_10621.cs
Hi, Running 2.1.400-preview-009063, we have another repro of this (similiar) issue, see: https://gist.github.com/maximburyak/aa75e379e3315af6980bb6ec8fd8dce1
Note that replacing "this0.s." with an empty string allows the code to run successfully.
Reproduction of this exception is in the following gist: https://gist.github.com/myarichuk/db2a290ec8ae5ea0faf2698b0c5b02c2
In this case, InvalidProgramException exception is thrown when the anonymous object in in the Map_0() function has more than 43 fields.
The code in the gist runs fine in a .Net 4.7 project.