Closed jkotas closed 9 months ago
Tagging subscribers to this area: @dotnet/area-infrastructure-libraries See info in area-owners.md if you want to be subscribed.
Author: | jkotas |
---|---|
Assignees: | - |
Labels: | `area-Infrastructure-libraries`, `blocking-clean-ci`, `Known Build Error` |
Milestone: | - |
cc @stephentoub
bleh, the change is flawed. will revert and rethink.
bleh, the change is flawed. will revert and rethink.
Where is the flaw?
Hmm, I was thinking for a moment it would still need to check and actually throw, but with lists being invariant, it shouldn't be possible to specify a T and pass in a List of a derived type. So now I don't know what's going on and will need to look at what the failing test is doing, but I'm away from my computer through the weekend.
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.
Author: | jkotas |
---|---|
Assignees: | - |
Labels: | `area-System.Text.Json`, `blocking-clean-ci`, `untriaged`, `in-pr`, `Known Build Error` |
Milestone: | - |
I am not able to reproduce the failure. It seems to be Linux Alpine Arm64 specific.
I've tried a few things to reproduce it with no success. However, it reproduces on checked builds on x64 Linux (ubuntu!) (here's one for example). I've just noticed that it only happens on checked runtimes so I'm currently building one to verify it. I do have a guess as to what could be the issue: in the span ctor there is a type check however it gets skipped if the type is a value type, the failing code does the check for value types as well. As the stack traces make no sense I'm not sure if it's in any way realistic that T
is ever a value type where it doesn't match the actual array type but that's the only (obvious) difference. I'm currently trying to verify it.
This is codegen bug in PGO optimized code. The repro is not deterministic since it is triggered by a particular profile.
I have looked at the dump from runfo get-helix-payload -j ec78045b-ffd3-4800-9c13-a3f3db5a5b72 -w System.Text.Json.Tests -o $HOME/helix_payload/System.Text.Json.Tests
.
The codegen bug is in the PGO optimized VerifyWeatherForecastWithPOCOs
method. The JIT inlines the OrderBy
Linq call graph into the method, including the CollectionsMarshal.AsSpan
method called by the Linq.
Here is the disassembly of the inlined CollectionsMarshal.AsSpan
with the codegen bug:
...
// Debug.Assert(items is not null, "Implementation depends on List<T> always having an array.");
0x7fa2699feef8: testq %r8, %r8
0x7fa2699feefb: jne 0x7fa2699fef17
0x7fa2699feefd: movabsq $0x7fa254205690, %rdi ; string "Implementation depends on List<T> always having an array."
0x7fa2699fef07: movabsq $0x7fa254200008, %rsi ; string ""
0x7fa2699fef11: callq *-0x10151d47(%rip)
// if ((uint)size > (uint)items.Length)
0x7fa2699fef17: movq -0x418(%rbp), %r8 items
0x7fa2699fef1e: movl -0x1c0(%rbp), %ecx length
0x7fa2699fef24: cmpl %ecx, 0x8(%r8) items.Length
0x7fa2699fef28: jb 0x7fa2699ffe5c
// Debug.Assert(typeof(T[]) == list._items.GetType(), "Implementation depends on List<T> always using a T[] and not U[] where U : T.");
// BUG: The JIT incorrectly evaluated typeof(T[]) == list._items.GetType() as constant false and just left the null check
0x7fa2699fef2e: movq -0x400(%rbp), %rax
0x7fa2699fef35: movq 0x8(%rax), %rdi
0x7fa2699fef39: cmpb %dil, (%rdi) // null check
0x7fa2699fef3c: movabsq $0x7fa254228198, %rdi ; string "Implementation depends on List<T> always using a T[] and not U[] where U : T."
0x7fa2699fef46: movabsq $0x7fa254200008, %rsi ; string ""
0x7fa2699fef50: callq *-0x10151d86(%rip) System.Diagnostics.Debug.Fail
...
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Author: | jkotas |
---|---|
Assignees: | - |
Labels: | `area-System.Text.Json`, `area-CodeGen-coreclr`, `blocking-clean-ci`, `in-pr`, `Known Build Error` |
Milestone: | 9.0.0 |
I am commenting out the assert in #96877 to workaround the bug. Please add the assert back once the issue is fixed.
Managed to repro locally on Main, investigating...
Minimal repro:
class Program
{
static void Main()
{
Console.WriteLine(foo<string>(new string[1]));
Console.ReadKey();
}
[MethodImpl(MethodImplOptions.NoInlining)]
static bool foo<T>(string[] list) => typeof(T[]) == list.GetType();
}
Reproduces without PGO, seems like the bug is in gtFoldTypeCompare
This is the current culprit:
same VM call in .NET 8.0 (where bug is not manifested) returns May
From a bisection, the regression seems to have been introduced in range https://github.com/dotnet/runtime/compare/82ac648e132b422354643a90032cc67439c6921c...23a93aa2305d580b6e77f7254624d0b1787b3008.
From a bisection, the regression seems to have been introduced in range 82ac648...23a93aa.
cc @davidwrighton @jkotas Could it be https://github.com/dotnet/runtime/pull/96466 ?
TLDR: it seems that JIT-EE compareTypesForEquality(string[], _Canon[])
returns MustNot
now.
Could it be https://github.com/dotnet/runtime/pull/96466 ?
Yes, that change included a fix in HasSameTypeDefAs
that broke implementation of compareTypesForEquality
. Thank you for tracking it down!
Build Information
Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=524311 Build error leg or test failing: System.Text.Json.Tests.WorkItemExecution Pull request: https://github.com/dotnet/runtime/pull/96870
Error Message
Fill the error message using step by step known issues guidance.
Known issue validation
Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=524311 Error message validated:
Implementation depends on List<T> always using a T[] and not U[] where U : T.
Result validation: :white_check_mark: Known issue matched with the provided build. Validation performed at: 1/12/2024 1:31:48 AM UTCReport
Summary