runfoapp[bot] commented 2 years ago

Frequency - last 30 days as of 8/27:

8/26 - Rolling run 1971318 (7.0) - 4x hits
8/22 - PR 1959582 (main) - PR #74381 - 4x hits
8/19 - PR 1953466 - PR #74228 - 4x hits
8/15 - Rolling run 1943460 - 4x hits
7/28-8/26 - 57x Rolling runs of release/7.0-preview7 - see last 30 days
- Note: Except Rolling run 1913180 with 5x hits, all other runs had 1x hit

Note: main branch failures in last 30 days

Runfo report

Build	Kind	Start Time
1871493	PR 71819	2022-11-07
1871687	Rolling	2022-11-07
1871847	PR 71819	2022-11-07
1872236	PR 71948	2022-11-07
1872339	PR 71943	2022-11-07
1873007	Rolling	2022-11-07
1873836	PR 71988	2022-12-07
1873835	PR 71175	2022-12-07
1874128	PR 71385	2022-12-07
1874181	PR 71187	2022-12-07
1874234	Rolling	2022-12-07
1874881	PR 71971	2022-12-07
1875008	PR 71385	2022-12-07
1877271	PR 72078	2022-13-07
1877160	Rolling	2022-13-07
1877428	Rolling	2022-13-07
1877456	PR 71948	2022-13-07
1878318	PR 72108	2022-13-07
1878499	Rolling	2022-13-07
1878950	PR 71385	2022-13-07
1878990	Rolling	2022-13-07
1880070	Rolling	2022-14-07
1880254	Rolling	2022-14-07
1880316	PR 72167	2022-14-07
1880554	PR 72091	2022-14-07
1880908	PR 72184	2022-14-07
1881247	Rolling	2022-14-07
1881678	Rolling	2022-14-07
1882597	Rolling	2022-15-07
1882765	PR 72145	2022-15-07
1882819	Rolling	2022-15-07
1882810	PR 72252	2022-15-07
1882817	PR 72236	2022-15-07
1884136	Rolling	2022-15-07

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

jkotas commented 2 years ago

cc @LakshanF

jtschuster commented 2 years ago

Deleted duplicates in runfo, didn't mean to close this

LakshanF commented 2 years ago

72108 has the RR passing (including the Linux-arm64 platform). Waiting for some checks to complete in the runtime pipeline that seems to be due to service outage

LakshanF commented 2 years ago

The NativeAOT rolling runs are passing with #72108

vitek-karas commented 2 years ago

Failed again in https://github.com/dotnet/runtime/pull/71485

MichalStrehovsky commented 2 years ago

Blocking the test in https://github.com/dotnet/runtime/pull/72437.

The failure looks non-determinstic. We've seen it fail on Windows and Linux x64 (I didn't go over all the above failures, but it might not be a problem on ARM64).

The test is using Vectors. I wonder if there's a problem with preservation of XMM registers or something like that.

Cc @VSadov as this might be in your area of interest. (This has been happening for a while so it's not from the Linux suspension PR that just merged.)

VSadov commented 2 years ago

Have we seen this failure on win-x64?

I would suspect preserving of the xmm result in 2-register return case. Win-x64 would not be affected by that. Otherwise it is something else, but likely still related to preserving xmm results.

vitek-karas commented 2 years ago

It failed for me on win-x64: https://dev.azure.com/dnceng/public/_build/results?buildId=1888829&view=ms.vss-test-web.build-test-results-tab&runId=49307428&resultId=129815&paneView=debug

VSadov commented 2 years ago

I've looked through a few failures and it looks like the failures are x64-specific. They were seen on Linux and Windows, but I have not seen a failure on ARM64

This is interesting because Windows and Linux have different specifics in terms of preserving xmm registers (i.e. in Linux all xmm are volatile, but not on Windows, on the other hand Linux can return two xmm registers from a call)

It is hard to see commonality here, that would also be specific to NativeAOT. The only thing that I can think of is that, I believe, NativeAOT disables use of AVX by default, so vector math may be emitted differently.

MichalStrehovsky commented 2 years ago

If you can get a local repro of the failure, you can test the AVX instruction set theory by adding <IlcInstructionSet>avx2</IlcInstructionSet> to the project file and rebuilding. This will enable AVX2 on NativeAOT.

Similarly, setting set COMPlus_EnableAVX=0 will disable AVX on CoreCLR. Maybe it will repro there too then?

I would try extracting the test method into a standalone app that just runs this test on multiple threads, triggering GCs from time to time if you haven't tried that already.

VSadov commented 2 years ago

@MichalStrehovsky - yes that is what I plan to do. Try enabling AVX on NativeAOT and see if that changes the outcome.

VSadov commented 2 years ago

I am unable to reproduce this locally - either by running the test in a loop or by extracting the test code in a standalone app and running that in a loop on multiple threads while forcing frequent suspensions.

I tried on both Windows and Linux.

VSadov commented 2 years ago

Also - I did try disabling AVX and running the repro sample on CoreCLR - nope.

MichalStrehovsky commented 2 years ago

Matrix4x4CreateFromAxisAngleTest failed in #74381.

C:\helix\work\workitem\e>System.Numerics.Vectors.Tests.exe -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing -xml testResults.xml  
Running assembly:System.Numerics.Vectors.Tests, Version=7.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51
[FAIL] System.Numerics.Tests.Matrix4x4Tests.Matrix4x4CreateFromAxisAngleTest
Assert.True() Failure
Expected: True
Actual:   False
   at System.Numerics.Tests.Matrix4x4Tests.Matrix4x4CreateFromAxisAngleTest() + 0x4c4
   at System.Numerics.Vectors.Tests!<BaseAddress>+0x63121c
   at System.Reflection.DynamicInvokeInfo.Invoke(Object thisPtr, IntPtr methodToCall, Object[] parameters, BinderBundle binderBundle, Boolean wrapInTargetInvocationException) + 0xcf
Finished System.Numerics.Vectors.Tests, Version=7.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51

VSadov commented 2 years ago

I hoped it was fixed by some context/register related change. It is just less common now. At least it is on Windows+x64, which should be somewhat easier to deal with.

karelz commented 2 years ago

Frequency updated in top post -- somehow it happens very often in 7.0-preview7 branch, but main and 7.0 are also hit -- roughly 2x per week (in last 2 weeks).

VSadov commented 2 years ago

When it fails in a Windows run, it always fails in 4 runs in a row for the same build - on Windows10, Windows11, Windows10server, Windows8.1

Are we running the same binary? Perhaps the failure is a corruption at compile time? Is there a way to get a binary from a failed run?

VSadov commented 2 years ago

I'd be interested to see the failing exe artifacts\bin\System.Numerics.Vectors.Tests\Release\net7.0\publish\System.Numerics.Vectors.Tests.exe

From this build, for example

https://dev.azure.com/dnceng/public/_build/results?buildId=1971318&view=results

MichalStrehovsky commented 2 years ago

You can get it with runfo, instructions here: https://github.com/dotnet/runtime/blob/main/eng/testing/debug-dump-template.md

For this failed run: https://helix.dot.net/api/2019-06-17/jobs/67c3cd97-4086-489f-a0b2-6ed45d5196d5/workitems/System.Numerics.Vectors.Tests/console

This runfo command works for me: runfo get-helix-payload -j 67c3cd97-4086-489f-a0b2-6ed45d5196d5 -w System.Numerics.Vectors.Tests -o c:\blah

VSadov commented 2 years ago

Thanks! I have obtained the binary. And surely enough it fails. So it is a compile-time bug not a run-time bug.

No wonder it was not reproducing via re-running the test.

VSadov commented 2 years ago

now I can build the same commit locally and see what the diffs are in the bits or when executing/debugging

VSadov commented 2 years ago

The difference start when the following line is executed:

            expected = Matrix4x4.CreateFromQuaternion(Quaternion.CreateFromAxisAngle(Vector3.Normalize(Vector3.One), radians));

LOADS DIFFERENT DATA ==> 00007FF763CB7E5E  movups      xmm0,xmmword ptr [__readonlydata_System_Numerics_Vectors_Tests_System_Numerics_Tests_Matrix4x4Tests__Matrix4x4CreateFromAxisAngleTest+40h (07FF7647DF070h)]  
00007FF763CB7E65  movups      xmm1,xmmword ptr [__readonlydata_System_Numerics_Vectors_Tests_System_Numerics_Tests_Matrix4x4Tests__Matrix4x4CreateFromAxisAngleTest+50h (07FF7647DF080h)]  
00007FF763CB7E6C  andps       xmm1,xmm0  
00007FF763CB7E6F  mulps       xmm0,xmm1  
00007FF763CB7E72  movaps      xmm1,xmm0  
00007FF763CB7E75  shufps      xmm1,xmm0,0B1h

The first movups loads some static value (I suspect it is Vector.One ?)

My locally built exe, that passes the test, loads

XMM0 = DDDDDDDD3F800000-3F8000003F800000

The failing exe from the lab build loads:

XMM0 = FFFFFFFF3F800000-3F8000003F800000

3f800000 is float32 1.0 , so we have three 1.0 floats. Not sure what the rest is. Garbage? it seems it makes a difference.

@MichalStrehovsky I think someone else should take a look. I have only very vague idea where these __readonlydata_ things come from.

jkotas commented 2 years ago

Vector3 is 12 bytes, but the readonly data that Vector3.One is loaded from is 16 bytes. The JIT leaves the top 4 bytes of the readonly data uninitialized that leads to failures depending on what the uninitialized data happened to be. This looks like a JIT bug. @dotnet/jit-contrib Could you please take a look?

The questions to ask:

Are the top 4 bytes of xmm register that holds Vector3 data required to be initialized to 0?
Is the JIT expected to initialize the top 4 bytes of the Vector3 readonly data to 0?

jkotas commented 2 years ago

This is silent bad codegen. We should fix it for .NET 7.

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

Frequency - [last 30 days](https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E30+definition%3Aruntime-extra-platforms+name%3A%22system.numerics.tests.matrix4x4tests.matrix4x4createfromaxisangletest%22) as of 8/27: - 8/26 - Rolling run [1971318](https://runfo.azurewebsites.net/view/build/?number=1971318) (7.0) - 4x hits - 8/22 - PR [1959582](https://runfo.azurewebsites.net/view/build/?number=1959582) (main) - PR #74381 - 4x hits - 8/19 - PR [1953466](https://runfo.azurewebsites.net/view/build/?number=1953466) - PR #74228 - 4x hits - 8/15 - Rolling run [1943460](https://runfo.azurewebsites.net/view/build/?number=1943460) - 4x hits - 7/28-8/26 - 57x Rolling runs of release/7.0-preview7 - see [last 30 days](https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E30+definition%3Aruntime-extra-platforms+name%3A%22system.numerics.tests.matrix4x4tests.matrix4x4createfromaxisangletest%22+targetBranch%3Arelease%2F7.0) - Note: Except Rolling run [1913180](https://runfo.azurewebsites.net/view/build/?number=1913180) with 5x hits, all other runs had 1x hit Note: **main branch** failures in [last 30 days](https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E30+definition%3Aruntime-extra-platforms+name%3A%22system.numerics.tests.matrix4x4tests.matrix4x4createfromaxisangletest%22+targetBranch%3Amain) #### Runfo report |Build|Kind|Start Time| |---|---|---| [1871493](https://dev.azure.com/dnceng/public/_build/results?buildId=1871493)|[PR 71819](https://github.com/dotnet/runtime/pull/71819)|2022-11-07| [1871687](https://dev.azure.com/dnceng/public/_build/results?buildId=1871687)|Rolling|2022-11-07| [1871847](https://dev.azure.com/dnceng/public/_build/results?buildId=1871847)|[PR 71819](https://github.com/dotnet/runtime/pull/71819)|2022-11-07| [1872236](https://dev.azure.com/dnceng/public/_build/results?buildId=1872236)|[PR 71948](https://github.com/dotnet/runtime/pull/71948)|2022-11-07| [1872339](https://dev.azure.com/dnceng/public/_build/results?buildId=1872339)|[PR 71943](https://github.com/dotnet/runtime/pull/71943)|2022-11-07| [1873007](https://dev.azure.com/dnceng/public/_build/results?buildId=1873007)|Rolling|2022-11-07| [1873836](https://dev.azure.com/dnceng/public/_build/results?buildId=1873836)|[PR 71988](https://github.com/dotnet/runtime/pull/71988)|2022-12-07| [1873835](https://dev.azure.com/dnceng/public/_build/results?buildId=1873835)|[PR 71175](https://github.com/dotnet/runtime/pull/71175)|2022-12-07| [1874128](https://dev.azure.com/dnceng/public/_build/results?buildId=1874128)|[PR 71385](https://github.com/dotnet/runtime/pull/71385)|2022-12-07| [1874181](https://dev.azure.com/dnceng/public/_build/results?buildId=1874181)|[PR 71187](https://github.com/dotnet/runtime/pull/71187)|2022-12-07| [1874234](https://dev.azure.com/dnceng/public/_build/results?buildId=1874234)|Rolling|2022-12-07| [1874881](https://dev.azure.com/dnceng/public/_build/results?buildId=1874881)|[PR 71971](https://github.com/dotnet/runtime/pull/71971)|2022-12-07| [1875008](https://dev.azure.com/dnceng/public/_build/results?buildId=1875008)|[PR 71385](https://github.com/dotnet/runtime/pull/71385)|2022-12-07| [1877271](https://dev.azure.com/dnceng/public/_build/results?buildId=1877271)|[PR 72078](https://github.com/dotnet/runtime/pull/72078)|2022-13-07| [1877160](https://dev.azure.com/dnceng/public/_build/results?buildId=1877160)|Rolling|2022-13-07| [1877428](https://dev.azure.com/dnceng/public/_build/results?buildId=1877428)|Rolling|2022-13-07| [1877456](https://dev.azure.com/dnceng/public/_build/results?buildId=1877456)|[PR 71948](https://github.com/dotnet/runtime/pull/71948)|2022-13-07| [1878318](https://dev.azure.com/dnceng/public/_build/results?buildId=1878318)|[PR 72108](https://github.com/dotnet/runtime/pull/72108)|2022-13-07| [1878499](https://dev.azure.com/dnceng/public/_build/results?buildId=1878499)|Rolling|2022-13-07| [1878950](https://dev.azure.com/dnceng/public/_build/results?buildId=1878950)|[PR 71385](https://github.com/dotnet/runtime/pull/71385)|2022-13-07| [1878990](https://dev.azure.com/dnceng/public/_build/results?buildId=1878990)|Rolling|2022-13-07| [1880070](https://dev.azure.com/dnceng/public/_build/results?buildId=1880070)|Rolling|2022-14-07| [1880254](https://dev.azure.com/dnceng/public/_build/results?buildId=1880254)|Rolling|2022-14-07| [1880316](https://dev.azure.com/dnceng/public/_build/results?buildId=1880316)|[PR 72167](https://github.com/dotnet/runtime/pull/72167)|2022-14-07| [1880554](https://dev.azure.com/dnceng/public/_build/results?buildId=1880554)|[PR 72091](https://github.com/dotnet/runtime/pull/72091)|2022-14-07| [1880908](https://dev.azure.com/dnceng/public/_build/results?buildId=1880908)|[PR 72184](https://github.com/dotnet/runtime/pull/72184)|2022-14-07| [1881247](https://dev.azure.com/dnceng/public/_build/results?buildId=1881247)|Rolling|2022-14-07| [1881678](https://dev.azure.com/dnceng/public/_build/results?buildId=1881678)|Rolling|2022-14-07| [1882597](https://dev.azure.com/dnceng/public/_build/results?buildId=1882597)|Rolling|2022-15-07| [1882765](https://dev.azure.com/dnceng/public/_build/results?buildId=1882765)|[PR 72145](https://github.com/dotnet/runtime/pull/72145)|2022-15-07| [1882819](https://dev.azure.com/dnceng/public/_build/results?buildId=1882819)|Rolling|2022-15-07| [1882810](https://dev.azure.com/dnceng/public/_build/results?buildId=1882810)|[PR 72252](https://github.com/dotnet/runtime/pull/72252)|2022-15-07| [1882817](https://dev.azure.com/dnceng/public/_build/results?buildId=1882817)|[PR 72236](https://github.com/dotnet/runtime/pull/72236)|2022-15-07| [1884136](https://dev.azure.com/dnceng/public/_build/results?buildId=1884136)|Rolling|2022-15-07|

Author:	runfoapp[bot]
Assignees:	VSadov
Labels:	`area-CodeGen-coreclr`, `area-NativeAOT-coreclr`
Milestone:	7.0.0

BruceForstall commented 2 years ago

cc @tannergooding

EgorBo commented 2 years ago

We try to promote 12b simd to 16b everywhere where possible so I guess the data section's value has to be 16 bytes with zeroed last 4 bytes indeed

tannergooding commented 2 years ago

Are the top 4 bytes of xmm register that holds Vector3 data required to be initialized to 0? Is the JIT expected to initialize the top 4 bytes of the Vector3 readonly data to 0?

We don't have any strict requirement here. For most operations (add, and, subtract, etc) the actual functionality is "element-wise" and so the value of the top 4-bytes doesn't matter.

The few cases where it crosses lanes (such as DotProduct or Sum) we're supposed to ensure the upper bits are zeroed or ignored. In the case of DotProduct we emit the encoding that says to only use the lowest 3 elements.

tannergooding commented 2 years ago

Given this is NAOT only and NAOT must target a lower baseline, its possible there is a bug in the dpps emulation emitted for SSE2 only hardware that isn't correctly accounting for this.

I'll take a look tomorrow.

MichalStrehovsky commented 2 years ago

Given this is NAOT only and NAOT must target a lower baseline

If we don't have any other test coverage for the lower baselines, it might be a good quality week topic. With NAOT one at least has a deterministic repro once a bad binary has been produced. Running into this in a JIT configuration would be a lot more difficult to root cause.

This is not the first issue caused by this test hole - there were #72081 and #72158 just in the past month. NAOT covers the lowest denominator but there's a whole spectrum in between the lowest denominator and whatever machines we run the tests on that can happen in the real world and it looks like we don't test.

tannergooding commented 2 years ago

We do test, but only in the outer loop and only for the runtime tests. There has been a longstanding issue to also run the library tests under the various JIT Stress options for ISA enablement

JulieLeeMSFT commented 2 years ago

Given this is NAOT only and NAOT must target a lower baseline, its possible there is a bug in the dpps emulation emitted for SSE2 only hardware that isn't correctly accounting for this.

I'll take a look tomorrow.

Assigning to Tanner.

jkotas commented 2 years ago

@tannergooding Did you have a chance to take a look?

tannergooding commented 2 years ago

Yes. but it doesn't look to be in the Vector3.Dot emulation.

The relevant logic is here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L3350

We pick Sse.Multiply, Sse3.HorizontalAdd -or- Sse.Shuffle, and Sse.Add and then the very first thing we do is mask things so that the unused component is zero: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L3418

Notably we could do this "more efficiently" now that we have GT_CNS_VEC, but that's an unrelated JIT throughput optimization

tannergooding commented 2 years ago

The called Matrix and Quaternion methods aren't accelerated, just the Vector3.Normalize call, the various field accesses, the UnitX/Y/Z property accesses, and the Vector3 constructor call.

The test doesn't really use any special tricks or code paths either: https://github.com/dotnet/runtime/blob/4c240009fa1d955805b5e284b6f58f1e643d1ef1/src/libraries/System.Numerics.Vectors/tests/Matrix4x4Tests.cs#L582

jkotas commented 2 years ago

Here is codegen for Vector3.Length that I believe is the root cause of the problem:

[MethodImpl(MethodImplOptions.NoInlining)]
static float f(Vector3 v)
  => v.Length();

movss   xmm0,dword ptr [rcx+8] ds:00000052`9e6ff828=3f800000
movsd   xmm1,mmword ptr [rcx]
shufps  xmm1,xmm0,44h

// xmm1 has Vector3. The top 4 bytes of xmm1 have undefined value. They happen to be well defined in this specific case, but the JIT assumes the general case where they are undefined.

movups  xmm0,xmmword ptr [xxx!_readonlydata_xxx_Program____Main___g__f_0_0 (00007ff6`59060ee0)] // 0x0000 FFFF FFFF FFFF - mask for the top 4 bytes
andps   xmm0,xmm1 // xmm0 has properly masked Vector3
mulps   xmm0,xmm1 // We are multiplying properly masked Vector3 with unmasked Vector3!

movaps  xmm1,xmm0
shufps  xmm1,xmm0,0B1h
addps   xmm1,xmm0
movaps  xmm0,xmm1
shufps  xmm0,xmm1,4Eh
addps   xmm0,xmm1
sqrtss  xmm0,xmm0
ret

I think mulps xmm0,xmm1 is wrong. It should be mulps xmm0,xmm0 so that we multiply the masked values. The bad code works most of the time (0 multiplied by undefined value is zero) except when the undefined value happens to be NaN that will turn the whole thing into NaN.

Does this analysis make sense? Can you take it from here?

tannergooding commented 2 years ago

Yes, that makes sense. It should definitely be multiplying tmp * tmp, rather than tmp * original.

I'll see about getting a fix up first thing tomorrow.

dotnet / runtime

matrix4x4createfromaxisangletest failing on NativeAOT #72149

Runfo report

72108 has the RR passing (including the Linux-arm64 platform). Waiting for some checks to complete in the runtime pipeline that seems to be due to service outage