dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.16k stars 4.72k forks source link

matrix4x4createfromaxisangletest failing on NativeAOT #72149

Closed runfoapp[bot] closed 2 years ago

runfoapp[bot] commented 2 years ago

Frequency - last 30 days as of 8/27:

Note: main branch failures in last 30 days

Runfo report

Build Kind Start Time
1871493 PR 71819 2022-11-07
1871687 Rolling 2022-11-07
1871847 PR 71819 2022-11-07
1872236 PR 71948 2022-11-07
1872339 PR 71943 2022-11-07
1873007 Rolling 2022-11-07
1873836 PR 71988 2022-12-07
1873835 PR 71175 2022-12-07
1874128 PR 71385 2022-12-07
1874181 PR 71187 2022-12-07
1874234 Rolling 2022-12-07
1874881 PR 71971 2022-12-07
1875008 PR 71385 2022-12-07
1877271 PR 72078 2022-13-07
1877160 Rolling 2022-13-07
1877428 Rolling 2022-13-07
1877456 PR 71948 2022-13-07
1878318 PR 72108 2022-13-07
1878499 Rolling 2022-13-07
1878950 PR 71385 2022-13-07
1878990 Rolling 2022-13-07
1880070 Rolling 2022-14-07
1880254 Rolling 2022-14-07
1880316 PR 72167 2022-14-07
1880554 PR 72091 2022-14-07
1880908 PR 72184 2022-14-07
1881247 Rolling 2022-14-07
1881678 Rolling 2022-14-07
1882597 Rolling 2022-15-07
1882765 PR 72145 2022-15-07
1882819 Rolling 2022-15-07
1882810 PR 72252 2022-15-07
1882817 PR 72236 2022-15-07
1884136 Rolling 2022-15-07
dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

jkotas commented 2 years ago

cc @LakshanF

jtschuster commented 2 years ago

Deleted duplicates in runfo, didn't mean to close this

LakshanF commented 2 years ago

72108 has the RR passing (including the Linux-arm64 platform). Waiting for some checks to complete in the runtime pipeline that seems to be due to service outage

LakshanF commented 2 years ago

The NativeAOT rolling runs are passing with #72108

vitek-karas commented 2 years ago

Failed again in https://github.com/dotnet/runtime/pull/71485

MichalStrehovsky commented 2 years ago

Blocking the test in https://github.com/dotnet/runtime/pull/72437.

The failure looks non-determinstic. We've seen it fail on Windows and Linux x64 (I didn't go over all the above failures, but it might not be a problem on ARM64).

The test is using Vectors. I wonder if there's a problem with preservation of XMM registers or something like that.

Cc @VSadov as this might be in your area of interest. (This has been happening for a while so it's not from the Linux suspension PR that just merged.)

VSadov commented 2 years ago

Have we seen this failure on win-x64?

I would suspect preserving of the xmm result in 2-register return case. Win-x64 would not be affected by that. Otherwise it is something else, but likely still related to preserving xmm results.

vitek-karas commented 2 years ago

It failed for me on win-x64: https://dev.azure.com/dnceng/public/_build/results?buildId=1888829&view=ms.vss-test-web.build-test-results-tab&runId=49307428&resultId=129815&paneView=debug

VSadov commented 2 years ago

I've looked through a few failures and it looks like the failures are x64-specific. They were seen on Linux and Windows, but I have not seen a failure on ARM64

This is interesting because Windows and Linux have different specifics in terms of preserving xmm registers (i.e. in Linux all xmm are volatile, but not on Windows, on the other hand Linux can return two xmm registers from a call)

It is hard to see commonality here, that would also be specific to NativeAOT. The only thing that I can think of is that, I believe, NativeAOT disables use of AVX by default, so vector math may be emitted differently.

MichalStrehovsky commented 2 years ago

If you can get a local repro of the failure, you can test the AVX instruction set theory by adding <IlcInstructionSet>avx2</IlcInstructionSet> to the project file and rebuilding. This will enable AVX2 on NativeAOT.

Similarly, setting set COMPlus_EnableAVX=0 will disable AVX on CoreCLR. Maybe it will repro there too then?

I would try extracting the test method into a standalone app that just runs this test on multiple threads, triggering GCs from time to time if you haven't tried that already.

VSadov commented 2 years ago

@MichalStrehovsky - yes that is what I plan to do. Try enabling AVX on NativeAOT and see if that changes the outcome.

VSadov commented 2 years ago

I am unable to reproduce this locally - either by running the test in a loop or by extracting the test code in a standalone app and running that in a loop on multiple threads while forcing frequent suspensions.

I tried on both Windows and Linux.

VSadov commented 2 years ago

Also - I did try disabling AVX and running the repro sample on CoreCLR - nope.

MichalStrehovsky commented 2 years ago

Matrix4x4CreateFromAxisAngleTest failed in #74381.

C:\helix\work\workitem\e>System.Numerics.Vectors.Tests.exe -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing -xml testResults.xml  
Running assembly:System.Numerics.Vectors.Tests, Version=7.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51
[FAIL] System.Numerics.Tests.Matrix4x4Tests.Matrix4x4CreateFromAxisAngleTest
Assert.True() Failure
Expected: True
Actual:   False
   at System.Numerics.Tests.Matrix4x4Tests.Matrix4x4CreateFromAxisAngleTest() + 0x4c4
   at System.Numerics.Vectors.Tests!<BaseAddress>+0x63121c
   at System.Reflection.DynamicInvokeInfo.Invoke(Object thisPtr, IntPtr methodToCall, Object[] parameters, BinderBundle binderBundle, Boolean wrapInTargetInvocationException) + 0xcf
Finished System.Numerics.Vectors.Tests, Version=7.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51
VSadov commented 2 years ago

I hoped it was fixed by some context/register related change. It is just less common now. At least it is on Windows+x64, which should be somewhat easier to deal with.

karelz commented 2 years ago

Frequency updated in top post -- somehow it happens very often in 7.0-preview7 branch, but main and 7.0 are also hit -- roughly 2x per week (in last 2 weeks).

VSadov commented 2 years ago

When it fails in a Windows run, it always fails in 4 runs in a row for the same build - on Windows10, Windows11, Windows10server, Windows8.1

Are we running the same binary? Perhaps the failure is a corruption at compile time? Is there a way to get a binary from a failed run?

VSadov commented 2 years ago

I'd be interested to see the failing exe artifacts\bin\System.Numerics.Vectors.Tests\Release\net7.0\publish\System.Numerics.Vectors.Tests.exe

From this build, for example

https://dev.azure.com/dnceng/public/_build/results?buildId=1971318&view=results

MichalStrehovsky commented 2 years ago

You can get it with runfo, instructions here: https://github.com/dotnet/runtime/blob/main/eng/testing/debug-dump-template.md

For this failed run: https://helix.dot.net/api/2019-06-17/jobs/67c3cd97-4086-489f-a0b2-6ed45d5196d5/workitems/System.Numerics.Vectors.Tests/console

This runfo command works for me: runfo get-helix-payload -j 67c3cd97-4086-489f-a0b2-6ed45d5196d5 -w System.Numerics.Vectors.Tests -o c:\blah

VSadov commented 2 years ago

Thanks! I have obtained the binary. And surely enough it fails. So it is a compile-time bug not a run-time bug.

No wonder it was not reproducing via re-running the test.

VSadov commented 2 years ago

now I can build the same commit locally and see what the diffs are in the bits or when executing/debugging

VSadov commented 2 years ago

The difference start when the following line is executed:

            expected = Matrix4x4.CreateFromQuaternion(Quaternion.CreateFromAxisAngle(Vector3.Normalize(Vector3.One), radians));

LOADS DIFFERENT DATA ==> 00007FF763CB7E5E  movups      xmm0,xmmword ptr [__readonlydata_System_Numerics_Vectors_Tests_System_Numerics_Tests_Matrix4x4Tests__Matrix4x4CreateFromAxisAngleTest+40h (07FF7647DF070h)]  
00007FF763CB7E65  movups      xmm1,xmmword ptr [__readonlydata_System_Numerics_Vectors_Tests_System_Numerics_Tests_Matrix4x4Tests__Matrix4x4CreateFromAxisAngleTest+50h (07FF7647DF080h)]  
00007FF763CB7E6C  andps       xmm1,xmm0  
00007FF763CB7E6F  mulps       xmm0,xmm1  
00007FF763CB7E72  movaps      xmm1,xmm0  
00007FF763CB7E75  shufps      xmm1,xmm0,0B1h  

The first movups loads some static value (I suspect it is Vector.One ?)

My locally built exe, that passes the test, loads

XMM0 = DDDDDDDD3F800000-3F8000003F800000 

The failing exe from the lab build loads:

XMM0 = FFFFFFFF3F800000-3F8000003F800000

3f800000 is float32 1.0 , so we have three 1.0 floats. Not sure what the rest is. Garbage? it seems it makes a difference.

@MichalStrehovsky I think someone else should take a look. I have only very vague idea where these __readonlydata_ things come from.

jkotas commented 2 years ago

Vector3 is 12 bytes, but the readonly data that Vector3.One is loaded from is 16 bytes. The JIT leaves the top 4 bytes of the readonly data uninitialized that leads to failures depending on what the uninitialized data happened to be. This looks like a JIT bug. @dotnet/jit-contrib Could you please take a look?

The questions to ask:

jkotas commented 2 years ago

This is silent bad codegen. We should fix it for .NET 7.

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
Frequency - [last 30 days](https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E30+definition%3Aruntime-extra-platforms+name%3A%22system.numerics.tests.matrix4x4tests.matrix4x4createfromaxisangletest%22) as of 8/27: - 8/26 - Rolling run [1971318](https://runfo.azurewebsites.net/view/build/?number=1971318) (7.0) - 4x hits - 8/22 - PR [1959582](https://runfo.azurewebsites.net/view/build/?number=1959582) (main) - PR #74381 - 4x hits - 8/19 - PR [1953466](https://runfo.azurewebsites.net/view/build/?number=1953466) - PR #74228 - 4x hits - 8/15 - Rolling run [1943460](https://runfo.azurewebsites.net/view/build/?number=1943460) - 4x hits - 7/28-8/26 - 57x Rolling runs of release/7.0-preview7 - see [last 30 days](https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E30+definition%3Aruntime-extra-platforms+name%3A%22system.numerics.tests.matrix4x4tests.matrix4x4createfromaxisangletest%22+targetBranch%3Arelease%2F7.0) - Note: Except Rolling run [1913180](https://runfo.azurewebsites.net/view/build/?number=1913180) with 5x hits, all other runs had 1x hit Note: **main branch** failures in [last 30 days](https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E30+definition%3Aruntime-extra-platforms+name%3A%22system.numerics.tests.matrix4x4tests.matrix4x4createfromaxisangletest%22+targetBranch%3Amain) #### Runfo report |Build|Kind|Start Time| |---|---|---| [1871493](https://dev.azure.com/dnceng/public/_build/results?buildId=1871493)|[PR 71819](https://github.com/dotnet/runtime/pull/71819)|2022-11-07| [1871687](https://dev.azure.com/dnceng/public/_build/results?buildId=1871687)|Rolling|2022-11-07| [1871847](https://dev.azure.com/dnceng/public/_build/results?buildId=1871847)|[PR 71819](https://github.com/dotnet/runtime/pull/71819)|2022-11-07| [1872236](https://dev.azure.com/dnceng/public/_build/results?buildId=1872236)|[PR 71948](https://github.com/dotnet/runtime/pull/71948)|2022-11-07| [1872339](https://dev.azure.com/dnceng/public/_build/results?buildId=1872339)|[PR 71943](https://github.com/dotnet/runtime/pull/71943)|2022-11-07| [1873007](https://dev.azure.com/dnceng/public/_build/results?buildId=1873007)|Rolling|2022-11-07| [1873836](https://dev.azure.com/dnceng/public/_build/results?buildId=1873836)|[PR 71988](https://github.com/dotnet/runtime/pull/71988)|2022-12-07| [1873835](https://dev.azure.com/dnceng/public/_build/results?buildId=1873835)|[PR 71175](https://github.com/dotnet/runtime/pull/71175)|2022-12-07| [1874128](https://dev.azure.com/dnceng/public/_build/results?buildId=1874128)|[PR 71385](https://github.com/dotnet/runtime/pull/71385)|2022-12-07| [1874181](https://dev.azure.com/dnceng/public/_build/results?buildId=1874181)|[PR 71187](https://github.com/dotnet/runtime/pull/71187)|2022-12-07| [1874234](https://dev.azure.com/dnceng/public/_build/results?buildId=1874234)|Rolling|2022-12-07| [1874881](https://dev.azure.com/dnceng/public/_build/results?buildId=1874881)|[PR 71971](https://github.com/dotnet/runtime/pull/71971)|2022-12-07| [1875008](https://dev.azure.com/dnceng/public/_build/results?buildId=1875008)|[PR 71385](https://github.com/dotnet/runtime/pull/71385)|2022-12-07| [1877271](https://dev.azure.com/dnceng/public/_build/results?buildId=1877271)|[PR 72078](https://github.com/dotnet/runtime/pull/72078)|2022-13-07| [1877160](https://dev.azure.com/dnceng/public/_build/results?buildId=1877160)|Rolling|2022-13-07| [1877428](https://dev.azure.com/dnceng/public/_build/results?buildId=1877428)|Rolling|2022-13-07| [1877456](https://dev.azure.com/dnceng/public/_build/results?buildId=1877456)|[PR 71948](https://github.com/dotnet/runtime/pull/71948)|2022-13-07| [1878318](https://dev.azure.com/dnceng/public/_build/results?buildId=1878318)|[PR 72108](https://github.com/dotnet/runtime/pull/72108)|2022-13-07| [1878499](https://dev.azure.com/dnceng/public/_build/results?buildId=1878499)|Rolling|2022-13-07| [1878950](https://dev.azure.com/dnceng/public/_build/results?buildId=1878950)|[PR 71385](https://github.com/dotnet/runtime/pull/71385)|2022-13-07| [1878990](https://dev.azure.com/dnceng/public/_build/results?buildId=1878990)|Rolling|2022-13-07| [1880070](https://dev.azure.com/dnceng/public/_build/results?buildId=1880070)|Rolling|2022-14-07| [1880254](https://dev.azure.com/dnceng/public/_build/results?buildId=1880254)|Rolling|2022-14-07| [1880316](https://dev.azure.com/dnceng/public/_build/results?buildId=1880316)|[PR 72167](https://github.com/dotnet/runtime/pull/72167)|2022-14-07| [1880554](https://dev.azure.com/dnceng/public/_build/results?buildId=1880554)|[PR 72091](https://github.com/dotnet/runtime/pull/72091)|2022-14-07| [1880908](https://dev.azure.com/dnceng/public/_build/results?buildId=1880908)|[PR 72184](https://github.com/dotnet/runtime/pull/72184)|2022-14-07| [1881247](https://dev.azure.com/dnceng/public/_build/results?buildId=1881247)|Rolling|2022-14-07| [1881678](https://dev.azure.com/dnceng/public/_build/results?buildId=1881678)|Rolling|2022-14-07| [1882597](https://dev.azure.com/dnceng/public/_build/results?buildId=1882597)|Rolling|2022-15-07| [1882765](https://dev.azure.com/dnceng/public/_build/results?buildId=1882765)|[PR 72145](https://github.com/dotnet/runtime/pull/72145)|2022-15-07| [1882819](https://dev.azure.com/dnceng/public/_build/results?buildId=1882819)|Rolling|2022-15-07| [1882810](https://dev.azure.com/dnceng/public/_build/results?buildId=1882810)|[PR 72252](https://github.com/dotnet/runtime/pull/72252)|2022-15-07| [1882817](https://dev.azure.com/dnceng/public/_build/results?buildId=1882817)|[PR 72236](https://github.com/dotnet/runtime/pull/72236)|2022-15-07| [1884136](https://dev.azure.com/dnceng/public/_build/results?buildId=1884136)|Rolling|2022-15-07|
Author: runfoapp[bot]
Assignees: VSadov
Labels: `area-CodeGen-coreclr`, `area-NativeAOT-coreclr`
Milestone: 7.0.0
BruceForstall commented 2 years ago

cc @tannergooding

EgorBo commented 2 years ago

We try to promote 12b simd to 16b everywhere where possible so I guess the data section's value has to be 16 bytes with zeroed last 4 bytes indeed

tannergooding commented 2 years ago

Are the top 4 bytes of xmm register that holds Vector3 data required to be initialized to 0? Is the JIT expected to initialize the top 4 bytes of the Vector3 readonly data to 0?

We don't have any strict requirement here. For most operations (add, and, subtract, etc) the actual functionality is "element-wise" and so the value of the top 4-bytes doesn't matter.

The few cases where it crosses lanes (such as DotProduct or Sum) we're supposed to ensure the upper bits are zeroed or ignored. In the case of DotProduct we emit the encoding that says to only use the lowest 3 elements.

tannergooding commented 2 years ago

Given this is NAOT only and NAOT must target a lower baseline, its possible there is a bug in the dpps emulation emitted for SSE2 only hardware that isn't correctly accounting for this.

I'll take a look tomorrow.

MichalStrehovsky commented 2 years ago

Given this is NAOT only and NAOT must target a lower baseline

If we don't have any other test coverage for the lower baselines, it might be a good quality week topic. With NAOT one at least has a deterministic repro once a bad binary has been produced. Running into this in a JIT configuration would be a lot more difficult to root cause.

This is not the first issue caused by this test hole - there were #72081 and #72158 just in the past month. NAOT covers the lowest denominator but there's a whole spectrum in between the lowest denominator and whatever machines we run the tests on that can happen in the real world and it looks like we don't test.

tannergooding commented 2 years ago

We do test, but only in the outer loop and only for the runtime tests. There has been a longstanding issue to also run the library tests under the various JIT Stress options for ISA enablement

JulieLeeMSFT commented 2 years ago

Given this is NAOT only and NAOT must target a lower baseline, its possible there is a bug in the dpps emulation emitted for SSE2 only hardware that isn't correctly accounting for this.

I'll take a look tomorrow.

Assigning to Tanner.

jkotas commented 2 years ago

@tannergooding Did you have a chance to take a look?

tannergooding commented 2 years ago

Yes. but it doesn't look to be in the Vector3.Dot emulation.

The relevant logic is here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L3350

We pick Sse.Multiply, Sse3.HorizontalAdd -or- Sse.Shuffle, and Sse.Add and then the very first thing we do is mask things so that the unused component is zero: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L3418

tannergooding commented 2 years ago

The called Matrix and Quaternion methods aren't accelerated, just the Vector3.Normalize call, the various field accesses, the UnitX/Y/Z property accesses, and the Vector3 constructor call.

The test doesn't really use any special tricks or code paths either: https://github.com/dotnet/runtime/blob/4c240009fa1d955805b5e284b6f58f1e643d1ef1/src/libraries/System.Numerics.Vectors/tests/Matrix4x4Tests.cs#L582

jkotas commented 2 years ago

Here is codegen for Vector3.Length that I believe is the root cause of the problem:

[MethodImpl(MethodImplOptions.NoInlining)]
static float f(Vector3 v)
  => v.Length();
movss   xmm0,dword ptr [rcx+8] ds:00000052`9e6ff828=3f800000
movsd   xmm1,mmword ptr [rcx]
shufps  xmm1,xmm0,44h

// xmm1 has Vector3. The top 4 bytes of xmm1 have undefined value. They happen to be well defined in this specific case, but the JIT assumes the general case where they are undefined.

movups  xmm0,xmmword ptr [xxx!_readonlydata_xxx_Program____Main___g__f_0_0 (00007ff6`59060ee0)] // 0x0000 FFFF FFFF FFFF - mask for the top 4 bytes
andps   xmm0,xmm1 // xmm0 has properly masked Vector3
mulps   xmm0,xmm1 // We are multiplying properly masked Vector3 with unmasked Vector3!

movaps  xmm1,xmm0
shufps  xmm1,xmm0,0B1h
addps   xmm1,xmm0
movaps  xmm0,xmm1
shufps  xmm0,xmm1,4Eh
addps   xmm0,xmm1
sqrtss  xmm0,xmm0
ret

I think mulps xmm0,xmm1 is wrong. It should be mulps xmm0,xmm0 so that we multiply the masked values. The bad code works most of the time (0 multiplied by undefined value is zero) except when the undefined value happens to be NaN that will turn the whole thing into NaN.

Does this analysis make sense? Can you take it from here?

tannergooding commented 2 years ago

Yes, that makes sense. It should definitely be multiplying tmp * tmp, rather than tmp * original.

I'll see about getting a fix up first thing tomorrow.