dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[NativeAOT/linux-arm] Determinism test failure #100508

Closed (jkotas closed this 3 weeks ago)

jkotas commented 6 months ago
  Starting:    nativeaot.SmokeTests.XUnitWrapper (parallel test collections = on [2 threads], stop on fail = off)
    nativeaot/SmokeTests/Determinism/Determinism/Determinism.sh [FAIL]
      DOTNET_DbgEnableMiniDump is set and the createdump binary does not exist: /root/helix/work/workitem/e/nativeaot/SmokeTests/Determinism/Determinism//native/createdump
      Unhandled exception. System.Exception: Different at byte 1133358
         at Program.<Main>$(String[] args) + 0x3b9

Details: https://dev.azure.com/dnceng-public/public/_build/results?buildId=626098&view=ms.vss-test-web.build-test-results-tab&runId=15358088&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab&resultId=107545
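
The "Different at byte 1133358" message is what a byte-for-byte comparison of two compiler outputs would report. The test's actual source is not shown in this thread, so the following is only a minimal Python sketch of that kind of check; the function name `first_difference` is hypothetical.

```python
from typing import Optional


def first_difference(baseline: bytes, compare: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical.

    Hypothetical helper: it illustrates a determinism check, not the
    implementation used by the Determinism smoke test.
    """
    # Walk both buffers in lockstep and stop at the first mismatch.
    for offset, (a, b) in enumerate(zip(baseline, compare)):
        if a != b:
            return offset
    # One buffer is a strict prefix of the other: they differ where the
    # shorter one ends.
    if len(baseline) != len(compare):
        return min(len(baseline), len(compare))
    return None


if __name__ == "__main__":
    offset = first_difference(b"\x23\x68\x1a\x14", b"\x4f\xf6\xf4\x73")
    if offset is not None:
        print(f"Different at byte {offset}")
```

In practice the two inputs would be the object files produced by two compilations of the same source, read with `open(path, "rb").read()`.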

jkotas commented 6 months ago

cc @filipnavara

dotnet-policy-service[bot] commented 6 months ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas. See info in area-owners.md if you want to be subscribed.

filipnavara commented 6 months ago

I'll check the Helix artifacts. It doesn't repro locally on my RPi 5.

filipnavara commented 6 months ago

The differences are in the (order of) JITed code:

--- baseline.txt        2024-04-02 16:10:44.913544600 +0200
+++ compare.txt 2024-04-02 16:10:54.689947200 +0200
@@ -1,5 +1,5 @@

-baseline.object:       file format elf32-littlearm
+compare.object:        file format elf32-littlearm

 Disassembly of section __managedcode:

@@ -23,20 +23,20 @@
   10f522: 43 ec 15 2b          vmov    d5, r2, r3
   10f526: 24 ee 05 4b          vmul.f64        d4, d4, d5
   10f52a: 30 ee 04 0b          vadd.f64        d0, d0, d4
-  10f52e: 23 68                ldr     r3, [r4]
-  10f530: 1a 14                asrs    r2, r3, #16
-  10f532: d2 b2                uxtb    r2, r2
-  10f534: 51 21                movs    r1, #81
-  10f536: 8a 42                cmp     r2, r1
-  10f538: 10 d2                bhs     0x10f55c <_fram0System.Decimal_DecCalc__VarR8FromDec+0x6c> @ imm = #32
-  10f53a: d2 00                lsls    r2, r2, #3
-  10f53c: 4f f6 f4 71          movw    r1, #65524
-  10f540: cf f6 f8 71          movt    r1, #65528
-  10f544: 79 44                add     r1, pc
-  10f546: 8a 18                adds    r2, r1, r2
-  10f548: 92 ed 00 4b          vldr    d4, [r2]
+  10f52e: 4f f6 f4 73          movw    r3, #65524
+  10f532: cf f6 f8 73          movt    r3, #65528
+  10f536: 7b 44                add     r3, pc
+  10f538: 22 68                ldr     r2, [r4]
+  10f53a: 11 14                asrs    r1, r2, #16
+  10f53c: c9 b2                uxtb    r1, r1
+  10f53e: 51 20                movs    r0, #81
+  10f540: 81 42                cmp     r1, r0
+  10f542: 0b d2                bhs     0x10f55c <_fram0System.Decimal_DecCalc__VarR8FromDec+0x6c> @ imm = #22
+  10f544: c9 00                lsls    r1, r1, #3
+  10f546: 5b 18                adds    r3, r3, r1
+  10f548: 93 ed 00 4b          vldr    d4, [r3]
   10f54c: 80 ee 04 0b          vdiv.f64        d0, d0, d4
-  10f550: 00 2b                cmp     r3, #0
+  10f550: 00 2a                cmp     r2, #0
   10f552: 01 da                bge     0x10f558 <_fram0System.Decimal_DecCalc__VarR8FromDec+0x68> @ imm = #2
   10f554: b1 ee 40 0b          vneg.f64        d0, d0
   10f558: bd e8 18 88          pop.w   {r3, r4, r11, pc}

Attachments: baseline.txt, compare.txt
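
A note on reading the diff: the two `bhs` lines carry different immediates (#32 and #22) yet resolve to the same target, 0x10f55c, because the branch itself moved from 0x10f538 to 0x10f542. In Thumb state the PC reads as the instruction's address plus 4, so the target is address + 4 + immediate. A small Python check of that arithmetic follows; the helper name `thumb_branch_target` is hypothetical.

```python
def thumb_branch_target(instruction_address: int, immediate: int) -> int:
    """Target of a PC-relative Thumb branch, as printed in the listing."""
    # In Thumb state the PC reads as the current instruction's address + 4.
    return instruction_address + 4 + immediate


# Addresses and immediates copied from the two sides of the diff.
baseline_target = thumb_branch_target(0x10F538, 32)
compare_target = thumb_branch_target(0x10F542, 22)

if __name__ == "__main__":
    print(hex(baseline_target), hex(compare_target))
```

Both calls evaluate to 0x10f55c, so the branch behaves the same in both builds. What differs is where the `movw`/`movt`/`add` sequence is placed and which registers are used.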

MichalStrehovsky commented 5 months ago

I'm not able to get differing JitDumps either. The sequence that got moved has to do with CreateSpan, so I wonder if this is a case of RyuJIT taking a dependency on numerical handle values to do something.

Here is a JitDump for one variation seen in the object file:

1.txt

I don't have a JitDump that would produce what we saw in the CI. Hopefully one JitDump is enough for someone skilled in RyuJIT to psychically debug this.
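
The hypothesis above, that the compiler might order something by numerical handle values, can be illustrated in isolation. The Python sketch below is not RyuJIT code, and the symbol names and handle values are made up. It only shows why ordering by a value that can differ between runs (such as an address) gives run-dependent output, while ordering by a stable key does not.

```python
from typing import Dict, List


def order_by_handle(handles: Dict[str, int]) -> List[str]:
    """Order symbols by their numeric handle value (run-dependent)."""
    return [name for name, _ in sorted(handles.items(), key=lambda kv: kv[1])]


def order_by_name(handles: Dict[str, int]) -> List[str]:
    """Order symbols by a stable key, here the symbol name."""
    return sorted(handles)


# Hypothetical handle values from two compilations of the same input.
# The same symbols are present, but the numeric handles differ.
run_a = {"span_data": 0x7F102000, "lookup_table": 0x7F101000}
run_b = {"span_data": 0x7F101000, "lookup_table": 0x7F102000}

if __name__ == "__main__":
    print(order_by_handle(run_a), order_by_handle(run_b))
    print(order_by_name(run_a), order_by_name(run_b))
```

With these made-up values, the handle-based ordering differs between the two runs while the name-based ordering is identical. That matches the kind of reordering seen in the disassembly, though it does not establish the actual cause.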

TIHan commented 2 months ago

@jkotas for .NET 9, are we supporting NativeAOT for arm32?

jkotas commented 2 months ago

Yes, that's the plan.

BruceForstall commented 3 weeks ago

@jkotas @MichalStrehovsky Is this test failure a repeatable failure in the CI? Does it still occur? The linked build is gone (of course) and I can't tell what pipeline it was from.

jkotas commented 3 weeks ago

I have not seen that in CI recently.

BruceForstall commented 3 weeks ago

I'm going to close this as "no repro" for now. Of course, re-open if there is a current repro.