BruceForstall closed this issue 1 year ago
I built a coredistools.dll updated to the latest LLVM, 16.0.1, and this failure persists.
This looks to be an actual bug in the emitted codegen: note the emitted `BD` vs. the correct `FD` in the 3rd byte.

Emitted: `62D3BD483BF001`
Correct: `62D3FD483BF001`
The technical encoding is `EVEX.512.66.0F3A.W1 3B /r ib`, where the EVEX prefix is `0x62 P0 P1 P2`:
* P0: `R X B R' 0 0 m m`
* P1: `W v v v v 1 p p`
* P2: `z L' L b V' a a a`
We've encoded (where P1E is emitted, P1C is correct):
* P0: `1 1 0 1 0 0 1 1`
* P1E: `1 0 1 1 1 1 0 1`
* P1C: `1 1 1 1 1 1 0 1`
* P2: `0 1 0 0 1 0 0 0`

So we're simply misencoding the `VEX.vvvv` register here. It's meant to be stored in bit-inverted format and, combined with the `V'` bit, we've emitted `1_0111`, which is 8.
The error comes in that `vextract` doesn't use the `vvvv` field at all, so it must be `1_1111`. Instead, since the instruction is `ins reg/mem, reg, imm`, the ModRM field is used to store the destination.
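The bit analysis above can be checked mechanically. Here's a small Python sketch (illustrative only, not JIT code) that decodes the P1 and P2 payload bytes from the two byte sequences quoted earlier:

```python
# Decode the EVEX payload bytes P1 and P2 from the emitted (0xBD) and
# correct (0xFD) encodings, to show that only the vvvv field differs.

def decode_p1_p2(p1: int, p2: int):
    w      = (p1 >> 7) & 1        # P1 bit 7
    vvvv   = (p1 >> 3) & 0b1111   # P1 bits 6..3, stored bit-inverted
    pp     = p1 & 0b11            # P1 bits 1..0
    vprime = (p2 >> 3) & 1        # P2 "V'" bit, also stored bit-inverted
    # Combined 5-bit register selector, after un-inverting: ~V' . ~vvvv
    reg5 = ((vprime ^ 1) << 4) | (vvvv ^ 0b1111)
    return w, vvvv, pp, reg5

emitted = decode_p1_p2(0xBD, 0x48)   # from 62 D3 BD 48 3B F0 01
correct = decode_p1_p2(0xFD, 0x48)   # from 62 D3 FD 48 3B F0 01

print(emitted)  # vvvv=0b0111 -> selects register 8 (the bug)
print(correct)  # vvvv=0b1111 -> no vvvv register (all ones, as required)
```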
It's not immediately obvious where we're going wrong here, and we may actually have a related issue with the AVX `vextractf128` and `vextracti128` instructions.
We do document the format incorrectly as `IF_RRW_RRW_CNS` when it should be `IF_RWR_RRD_CNS`. This is a long-standing issue and something we have wrong for a couple of instructions, but one that doesn't cause any problems in practice.
The issue ends up being that the instruction is incorrectly tracking the `IsDstDstSrcAVXInstruction` flag, which means we double-encode what is supposed to be just `regFor012Bits` also in `regFor3456Bits`. This is unique to the AVX-512 extract instructions and isn't an issue for the AVX ones after all.
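As a hypothetical illustration (the names and structure here are mine, not the actual JIT emitter's), the double-encoding can be modeled like this: the operand register (id 8 in the encoding above) correctly lands in the ModRM/extension bits, but the flag also routes it into `EVEX.vvvv`, which this instruction doesn't use:

```python
# Hypothetical sketch of how treating vextracti64x4 as a "DstDstSrc"
# instruction corrupts P1. When the flag is (incorrectly) set, the
# register id is inverted into vvvv; when it's clear, vvvv stays all
# ones, meaning "no vvvv operand".

def encode_p1(w: int, reg: int, pp: int, is_dst_dst_src: bool) -> int:
    # vvvv is stored bit-inverted; an unused vvvv must be all ones
    vvvv = (reg ^ 0b1111) if is_dst_dst_src else 0b1111
    return (w << 7) | (vvvv << 3) | (1 << 2) | pp

# With the flag incorrectly set, register 8 leaks into vvvv:
print(hex(encode_p1(1, 8, 0b01, True)))   # 0xbd -- the bad byte
# With the flag removed (the proposed fix):
print(hex(encode_p1(1, 8, 0b01, False)))  # 0xfd -- the correct byte
```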
The fix should just be removing the flag. We should also probably audit the `IsDstDstSrc` and `IsDstSrcSrc` flags for all SIMD instructions. This is mostly needed for the new AVX-512 ones, but we've recently audited the other flags in general, so it wouldn't hurt to finish doing the same here.
It was confusing because both VS and windbg disassemble this instruction as we expect to see it. Presumably because they ignore the field that is incorrectly encoded?
However, when I run this program (JIT\HardwareIntrinsics\General\Vector512_1\Vector512_1_ro\Vector512_1_ro.dll) with `DOTNET_TieredCompilation=0`, I get an illegal instruction exception (on both AMD 7950X and Intel Xeon Platinum 8370C).
So is this test case actually running in the CI? It must be, if the test ended up in the SPMI `coreclr_tests.run` collection.
Is it running but not correctly reporting the error/failure?
> So is this test case actually running in the CI?
The encoding is only incorrect when we have `vextracti64x4 reg, reg`. If we instead contain it as part of a store and emit `vextracti64x4 [addr], reg`, we don't have any issues.
Up until yesterday we were also only running Vector512 tests in Pri1, so it's possible that we weren't seeing this in CI because of that, and that it will now be showing up in the outerloop jobs.
The latest outerloop job (from Apr 19, 2023 at 2:00 AM PDT) has a few failures in AVX tests, but not in this one:
However, when I look at the console log for the HardwareIntrinsics_General_ro test, I see the failure:
```
11:03:55.549 Running test: _Vector512_1_ro::JIT.HardwareIntrinsics.General._Vector512_1.Program.GetAndWithLowerAndUpperByte()
Beginning scenario: RunBasicScenario
Fatal error. System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
   at JIT.HardwareIntrinsics.General._Vector512_1.VectorGetAndWithLowerAndUpper__GetAndWithLowerAndUpperByte.RunBasicScenario()
   at JIT.HardwareIntrinsics.General._Vector512_1.Program.GetAndWithLowerAndUpperByte()
   at Program.<<Main>$>g__TestExecutor26|0_25(System.IO.StreamWriter, System.IO.StreamWriter, <>c__DisplayClass0_0 ByRef)
   at Program.<Main>$(System.String[])
App Exit Code: -1073741795
Expected: 100
Actual: -1073741795
END EXECUTION - FAILED
FAILED
[XUnitLogChecker]: Item 'HardwareIntrinsics_General_ro' did not finish running. Checking and fixing the log...
[XUnitLogChecker]: XUnit log file has been fixed!
1259/2363 tests run.
* 1259 tests passed.
* 0 tests failed.
* 0 tests skipped.
[XUnitLogChecker]: Checking for dumps...
[XUnitLogChecker]: No crash dumps found. Continuing...
[XUnitLogChecker]: Finished!
ERROR: The process "corerun.exe" not found.
2023-04-19T11:03:58.485Z INFO run.py run(48) main Beginning reading of test results.
2023-04-19T11:03:58.485Z INFO run.py __init__(42) read_results Searching 'C:\h\w\B51E0A11\w\AD88096E\e' for test results files
2023-04-19T11:03:58.485Z INFO run.py __init__(48) read_results Found results file C:\h\w\B51E0A11\w\AD88096E\e\JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\HardwareIntrinsics_General_ro.testResults.xml with format xunit
2023-04-19T11:03:58.501Z INFO run.py __init__(42) read_results Searching 'C:\h\w\B51E0A11\w\AD88096E\uploads' for test results files
2023-04-19T11:03:58.501Z INFO run.py packing_test_reporter(30) report_results Packing 1259 test reports to 'C:\h\w\B51E0A11\w\AD88096E\e\__test_report.json'
2023-04-19T11:03:58.501Z INFO run.py packing_test_reporter(33) report_results Packed 281920 bytes
['HardwareIntrinsics_General_ro' END OF WORK ITEM LOG: Command exited with 0]
```
BUT, notice that even though the test failed, the harness returns 0 -- no failure.
Possibly this is a XUnitLogChecker problem?
=======
btw, the tests are invoked with:
```
call JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\HardwareIntrinsics_General_ro.cmd -usewatcher
```
I don't have a `HardwareIntrinsics_General_ro.cmd` in my locally built artifacts, even though I built pri-1 tests. Why?
I DO have `artifacts\tests\coreclr\windows.x64.Checked\JIT\HardwareIntrinsics\General\Vector512_1\Vector512_1_ro\Vector512_1_ro.cmd`, for example.
> Possibly this is a XUnitLogChecker problem?
CC @trylek -- are there any known issues here? Could we be missing reported test failures in some cases?
Perhaps we are looking for `# failed` and not accounting for tests which didn't run, crashes, bad exit codes, or some timeouts?
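To illustrate the kind of check being suggested, here's a hypothetical sketch (function and parameter names are mine, not the actual harness or XUnitLogChecker API): a work item should count as failed not only when the XUnit log reports failures, but also when the runner exit code is bad or fewer tests ran than expected.

```python
# Hypothetical failure check for a merged test work item. A run like the
# one in the log above (1259 of 2363 tests ran, 0 reported failures,
# runner exited with 0) should still be treated as a failure.

def work_item_failed(exit_code: int, tests_expected: int,
                     tests_run: int, tests_failed: int) -> bool:
    if tests_failed > 0:
        return True
    if exit_code != 0:              # crash, illegal instruction, etc.
        return True
    if tests_run < tests_expected:  # the group died partway through
        return True
    return False

print(work_item_failed(0, 2363, 1259, 0))  # True: incomplete run
print(work_item_failed(0, 2363, 2363, 0))  # False: clean full run
```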
> I don't have a HardwareIntrinsics_General_ro.cmd in my locally built artifacts, even though I built pri-1 tests. Why?
Did you do anything related to `BuildAsStandalone`? I think that may stop the merged test wrappers from being built.
Also, @ivdiazsa -- it looks like XUnitLogChecker is ignoring the test run exit code.
> Did you do anything related to BuildAsStandalone? I think that may stop the merged test wrappers from being built.
Good point. I always build with `BuildAsStandalone=true`.
I opened https://github.com/dotnet/runtime/issues/85056 to track the CI infrastructure problem (of not reporting this test failure).
This is the case where the failure takes down the entire process. Presumably the test that you're looking for didn't run. We don't have the wrapper logic to attempt to rerun the rest of the test group after the failure. Mitigations are (1) disable (or mark as `RequiresProcessIsolation`) catastrophically failing tests asap to unblock other testing, and (2) keep merged test group sizes balanced (on the order of hundreds of tests) to keep the wins from merging but reduce the collateral damage from these failures.
I see the logic in src/tests/Directory.Build.targets that disables merged test groups under `BuildAsStandalone`. It's possible that this isn't necessary. Similar to `RequiresProcessIsolation`, this generates a `.cmd`/`.sh` local entry point, but a merged group can still execute it out-of-process. I guess often it wouldn't be useful, since you can run the test standalone.
> This is the case where the failure takes down the entire process
Shouldn't the harness try to detect this and mark itself as failing? It looks like we do capture that the test process had a bad exit code and that we didn't run everything...
There's an issue where one level needs to report as "success" to avoid Helix retrying the job. This isn't an area that I understand -- those you've already tagged (plus maybe @jkoritzinsky?) should know more.
> Presumably the test that you're looking for didn't run
@markples There are multiple things going on here, but the core problem is that the important test here, the one that took down the merged group, did not get reported as failing (and in fact the entire group did not get reported as failing).
Bruce is right. It's a bug with the log checker I'm already looking into. I have no idea why it's overriding the test's exit code. It might have to do with the contents of the HelixCommand script, as the log checker is the last thing to get called.
SuperPMI is reporting an asm diff in:
but no textual diff, for both x64 and x86. For x64, it is in `236d7997-3d71-45f9-b7d7-5241ad89a56f.windows.x64\coreclr_tests.run.windows.x64.checked.mch`.
The superpmi log shows:
The issue is that coredistools is failing to disassemble:
coredistools was last updated, based on LLVM 13.0.1, in March 2022 with https://github.com/dotnet/jitutils/pull/351 (and related). The most recent LLVM version is 16.0.0, released March 2023 (https://releases.llvm.org/). Does it have a fix for this?
I verified that this disassembles correctly in Visual Studio.
Are there other cases where `vextracti64x4` is properly decoded? That is, is it failing in only some cases, not all?
If we can't update coredistools with a fix, we'll have to do something so SuperPMI doesn't display spurious diffs. E.g., filter out the failing MCs.
@dotnet/avx512-contrib @dotnet/jit-contrib