dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Crossgen outerloop regression: "Failed to convert invalid RVA to offset: <N>" #108255

Open kg opened 1 week ago

kg commented 1 week ago

On Linux and macOS hosts, R2RDump currently fails fairly consistently for a subset of the crossgen tests. In the failing cases, the RVA of the auxiliary data points to the exact end of the .text section's virtual address space. There should be 4 bytes there, representing the length of the auxiliary data, or a NULL to indicate there is no auxiliary data. This does not appear to fail on Windows, which seems to align with the comment in R2RPEBuilder.cs suggesting that patching the section headers has no effect on Windows:

        /// <summary>
        /// On Linux, we must patch the section headers. This is because the CoreCLR runtime on Linux
        /// requires the 12-16 low-order bits of section RVAs (the number of bits corresponds to the page
        /// size) to be identical to the file offset, otherwise memory mapping of the file fails.
        /// Sadly PEBuilder in System.Reflection.Metadata doesn't support this so we must post-process
        /// the EXE by patching section headers with the correct RVA's. To reduce code variations
        /// we're performing the same transformation on Windows where it is a no-op.
        /// </summary>
        /// <param name="outputStream"></param>
        private void UpdateSectionRVAs(Stream outputStream)

Example with additional diagnostic data from PR https://github.com/dotnet/runtime/pull/106099/:

Running CrossGen2:  /tmp/helix/working/A5660909/p/crossgen2/crossgen2 @/private/tmp/helix/working/A5660909/w/B1E409AA/e/JIT/Methodical/xxobj/sizeof/sizeof32_Target_64Bit_and_arm_d/composite-r2r.dll.rsp   --composite
Emitting R2R PE file: /private/tmp/helix/working/A5660909/w/B1E409AA/e/JIT/Methodical/xxobj/sizeof/sizeof32_Target_64Bit_and_arm_d/composite-r2r.dll
Emitting R2R PE file: /private/tmp/helix/working/A5660909/w/B1E409AA/e/JIT/Methodical/xxobj/sizeof/sizeof32_Target_64Bit_and_arm_d/sizeof32_Target_64Bit_and_arm_d.dll
Running R2RDump:  dotnet /tmp/helix/working/A5660909/p/R2RDump/R2RDump.dll --header --sc --in /private/tmp/helix/working/A5660909/w/B1E409AA/e/JIT/Methodical/xxobj/sizeof/sizeof32_Target_64Bit_and_arm_d/composite-r2r.dll --out /private/tmp/helix/working/A5660909/w/B1E409AA/e/JIT/Methodical/xxobj/sizeof/sizeof32_Target_64Bit_and_arm_d/composite-r2r.dll.r2rdump --val
Error: System.BadImageFormatException: Failed to convert invalid RVA to offset: 67192.
Sections:
.text PTRD=1024 SORD=1024 VA=66560 VS=632
.data PTRD=2048 SORD=1024 VA=198656 VS=1014
.edata PTRD=3072 SORD=512 VA=330752 VS=84
.reloc PTRD=3584 SORD=512 VA=462336 VS=12

at ILCompiler.Reflection.ReadyToRun.PEReaderExtensions.GetOffset(PEReader reader, Int32 rva) in /_/src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/PEReaderExtensions.cs:line 116
at ILCompiler.Reflection.ReadyToRun.ReadyToRunReader.EnsureImportSections() in /_/src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/ReadyToRunReader.cs:line 1456
at R2RDump.TextDumper.DumpSectionContents(ReadyToRunSection section) in /_/src/coreclr/tools/r2rdump/TextDumper.cs:line 435
at R2RDump.TextDumper.DumpSection(ReadyToRunSection section) in /_/src/coreclr/tools/r2rdump/TextDumper.cs:line 120
at R2RDump.TextDumper.DumpHeader(Boolean dumpSections) in /_/src/coreclr/tools/r2rdump/TextDumper.cs:line 81
at R2RDump.Program.Dump(ReadyToRunReader r2r) in /_/src/coreclr/tools/r2rdump/Program.cs:line 166
at R2RDump.Program.Run() in /_/src/coreclr/tools/r2rdump/Program.cs:line 460
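
The failing RVA can be checked by hand against the section list in the log above. The following is a minimal Python sketch, not the actual PEReaderExtensions.GetOffset implementation. It assumes that PTRD, SORD, VA and VS stand for PointerToRawData, SizeOfRawData, VirtualAddress and VirtualSize, and that the lookup treats each section as the half-open range [VA, VA + VS). The names Section and rva_to_offset are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    raw_offset: int       # PTRD in the log (assumed: PointerToRawData)
    raw_size: int         # SORD in the log (assumed: SizeOfRawData)
    virtual_address: int  # VA in the log
    virtual_size: int     # VS in the log


# Values copied from the "Sections:" dump in the failing run.
SECTIONS = [
    Section(".text", 1024, 1024, 66560, 632),
    Section(".data", 2048, 1024, 198656, 1014),
    Section(".edata", 3072, 512, 330752, 84),
    Section(".reloc", 3584, 512, 462336, 12),
]


def rva_to_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Map an RVA to a file offset, or return None if no section contains it.

    Assumption: a section covers [VA, VA + VS), so the end address is excluded.
    """
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return s.raw_offset + (rva - s.virtual_address)
    return None


FAILING_RVA = 67192  # from the BadImageFormatException message
text = SECTIONS[0]

# The failing RVA is exactly one past the last byte of .text.
print(FAILING_RVA - (text.virtual_address + text.virtual_size))
# The last byte inside .text maps fine; the failing RVA does not.
print(rva_to_offset(FAILING_RVA - 1, SECTIONS))
print(rva_to_offset(FAILING_RVA, SECTIONS))
```

Under these assumptions, 66560 + 632 = 67192, so the RVA falls on the first address past .text and in no other section, which matches the description of the auxiliary data RVA pointing at the exact end of .text.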

The artifacts for these failed runs don't appear to contain the actual binaries, which might be related to this:

cp: /tmp/helix/working/A5660909/p/libsuperpmi-shim-simple.dylib is not a directory
DOTNET_DbgEnableMiniDump is set and the createdump binary does not exist: /private/tmp/helix/working/A5660909/p/crossgen2/createdump

kg commented 5 days ago

The problem appears to be that the composite-r2r.dll generated on Linux is corrupt or truncated: it is 4096 bytes, while the one generated on Windows is 144 KB. The map files show that the Windows image contains what you'd expect, including 2183 symbols and 499 relocations, while the Linux one has only 56 symbols and 0 relocations.

So I think R2RDump is correct here and the binary being dumped is just corrupt.

kg commented 2 days ago

My repro steps on Linux (this won't reproduce on Windows):

./build.sh Clr+Tools+Libs -c Release -rc Debug --os linux
./src/tests/build.sh debug -priority1 -crossgen2 -p:LibrariesConfiguration=Release -tree:JIT/Methodical/xxobj/sizeof/
PATH=/home/kate/Projects/dotnet-runtime-wasm/.dotnet/:$PATH RunCrossGen2=1 CompositeBuildMode=1 artifacts/tests/coreclr/linux.x64.Debug/JIT/Methodical/xxobj/sizeof/sizeof64_Target_64Bit_and_arm_r/sizeof64_Target_64Bit_and_arm_r.sh

kg commented 2 days ago

The truncation isn't caused by the non-Windows section alignment logic. Here is a comparison of the map files with that logic disabled and enabled:

disabled

INDEX | FILEOFFSET | RVA        | END_RVA    | LENGTH     | NAME
----------------------------------------------------------------
    0 | 0x00000400 | 0x00000400 | 0x00000696 | 0x00000296 | .text
    1 | 0x00000800 | 0x00000800 | 0x00000BD8 | 0x000003D8 | .data
    2 | 0x00000000 | 0x00000000 | 0x00000000 | 0x00000000 | .edata
    3 | 0x00000E00 | 0x00000E00 | 0x00000E01 | 0x00000001 | .reloc

enabled

INDEX | FILEOFFSET | RVA        | END_RVA    | LENGTH     | NAME
----------------------------------------------------------------
    0 | 0x00000400 | 0x00010400 | 0x00010696 | 0x00000296 | .text
    1 | 0x00000800 | 0x00030800 | 0x00030BD8 | 0x000003D8 | .data
    2 | 0x00000000 | 0x00000000 | 0x00000000 | 0x00000000 | .edata
    3 | 0x00000E00 | 0x00070E00 | 0x00070E01 | 0x00000001 | .reloc

So disabling the section alignment doesn't fix the RVA being incorrect (it still points at the end of .text).
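
As a sanity check on the two tables, the rule quoted from R2RPEBuilder.cs (the low-order 12 to 16 bits of each section RVA must equal those of the file offset) can be tested directly. This is a rough Python sketch using the values from the map files above; low_bits_match is a hypothetical helper, and the 4 KiB and 64 KiB page sizes are assumed as the two ends of the 12 to 16 bit range. The empty .edata rows are left out.

```python
def low_bits_match(file_offset: int, rva: int, page_size: int) -> bool:
    """True if the RVA and file offset agree modulo page_size (a power of two)."""
    mask = page_size - 1
    return (file_offset & mask) == (rva & mask)


# (name, FILEOFFSET, RVA, LENGTH) rows copied from the map files.
DISABLED = [
    (".text", 0x400, 0x00000400, 0x296),
    (".data", 0x800, 0x00000800, 0x3D8),
    (".reloc", 0xE00, 0x00000E00, 0x001),
]
ENABLED = [
    (".text", 0x400, 0x00010400, 0x296),
    (".data", 0x800, 0x00030800, 0x3D8),
    (".reloc", 0xE00, 0x00070E00, 0x001),
]

# Assumed page sizes: 4 KiB (12 bits) and 64 KiB (16 bits).
PAGE_SIZES = (0x1000, 0x10000)

for label, rows in (("disabled", DISABLED), ("enabled", ENABLED)):
    ok = all(
        low_bits_match(offset, rva, page)
        for _, offset, rva, _ in rows
        for page in PAGE_SIZES
    )
    # End of .text, as an offset from the start of the section's RVA.
    _, _, text_rva, text_len = rows[0]
    print(label, "low bits match:", ok, "| .text END_RVA:", hex(text_rva + text_len))
```

Both layouts satisfy the rule, and in both the end of .text sits 0x296 bytes past its start, so the alignment step only shifts the sections to higher RVAs without changing their contents or lengths.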

kg commented 1 day ago

Anyone trying to bisect this in old previews (it appears to be broken as far back as .NET 8 Preview 2) will need to patch SimpleRwLock.hpp to remove the #ifndef DACCESS_COMPILE block, which hides some methods our code needs in order to build via the repro steps. You may also need to add set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=gnu99 -Wno-declaration-after-statement -Wno-pre-c11-compat") to src/native/libs/CMakeLists.txt if your clang is new enough, and to cherry-pick commit 487d7f0.


kg commented 1 day ago

Passing --dgmllog to crossgen when it creates the R2R image(s) causes the output to pass R2RDump successfully, but the actual test sometimes fails afterwards:

/home/kate/Projects/dotnet-runtime-wasm/artifacts/tests/coreclr/linux.x64.Debug/Tests/Core_Root/corerun -p System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true sizeof64_Target_64Bit_and_arm_r.dll ''
Unhandled exception. System.BadImageFormatException: An attempt was made to load a program with an incorrect format.
 (0x8007000B (COR_E_BADIMAGEFORMAT))
artifacts/tests/coreclr/linux.x64.Debug/JIT/Methodical/xxobj/sizeof/sizeof64_Target_64Bit_and_arm_r/sizeof64_Target_64Bit_and_arm_r.sh: line 442: 1414533 Aborted                 (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"