dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

.NET 7 osx-arm64 single-file crashing with sigsegv #67062

Closed am11 closed 2 years ago

am11 commented 2 years ago

Description

Latest build of .NET 7 published single-file app is crashing on execution.

Reproduction Steps

# installation
mkdir ~/.dotnet7
curl -sSL https://aka.ms/dotnet/7.0.1xx/daily/dotnet-sdk-osx-arm64.tar.gz | tar xzf - -C ~/.dotnet7

# publish a new app as self-contained and single app
~/.dotnet7/dotnet new console -n testapp1
cd testapp1
cat > NuGet.config << EOF
<configuration>
  <packageSources>
    <add key="dotnet7" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet7/nuget/v3/index.json" />
  </packageSources>
</configuration>
EOF
~/.dotnet7/dotnet publish --use-current-runtime -p:PublishSingleFile=true --self-contained -c Release

# run the published app
bin/Release/net7.0/osx-arm64/publish/testapp1

Expected behavior

Displays Hello, World!

Actual behavior

zsh: segmentation fault  bin/Release/net7.0/osx-arm64/publish/testapp1

Regression?

Yes, it works with .NET 6.

Known Workarounds

Publish as self-contained, without -p:PublishSingleFile=true.

Configuration

Daily build

% strings ~/.dotnet7/dotnet | grep '@(#)'
@(#)Version 7.0.22.17106 @Commit: ce813882f4061459dc62b63acb75add040f1f603

Other information

I tried debugging it with native symbols (of release singlefilehost), the clrstack looks like this:

% lldb bin/Release/net7.0/osx-arm64/publish/testapp1
Added Microsoft public symbol server

(lldb) target create "bin/Release/net7.0/osx-arm64/publish/testapp1"
Current executable set to '/Users/am11/projects/testapp1/bin/Release/net7.0/osx-arm64/publish/testapp1' (arm64).

(lldb) r
Process 22685 launched: '/Users/am11/projects/testapp1/bin/Release/net7.0/osx-arm64/publish/testapp1' (arm64)
Process 22685 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001000b4c78 testapp1`DictionaryLayout::FindToken(MethodTable*, LoaderAllocator*, int, SigBuilder*, unsigned char*, DictionaryEntrySignatureSource, CORINFO_RUNTIME_LOOKUP*, unsigned short*) + 84
testapp1`DictionaryLayout::FindToken:
->  0x1000b4c78 <+84>: ldr    w8, [x22]
    0x1000b4c7c <+88>: tst    w8, #0x30
    0x1000b4c80 <+92>: cset   w9, eq
    0x1000b4c84 <+96>: orr    w8, w9, w8, lsr #31
Target 0: (testapp1) stopped.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001000b4c78 testapp1`DictionaryLayout::FindToken(MethodTable*, LoaderAllocator*, int, SigBuilder*, unsigned char*, DictionaryEntrySignatureSource, CORINFO_RUNTIME_LOOKUP*, unsigned short*) + 84
    frame #1: 0x000000010010bf00 testapp1`ProcessDynamicDictionaryLookup(TransitionBlock*, Module*, Module*, unsigned char, unsigned char const*, unsigned char const*, CORINFO_RUNTIME_LOOKUP*, unsigned int*) + 932
    frame #2: 0x000000010010c290 testapp1`DynamicHelperFixup(TransitionBlock*, unsigned long*, unsigned int, Module*, CORCOMPILE_FIXUP_BLOB_KIND*, TypeHandle*, MethodDesc**, FieldDesc**) + 408
    frame #3: 0x000000010010d2d0 testapp1`DynamicHelperWorker + 232
    frame #4: 0x00000001002ed34c testapp1`DelayLoad_Helper_FakeProlog + 92
    frame #5: 0x0000000176a93760
    frame #6: 0x0000000176aa86b0
    frame #7: 0x00000001766badc4
    frame #8: 0x00000001002ed830 testapp1`CallDescrWorkerInternal + 132
    frame #9: 0x0000000100162eb4 testapp1`MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 852
    frame #10: 0x000000010008df44 testapp1`CorHost2::CreateAppDomainWithManager(char16_t const*, unsigned int, char16_t const*, char16_t const*, int, char16_t const**, char16_t const**, unsigned int*) + 620
    frame #11: 0x0000000100572334 testapp1`coreclr_initialize + 784
    frame #12: 0x000000010001fb70 testapp1`coreclr_t::create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, char const*, coreclr_property_bag_t const&, std::__1::unique_ptr<coreclr_t, std::__1::default_delete<coreclr_t> >&) + 420
    frame #13: 0x000000010002c998 testapp1`(anonymous namespace)::create_coreclr() + 432
    frame #14: 0x000000010002c46c testapp1`corehost_main + 160
    frame #15: 0x000000010000d5c8 testapp1`fx_muxer_t::handle_exec_host_command(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, host_startup_info_t const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unordered_map<known_options, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, known_options_hash, std::__1::equal_to<known_options>, std::__1::allocator<std::__1::pair<known_options const, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) + 1328
    frame #16: 0x000000010000c6a4 testapp1`fx_muxer_t::execute(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, char const**, host_startup_info_t const&, char*, int, int*) + 860
    frame #17: 0x00000001000091c0 testapp1`hostfxr_main_bundle_startupinfo + 196
    frame #18: 0x000000010004c818 testapp1`exe_start(int, char const**) + 1124
    frame #19: 0x000000010004caf4 testapp1`main + 152
    frame #20: 0x00000001043610f4 dyld`start + 520

(lldb) clrstack -f
OS Thread Id: 0x30e100 (1)
        Child SP               IP Call Site
000000016FDFE1A0 00000001000B4C78 testapp1!DictionaryLayout::FindToken(MethodTable*, LoaderAllocator*, int, SigBuilder*, unsigned char*, DictionaryEntrySignatureSource, CORINFO_RUNTIME_LOOKUP*, unsigned short*) + 84
000000016FDFE230 000000010010BF00 testapp1!ProcessDynamicDictionaryLookup(TransitionBlock*, Module*, Module*, unsigned char, unsigned char const*, unsigned char const*, CORINFO_RUNTIME_LOOKUP*, unsigned int*) + 932
000000016FDFE290 000000010010C290 testapp1!DynamicHelperFixup(TransitionBlock*, unsigned long*, unsigned int, Module*, CORCOMPILE_FIXUP_BLOB_KIND*, TypeHandle*, MethodDesc**, FieldDesc**) + 408
000000016FDFE610 000000010010D2D0 testapp1!DynamicHelperWorker + 232
000000016FDFE6A0                  [DynamicHelperFrame: 000000016fdfe6a0] 
000000016FDFE730 00000001002ED34C testapp1!DelayLoad_Helper_FakeProlog + 92
000000016FDFE860 0000000176AC3760 System.Private.CoreLib.dll!System.Collections.Generic.HashSet`1[[System.__Canon, System.Private.CoreLib]].CheckUniqueAndUnfoundElements(System.Collections.Generic.IEnumerable`1<System.__Canon>, Boolean) + 112 [/_/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs @ 1436]
000000016FDFE910 0000000176AD86B0 System.Private.CoreLib.dll!System.Collections.Generic.Dictionary`2[[System.__Canon, System.Private.CoreLib],[System.IntPtr, System.Private.CoreLib]].TryGetValue(System.__Canon, IntPtr ByRef) + 32 [/_/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs @ 1108]
000000016FDFE930 00000001766EADC4 System.Private.CoreLib.dll!System.AppContext.Setup(Char**, Char**, Int32) + 84 [/_/src/libraries/System.Private.CoreLib/src/System/AppContext.cs @ 136]
FFFFFFFFFFFFFFFF 0000000176AD86B0 
FFFFFFFFFFFFFFFF 00000001766EADC4 
FFFFFFFFFFFFFFFF 00000001002ED830 testapp1!CallDescrWorkerInternal + 132
000000016FDFE9B0 0000000100162EB4 testapp1!MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 852
000000016FDFEC20 000000010008DF44 testapp1!CorHost2::CreateAppDomainWithManager(char16_t const*, unsigned int, char16_t const*, char16_t const*, int, char16_t const**, char16_t const**, unsigned int*) + 620
000000016FDFEE20 0000000100572334 testapp1!coreclr_initialize + 784
000000016FDFEEE0 000000010001FB70 testapp1!coreclr_t::create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, char const*, coreclr_property_bag_t const&, std::__1::unique_ptr<coreclr_t, std::__1::default_delete<coreclr_t> >&) + 420
000000016FDFEFF0 000000010002C998 testapp1!(anonymous namespace)::create_coreclr() + 432
000000016FDFF060 000000010002C46C testapp1!corehost_main + 160
000000016FDFF1B0 000000010000D5C8 testapp1!fx_muxer_t::handle_exec_host_command(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, host_startup_info_t const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unordered_map<known_options, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, known_options_hash, std::__1::equal_to<known_options>, std::__1::allocator<std::__1::pair<known_options const, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) + 1328
000000016FDFF310 000000010000C6A4 testapp1!fx_muxer_t::execute(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, char const**, host_startup_info_t const&, char*, int, int*) + 860
000000016FDFF420 00000001000091C0 testapp1!hostfxr_main_bundle_startupinfo + 196
000000016FDFF4D0 000000010004C818 testapp1!exe_start(int, char const**) + 1124
000000016FDFF600 000000010004CAF4 testapp1!main + 152
000000016FDFF660 00000001043610F4 dyld!start + 520
ghost commented 2 years ago

Tagging subscribers to this area: @agocke, @vitek-karas, @vsadov See info in area-owners.md if you want to be subscribed.

Issue Details
Author: am11
Assignees: -
Labels: `arch-arm64`, `os-mac-os-x`, `area-Single-File`
Milestone: -
am11 commented 2 years ago

It broke in main branch on Nov 3, 2021.

@jkoritzinsky, I have bisected the commits and found that the first commit (since .NET 6 release) which fails single-file app on osx-arm64 is 24e7a4a1a101d91b6666dc6f44137574246fdd9c (it was working until the previous commit c87e932d2b38b2929a8b1deb798682a3b122aa85). With debug build, it fails an assertion:

Assert failure(PID 70129 [0x000111f1], Thread: 5669656 [0x568318]): Consistency check failed: System.Environment::GetProcessorCount is not registered using DllImportentry macro in qcallentrypoints.cpp
FAILED: pvTarget != nullptr
    File: /Users/am11/projects/runtime-pr/src/coreclr/vm/dllimport.cpp Line: 5449
    Image: /Users/am11/projects/testapp1/bin/Debug/net7.0/osx-arm64/publish/testapp1

zsh: abort      bin/Debug/net7.0/osx-arm64/publish/testapp1

I have debugged a bit and noticed that after this line (which does not fail): https://github.com/dotnet/runtime/blob/24e7a4a1a101d91b6666dc6f44137574246fdd9c/src/coreclr/vm/dllimport.cpp#L2750

p *ppEntryPointName in lldb prints GetProcessorCount instead of Environment_GetProcessorCount. Any thoughts (or theories) what might be the cause of invalid mapping? 🤔

am11 commented 2 years ago

I have run another git-bisect session, this time marking the ProcessorCount error with git bisect good (basically ignoring it). Here is a more precise summary:

  1. from release/6.0 branch-off commit until https://github.com/dotnet/runtime/commit/24e7a4a1a101d91b6666dc6f44137574246fdd9c ~1, everything was fine. That commit started to fail QCall consistency check.


    • Assert failure(PID 41507 [0x0000a223], Thread: 6600605 [0x64b79d]): Consistency check failed: System.Environment::GetProcessorCount is not registered using DllImportentry macro in qcallentrypoints.cpp
          FAILED: pvTarget != nullptr
          File: /Users/am11/projects/runtime-pr/src/coreclr/vm/dllimport.cpp Line: 5436
          Image: /Users/am11/projects/testapp1/bin/Debug/net7.0/osx-arm64/publish/testapp1
    • cc @jkoritzinsky
  2. from 24e7a4a1a101d91b6666dc6f44137574246fdd9c until bcd35278ca879554ed98e522c007dc0025a19303 ~1, the same consistency check was failing. With the latter commit, a different assertion has started to fail earlier in the execution. This is the case in the tip of main branch.


    • Assert failure(PID 26205 [0x0000665d], Thread: 6557904 [0x6410d0]): Compiler optimization assumption invalid: EE expects method to exist: System.String:Ctor  Sig pointer: 0000000105317690
      FAILED: pMD != 0
        File: /Users/am11/projects/runtime-pr/src/coreclr/vm/binder.cpp Line: 125
        Image: /Users/am11/projects/testapp1/bin/Debug/net7.0/osx-arm64/publish/testapp1
    • cc @jkotas

If the two are unrelated in terms of root cause, then fixing 2 first will bring it back to the state of 1.
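The bisect described above, where the ProcessorCount assertion is deliberately marked good, could be automated with `git bisect run`. The following is only a sketch of that idea: the thread does not show how the bisect was actually driven, the function names are hypothetical, and the per-commit rebuild and republish step is left out. It assumes the assertion text appears in the captured output of the app.

```python
import subprocess

# Exit codes understood by `git bisect run`: 0 = good, 1..127 (except 125) = bad,
# 125 = this commit cannot be tested and should be skipped.
GOOD, BAD, SKIP = 0, 1, 125

# Assumption: this fragment of the QCall consistency-check message is stable.
IGNORED_ASSERT = "GetProcessorCount is not registered"

def classify(returncode: int, output: str, built: bool = True) -> int:
    """Map one run of the published app to a git-bisect verdict."""
    if not built:
        return SKIP
    if returncode == 0:
        return GOOD
    # Treat the already-known assertion as "good" so bisect looks past it.
    if IGNORED_ASSERT in output:
        return GOOD
    return BAD

def run_once(app_path: str) -> int:
    """Run the published single-file app (hypothetical path) and classify it."""
    proc = subprocess.run([app_path], capture_output=True, text=True)
    return classify(proc.returncode, proc.stdout + proc.stderr)
```

A wrapper script would rebuild, republish, call `run_once`, and exit with the returned value so that `git bisect run` can consume it.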

am11 commented 2 years ago

@jkotas, (I can create a separate issue for 2 if needed) it looks like the issue is with the meta signature of METHOD__STRING__CTORF_CHARARRAY that has first byte set to 0 but the one computed by MethodDesc::GetSigFromMetadata has value 32 (which is probably incorrect?). Consequently, this comparison is failing: https://github.com/dotnet/runtime/blob/9b3b937eb364dda4f91b6b5288c83f4e4f45e7e3/src/coreclr/vm/siginfo.cpp#L4281

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 10.1
    frame #0: 0x00000001002a8a10 testapp1`MetaSig::CompareMethodSigs(pSignature1="", cSig1=5, pModule1=0x00000001764c0000, pSubst1=0x0000000000000000, pSignature2=" \U00000001\U0000000e\U0000001d\U00000003\a \U00000003\U00000001\U0000001d\U00000003\b\b2\U00000001", cSig2=5, pModule2=0x00000001764c0000, pSubst2=0x0000000000000000, skipReturnTypeSig=NO, pVisited=0x0000000000000000) at siginfo.cpp:4281:17
   4278         (cSig1 == cSig2) &&
   4279         (pSubst1 == NULL) &&
   4280         (pSubst2 == NULL) &&
-> 4281         (memcmp(pSig1, pSig2, cSig1) == 0))
   4282     {
   4283         return TRUE;
   4284     }
Target 0: (testapp1) stopped.

(lldb) p (int)memcmp(pSig1, pSig2, cSig1)
(int) $300 = -32

(lldb) p cSig1
(DWORD) $301 = 5

(lldb) memory read -s1 -fu -c5 pSig1 --force
0x100e0429e: 0
0x100e0429f: 1
0x100e042a0: 14
0x100e042a1: 29
0x100e042a2: 3

(lldb) memory read -s1 -fu -c5 pSig2 --force
0x108684d18: 32
0x108684d19: 1
0x108684d1a: 14
0x108684d1b: 29
0x108684d1c: 3

If I jump the PC to line 4283 and continue, the same 32 vs. 0 issue shows up for other string methods. For the non-string methods (like METHOD__CASTHELPERS__ISINSTANCEOFANY, METHOD__CASTHELPERS__UNBOX etc.), the comparison succeeds because both pSig1 and pSig2 have 0 in the first byte.
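Read against the ECMA-335 method signature encoding, the two dumps differ only in the calling-convention byte: 0x20 is the HASTHIS flag, and the remaining bytes encode one parameter, a string return type, and a char[] parameter. The sketch below decodes just enough of that encoding to show this. The function names are hypothetical, only a handful of element types are covered, and it says nothing about which of the two signatures is the intended one.

```python
# Constants from ECMA-335 Partition II (signature encoding).
HASTHIS = 0x20   # calling-convention flag: method takes an instance pointer
SZARRAY = 0x1D   # single-dimensional, zero-based array of the following type
ELEMENT_TYPES = {0x01: "void", 0x03: "char", 0x08: "int32",
                 0x0E: "string", 0x1C: "object"}

def read_type(sig: bytes, i: int):
    """Decode one type starting at sig[i]; return (name, next index)."""
    if sig[i] == SZARRAY:
        inner, i = read_type(sig, i + 1)
        return inner + "[]", i
    return ELEMENT_TYPES[sig[i]], i + 1

def decode(sig: bytes):
    """Return (has_this, return_type, [param_types]) for a simple method sig.

    Assumes the parameter count fits in one byte (values below 0x80).
    """
    has_this = bool(sig[0] & HASTHIS)
    param_count = sig[1]
    ret, i = read_type(sig, 2)
    params = []
    for _ in range(param_count):
        p, i = read_type(sig, i)
        params.append(p)
    return has_this, ret, params

p_sig1 = bytes([0, 1, 14, 29, 3])    # bytes read from pSig1 above
p_sig2 = bytes([32, 1, 14, 29, 3])   # bytes read from pSig2 above
print(decode(p_sig1))
print(decode(p_sig2))
```

Under this reading both blobs describe the same return and parameter types, and only the HASTHIS bit differs.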

jkotas commented 2 years ago

Neither of the two failure modes makes sense. I think that the problem is likely a bad C++ codegen or something low-level like that.

VSadov commented 2 years ago

p *ppEntryPointName in lldb prints GetProcessorCount instead of Environment_GetProcessorCount. Any thoughts (or theories) what might be the cause of invalid mapping? 🤔

Maybe mismatching bits, like a new singlefilehost and an old System.Private.CoreLib.dll. It would be hard to mismatch them though, since we build them together.

jkotas commented 2 years ago

Yeah, I agree. This looks like mismatched bits.

am11 commented 2 years ago

@VSadov will it be fixed in the next preview?

VSadov commented 2 years ago

When I try the scenario with the latest daily build, it looks like the bits are matching but R2R is broken.

It looks like R2R is broken in singlefile on OSX. It is also likely that we are not running host tests on osx-arm64.

BTW, when targeting osx-x64, the app runs on the same machine (M1).

I will continue investigating.

VSadov commented 2 years ago

The build that I picked up is:

strings ./testapp1 | grep @Commit                                                                    

@(#)Version 7.0.22.22403 @Commit: 47d9c43ab1f10a98a348a28b3fd7ed9c4d35328b
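
The `strings | grep` check above can be approximated in Python when comparing builds. This is a minimal sketch, assuming the stamp always has the form `@(#)Version <version> @Commit: <sha>` as in the two outputs quoted in this thread; the function name is hypothetical.

```python
import re

# Stamp format as it appears in the outputs quoted in this thread (assumed stable).
STAMP = re.compile(rb"@\(#\)Version (\S+) @Commit: ([0-9a-f]{40})")

def read_build_stamp(data: bytes):
    """Return (version, commit) from the first stamp found in data, or None."""
    match = STAMP.search(data)
    if match is None:
        return None
    return match.group(1).decode("ascii"), match.group(2).decode("ascii")
```

Reading the published executable in binary mode and passing its contents to `read_build_stamp` gives the version and commit that the embedded host was built from.
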
am11 commented 2 years ago

It is also likely that we are not running host tests on osx-arm64

Single file tests were added to outerloop test pipeline in https://github.com/dotnet/runtime/commit/7677f7dc71fafad1f35639803b86d05b0bd7df72, and removed in https://github.com/dotnet/runtime/commit/f29ba20bec327dc18013abd0a867ab3a95448a73#diff-e2e027b9777fc35f4a8243db97ce50f7dac99b3cee9465c5325d283c34d2d872L655 for cost saving.

I think those are good tests for validating frequent runtime changes, and we should bring them back with osx-arm64 added. AFAIK, there is nothing else in any pipeline testing the single-file host (in the runtime, sdk or installer repos). Issues are usually reported after the GA release.

vitek-karas commented 2 years ago

I "think" we have an E2E test in the SDK repo (didn't check to be sure). Unfortunately, I know that neither the SDK nor the installer repo runs tests on osx-arm64 either.

VSadov commented 2 years ago

It looks like we sometimes see PE sections overlapping in memory. This is either a loader bug or a crossgen bug. Most likely crossgen. Either way, we should be able to lay out a PE that we ourselves produce.
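
To make "sections overlapping in memory" concrete, here is a minimal sketch of an overlap check over mapped section ranges. It is not how the runtime or crossgen validates layout; the function name and the (name, start, size) tuple shape are assumptions made for illustration.

```python
def find_overlaps(sections):
    """Return (earlier, later) name pairs where a section starts inside the
    furthest-reaching section seen so far.

    sections: iterable of (name, start_address, size) tuples.
    """
    overlaps = []
    prev_name, prev_end = None, None
    for name, start, size in sorted(sections, key=lambda s: s[1]):
        if prev_end is not None and start < prev_end:
            overlaps.append((prev_name, name))
        # Keep tracking whichever section extends furthest.
        if prev_end is None or start + size > prev_end:
            prev_name, prev_end = name, start + size
    return overlaps
```

An empty result means the ranges are disjoint; any returned pair points at two sections whose mapped ranges collide.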

oransel commented 2 years ago

Same error with dotnet 6.0 on M1

thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x580000ead2800051)
    frame #0: 0x00000001000b9c58 test`DictionaryLayout::FindToken(MethodTable*, LoaderAllocator*, int, SigBuilder*, unsigned char*, DictionaryEntrySignatureSource, CORINFO_RUNTIME_LOOKUP*, unsigned short*) + 140
test`DictionaryLayout::FindToken:
->  0x1000b9c58 <+140>: ldr    x9, [x9, #0x8]
    0x1000b9c5c <+144>: cbz    x9, 0x1000b9c70 ; <+164>
    0x1000b9c60 <+148>: ldr    x12, [x9]
    0x1000b9c64 <+152>: ldrh   w9, [x12]
Target 0: (test) stopped.

Fix:

export COMPlus_ZapDisable=1
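
`COMPlus_ZapDisable=1` tells the runtime not to use precompiled (ReadyToRun) code, so methods are JIT-compiled instead. When the app is launched from a script rather than an interactive shell, the same setting can be applied per process. A minimal sketch with hypothetical helper names:

```python
import os
import subprocess

def env_with_zap_disabled(base=None):
    """Return a copy of the environment with COMPlus_ZapDisable=1 added."""
    env = dict(os.environ if base is None else base)
    env["COMPlus_ZapDisable"] = "1"
    return env

def run_without_r2r(app_path: str) -> int:
    """Run the published app (hypothetical path) with the workaround applied."""
    return subprocess.run([app_path], env=env_with_zap_disabled()).returncode
```

Setting the variable only for the child process avoids changing behavior for other .NET apps started from the same shell.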

am11 commented 2 years ago

Pretty sure it was working fine with .NET 6 in March, without disabling zap. It is perhaps a recent regression? I haven't tested with latest patch version.

oransel commented 2 years ago

Here are the outputs:

→ dotnet --version
6.0.300

→ uname -a
Darwin MBProMax.local 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000 arm64

→ cat Program.cs
// See https://aka.ms/new-console-template for more information
var log = (object msg) => Console.WriteLine((new DateTimeOffset(DateTime.UtcNow).ToUnixTimeSeconds()).ToString() + ": " + msg);

log("Hello, World!");

→ dotnet publish --use-current-runtime -p:PublishSingleFile=true --self-contained -c Release
Microsoft (R) Build Engine version 17.2.0+41abc5629 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored /private/tmp/test/test.csproj (in 79 ms).
  test -> /private/tmp/test/bin/Release/net6.0/osx-arm64/test.dll
  Optimizing assemblies for size, which may change the behavior of the app. Be sure to test after publishing. See: https://aka.ms/dotnet-illink
  test -> /private/tmp/test/bin/Release/net6.0/osx-arm64/publish/

→ /private/tmp/test/bin/Release/net6.0/osx-arm64/publish/test
zsh: segmentation fault  /private/tmp/test/bin/Release/net6.0/osx-arm64/publish/test

oransel commented 2 years ago

@am11 can we re-open this for v6?

VSadov commented 2 years ago

There is a separate issue for 6.0 - https://github.com/dotnet/runtime/issues/69923