cc @tmds
RHEL 8 on arm64/aarch64
We've had some issues in the past due to the 64 kB page size (like https://github.com/dotnet/runtime/issues/91864); this may be another one of those.
@omajid, is there any way you could catch the stack smashing under lldb? It is going to take me a while to put together an RHEL 8 arm64 device.
$ lldb /usr/lib64/dotnet/shared/Microsoft.NETCore.App/9.0.0-rc.1.24431.7/createdump
(lldb) target create "/usr/lib64/dotnet/shared/Microsoft.NETCore.App/9.0.0-rc.1.24431.7/createdump"
Current executable set to '/usr/lib64/dotnet/shared/Microsoft.NETCore.App/9.0.0-rc.1.24431.7/createdump' (aarch64).
(lldb) r -f dump 34680
Process 35063 launched: '/usr/lib64/dotnet/shared/Microsoft.NETCore.App/9.0.0-rc.1.24431.7/createdump' (aarch64)
[createdump] Gathering state for process 34680 dotnet
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
Process 35063 stopped and restarted: thread 1 received signal: SIGCHLD
[createdump] Writing minidump with heap to file dump
Process 35063 stopped
* thread #1, name = 'createdump', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x1000000000000)
frame #0: 0x0000fffff7b22d40 libc.so.6`__GI___memset_generic + 256
libc.so.6`__GI___memset_generic:
-> 0xfffff7b22d40 <+256>: dc zva, x3
0xfffff7b22d44 <+260>: add x3, x3, #0x40
0xfffff7b22d48 <+264>: subs x2, x2, #0x40
0xfffff7b22d4c <+268>: b.hi 0xfffff7b22d40 ; <+256>
(lldb) bt
* thread #1, name = 'createdump', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x1000000000000)
* frame #0: 0x0000fffff7b22d40 libc.so.6`__GI___memset_generic + 256
frame #1: 0x0000aaaaaaac5ad0 createdump`DumpWriter::WriteDiagInfo(unsigned long) [inlined] memset(__dest=0x0000ffffffffa298, __ch=0, __len=65496) at string_fortified.h:74:10 [opt]
frame #2: 0x0000aaaaaaac5ac0 createdump`DumpWriter::WriteDiagInfo(this=0x0000ffffffffa280, size=<unavailable>) at dumpwriter.cpp:50:5 [opt]
frame #3: 0x0000aaaaaaabfbf4 createdump`DumpWriter::WriteDump(this=0x0000ffffffffa280) at dumpwriterelf.cpp:181:18 [opt]
frame #4: 0x0000aaaaaaabc5f0 createdump`CreateDump(options=0x0000ffffffffe328) at createdumpunix.cpp:89:25 [opt]
(lldb) frame select 2
frame #2: 0x0000aaaaaaac5ac0 createdump`DumpWriter::WriteDiagInfo(this=0x0000ffffffffa280, size=<unavailable>) at dumpwriter.cpp:50:5 [opt]
47 }
48 size_t alignment = size - sizeof(header);
49 assert(alignment < sizeof(m_tempBuffer));
-> 50 memset(m_tempBuffer, 0, alignment);
51 if (!WriteData(m_tempBuffer, alignment)) {
52 return false;
53 }
(lldb) p alignment
(size_t) 65496
(lldb) p size
error: Couldn't materialize: couldn't get the value of variable size: Could not evaluate DW_OP_entry_value.
error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression
(lldb) p sizeof(header)
(unsigned long) 40
(lldb) p sizeof(m_tempBuffer)
(unsigned long) 16384
Looks like we are trying to write 65496 bytes to a location that can only hold 16384 bytes.
$ getconf PAGE_SIZE
65536
$ python3 -c 'print(65536 - 40)'
65496
Yeah, looks like a page size issue like @tmds mentioned above.
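To make the arithmetic from the lldb session concrete, here is a minimal sketch of the size check. It assumes that `size` equals the system page size, which the numbers above suggest but which lldb could not print directly. The constant and function names are hypothetical and do not come from createdump. The `assert` on line 49 of dumpwriter.cpp did not stop the overflow, presumably because it is compiled out in this optimized build.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the values printed by lldb above.
constexpr std::size_t kHeaderSize = 40;         // p sizeof(header)
constexpr std::size_t kTempBufferSize = 16384;  // p sizeof(m_tempBuffer)

// Padding that follows the header so that header + padding == size.
// Assumption: size is the page size (65536 on this RHEL 8 arm64 machine).
constexpr std::size_t PaddingAfterHeader(std::size_t size)
{
    return size - kHeaderSize;
}

// Same condition as the assert on line 49 of dumpwriter.cpp.
constexpr bool PaddingFitsInTempBuffer(std::size_t size)
{
    return PaddingAfterHeader(size) < kTempBufferSize;
}

// Compile-time checks (C++17): 64 kB pages overflow the buffer, 4 kB pages do not.
static_assert(PaddingAfterHeader(65536) == 65496, "matches 'p alignment'");
static_assert(!PaddingFitsInTempBuffer(65536), "64 kB pages do not fit");
static_assert(PaddingFitsInTempBuffer(4096), "4 kB pages fit");
```

With 4 kB pages the padding is 4056 bytes and fits comfortably, which would be consistent with the crash not showing up on x64.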
Digging a bit, it looks like @tmds's changes at https://github.com/dotnet/runtime/pull/91865 were reverted by https://github.com/dotnet/runtime/pull/95433. So the original issue in https://github.com/dotnet/runtime/issues/91864 has re-appeared.
Thanks for figuring this out. Looks like I did the original fix trying to fix an assert on macOS arm64. I'm looking into how to fix both.
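For illustration only, one general way to avoid this class of overflow is to emit the padding in chunks that never exceed the scratch buffer. This is a sketch under assumptions, not the change made in any of the PRs mentioned in this thread. `WriteZeroPadding` and its `write` callback are hypothetical names standing in for something like `WriteData`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical helper: emits `total` zero bytes through `write`, reusing a
// fixed-size scratch buffer so no more than N bytes are ever touched.
template <std::size_t N>
bool WriteZeroPadding(std::size_t total,
                      char (&scratch)[N],
                      const std::function<bool(const void*, std::size_t)>& write)
{
    std::memset(scratch, 0, N);  // bounded by the buffer size, not by `total`
    while (total > 0) {
        const std::size_t chunk = std::min(total, N);
        if (!write(scratch, chunk)) {
            return false;        // propagate the writer's failure
        }
        total -= chunk;
    }
    return true;
}
```

With a 16384-byte buffer and 65496 bytes of padding, this sketch would call `write` four times (three full chunks and one of 16344 bytes) regardless of the page size.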
@omajid, is there any way you could validate that this fix (PR #108166) resolves your issue?
Yes, I should be able to take a VMR checkout, apply this change and see if the resulting SDK has any issues or not. Looking at it now.
Moving issue to 9.0.0 for backport
I can confirm this PR makes things work for me again:
$ uname -m
aarch64
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.10 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.10 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8"
BUG_REPORT_URL="https://issues.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.10
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.10"
$ ~/dotnet-sdk/dotnet --info
.NET SDK:
Version: 9.0.100-rc.2.24474.1
Commit: 1f747cd885
Workload version: 9.0.100-manifests.934ebbcd
MSBuild version: 17.12.0-preview-24469-05+1f747cd88
Runtime Environment:
OS Name: rhel
OS Version: 8
OS Platform: Linux
RID: rhel.8.10-arm64
Base Path: /home/omajid/dotnet-sdk/sdk/9.0.100-rc.2.24474.1/
.NET workloads installed:
There are no installed workloads to display.
Configured to use loose manifests when installing new manifests.
Host:
Version: 9.0.0-rtm.24473.2
Architecture: arm64
Commit: static
.NET SDKs installed:
9.0.100-rc.2.24474.1 [/home/omajid/dotnet-sdk/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 9.0.0-rtm.24473.16 [/home/omajid/dotnet-sdk/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 9.0.0-rtm.24473.2 [/home/omajid/dotnet-sdk/shared/Microsoft.NETCore.App]
Other architectures found:
None
Environment variables:
Not set
global.json file:
Not found
Learn more:
https://aka.ms/dotnet/info
Download .NET:
https://aka.ms/dotnet/download
$ ~/dotnet-sdk/shared/Microsoft.NETCore.App/9.0.0-rtm.24473.2/createdump 357990
[createdump] Gathering state for process 357990 dotnet
[createdump] Writing minidump with heap to file /tmp/coredump.357990
[createdump] Written 339873792 bytes (5186 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 360ms
I tried again without the fix in https://github.com/dotnet/runtime/pull/108166 and it continues to crash, confirming that https://github.com/dotnet/runtime/pull/108166 is the fix.
Now that https://github.com/dotnet/runtime/pull/108166 and https://github.com/dotnet/runtime/pull/108208 have been merged, I am going to close this issue.
Thanks!
Description
I am trying to run createdump against an ASP.NET Core application running on RHEL 8 on arm64/aarch64. This works flawlessly with .NET 8, but fails with .NET 9.
Reproduction Steps
Expected behavior
I get a dump
Actual behavior
Fails
Regression?
This was working in .NET 8. It was broken on both .NET 9 Preview 7 and .NET 9 RC 1.
Known Workarounds
No response
Configuration
This is the .NET 9 RC 1 SDK published by Microsoft:
This also reproduces with a self-built .NET 9 using the VMR/source-build.
This only happens on arm64/aarch64. It doesn't happen on x64.
Other information
No response