dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Crash with "munmap_chunk(): invalid pointer" error #106738

Status: Open · John-R-D opened 2 months ago

John-R-D commented 2 months ago

Is there an existing issue for this?

Describe the bug

During the course of normal operations, we suddenly got a "munmap_chunk(): invalid pointer" error, which crashed the application with the log message "Crashing thread 0057 signal 6 (0006)". Analyzing the dump of the crashing thread shows the following stack:

[InlinedCallFrame: 00007f6de6f9c968] Interop+Crypto.X509Destroy(IntPtr)
[InlinedCallFrame: 00007f6de6f9c968] Interop+Crypto.X509Destroy(IntPtr)
System.Runtime.InteropServices.SafeHandle.Finalize() [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs @ 90]
[DebuggerU2MCatchHandlerFrame: 00007f6de6f9cd30] 

We are currently using .NET 6.0 GC mode.

Expected Behavior

Application does not crash

Steps To Reproduce

Unable to reproduce

Exceptions (if any)

No response

.NET Version

8.0.7

Anything else?

Host:
  Version:      8.0.7
  Architecture: x64
  Commit:       2aade6beb0
  RID:          linux-x64

.NET SDKs installed:
  No SDKs were found.

.NET runtimes installed:
  Microsoft.AspNetCore.App 8.0.7 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 8.0.7 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  Not set

global.json file:
  Not found

gfoidl commented 2 months ago

Unfortunately with the information given, this issue isn't actionable (no repro, no code, etc.).

Because of Interop+Crypto.X509Destroy(IntPtr) (from the stack trace) I guess this issue should be moved to runtime anyway.

amcasey commented 2 months ago

@John-R-D Can you please provide more info about what sort of app you're running? Is it aspnetcore? If so, is it kestrel or IIS? As @gfoidl points out, this isn't actionable with the info provided.

Since dotnet/runtime has some closed issues about munmap_chunk, they might have a better idea of where to start (and can, of course, move the issue back here if it's specific to aspnetcore).

dotnet-policy-service[bot] commented 2 months ago

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones See info in area-owners.md if you want to be subscribed.

John-R-D commented 2 months ago

We're running aspnetcore using kestrel.
As the stack from the error has so little information, I wouldn't know how to produce test code that can reproduce the issue. However, given the location of the error, it seemed prudent to post it here to at least let you know there is some kind of error.

Do I need to do anything to move this over to dotnet/runtime? Is there some way I might be able to narrow down the source of the issue to be able to produce code that can reproduce the issue?

jeffhandley commented 2 months ago

@bartonjs - Note this is occurring in .NET 8. Does anything come to mind for what could lead to such a crash?

@John-R-D - How frequently is this happening? Is it occurring repeatedly, or did this happen once with service recovery thereafter?

John-R-D commented 2 months ago

Looking through our logs, I see 5 instances of this in the last 5 months. Four of the cases have identical stack traces; one failed to even collect a dump, but the error message and symptoms match. We are seeing several different error messages associated with the issue:

.NET Finalizer[11008]: segfault ... in libc.so.6
double free or corruption (out)
munmap_chunk(): invalid pointer

There is also a previous issue (https://github.com/dotnet/runtime/issues/101804) where the stack of the failing thread is identical; I wasn't aware of it when I opened this case because the error message was different. I'm not sure this new case provides any additional details that might help in identifying the issue, though.

In the most recent 3 cases, the service did not recover. The application entered a frozen state (twice after the dump was collected and once before we got a dump) and required a manual reset.

jeffhandley commented 1 month ago

Thanks for the extra info and for connecting this back to the previous issue, @John-R-D. I'm going to triage this out of the 9.0.0 milestone, but we will leave the issue open for future investigation, as it has recurred since we closed/locked the previous issue.