dotnet / runtime

[NativeAOT] Stackoverflow reporting on Linux #82334

Open jkotas opened 1 year ago

jkotas commented 1 year ago

Repro

Recursion(1);

void Recursion(int x)
{
    Recursion(x+1);
    Recursion(x+1);
}

Actual result

Segmentation fault

Expected result

Process is terminating due to StackOverflowException

(Reported by partner team.)

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

Labels: `area-NativeAOT-coreclr`
Milestone: 8.0.0

jkotas commented 1 year ago

The change was reverted by #95415

jtschuster commented 11 months ago

It looks like all the crashes occurred when the SIGSEGV was hit while the GC was trying to suspend all threads. I'm not sure why that was causing crashes with the alternate stack but isn't causing crashes when it uses the regular stack.

janvorli commented 11 months ago

My (wild) guess is that it might be due to libunwind not being able to walk over the SIGSEGV frame when the handler is running on a different stack from the code where the SIGSEGV occurred. In coreclr, we don't rely on libunwind across that boundary; we explicitly skip it using a context that we store in the SIGSEGV handler. See https://github.com/dotnet/runtime/blob/62dcb1218c72664f681d721a309b933663fd7b3b/src/coreclr/pal/src/exception/seh-unwind.cpp#L669-L679
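
For readers unfamiliar with the mechanism being discussed, here is a minimal C sketch of handling a stack overflow on an alternate signal stack and saving the interrupted context inside the handler. It is not the runtime's code: the names `run_overflow_in_child`, `on_sigsegv`, `g_fault_context`, `OVERFLOW_EXIT_CODE`, and `ALT_STACK_SIZE` are made up for illustration, and it assumes Linux with the default (limited) stack size. The overflow runs in a forked child so the caller survives and can inspect the result.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <ucontext.h>
#include <unistd.h>

/* Hypothetical values chosen for this sketch only. */
#define OVERFLOW_EXIT_CODE 42
#define ALT_STACK_SIZE (64 * 1024)

/* Register state of the code that faulted. An unwinder could start from
   this saved context instead of walking across the signal frame. */
static ucontext_t g_fault_context;

/* Runs on the alternate stack, because the regular stack is exhausted. */
static void on_sigsegv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    /* Shallow copy of the interrupted context (illustration only). */
    g_fault_context = *(ucontext_t *)ctx;

    static const char msg[] =
        "Process is terminating due to StackOverflowException\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)n;
    _exit(OVERFLOW_EXIT_CODE);
}

/* Unbounded recursion; the volatile buffer and the use of the result
   after the call keep the compiler from turning it into a loop. */
static int recurse(int depth)
{
    volatile char pad[256];
    pad[0] = (char)depth;
    return recurse(depth + 1) + pad[0];
}

/* Forks a child that overflows its stack and returns the child's exit
   code, or -1 if the child was killed by a signal or fork/wait failed. */
int run_overflow_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Register a separate stack for signal handlers. */
        stack_t alt;
        alt.ss_sp = malloc(ALT_STACK_SIZE);
        alt.ss_size = ALT_STACK_SIZE;
        alt.ss_flags = 0;
        if (alt.ss_sp == NULL || sigaltstack(&alt, NULL) != 0)
            _exit(1);

        /* SA_ONSTACK makes the kernel deliver SIGSEGV on that stack. */
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_sigsegv;
        sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, NULL) != 0)
            _exit(1);

        recurse(1);
        _exit(2); /* not reached if the overflow is caught */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_overflow_in_child()` from a test program should return `OVERFLOW_EXIT_CODE` when the handler ran on the alternate stack. Without `SA_ONSTACK`, the kernel has no room to build the signal frame on the exhausted stack and the child dies with a plain segmentation fault, which matches the "Actual result" in the report.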