Open walterlv opened 4 months ago
Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.
Hi @walterlv! Thanks for reporting this bug!
> I didn't find any environment that doesn't have this issue.
Do you know if this issue reproduces on Windows?
> Do you know if this issue reproduces on Windows?
Ah, never mind this question, as the repro is very specific to Linux.
Do you know if the callback/debugging issue is specific to the libpulse API (e.g. does a standalone repo that uses a callback from C++ to C# on Linux reproduce the issue)? I am curious whether something specific to libpulse is causing the problem, for example a difference in calling convention.
@tommcdon I can reproduce this issue with @walterlv's repo on my Linux system. I am also sure it's not a libpulse bug, because I can reproduce the issue with https://github.com/Haltroy/CefGlue as well.
I could not test on Windows because I failed to run libpulse there, so I do not know whether the issue can be reproduced on Windows.
Possible duplicate of https://github.com/dotnet/runtime/issues/102767. @hoyosjs
Thanks to my friend @kkwpsv, who helped me find out more information about this issue.
@tommcdon This issue is quite different from #102767:
Let's see more details here.
Then, run thread backtrace all, and we see that thread 3 (.NET EventPipe) is stopped with signal SIGTRAP, signal SIGSEGV: address not mapped to object (fault address: 0xbafa13a0). The stack traces are shown as follows:
[UnmanagedCallersOnly]
private static unsafe void Callback(byte* sourceId, int isEnabled, byte level,
    long matchAnyKeywords, long matchAllKeywords, Interop.Advapi32.EVENT_FILTER_DESCRIPTOR* filterData, void* callbackContext)
{
    EventPipeEventProvider _this = (EventPipeEventProvider)GCHandle.FromIntPtr((IntPtr)callbackContext).Target!;
    if (_this._eventProvider.TryGetTarget(out EventProvider? target))
    {
        _this.ProviderCallback(target, sourceId, isEnabled, level, matchAnyKeywords, matchAllKeywords, filterData);
    }
}
@hoyosjs
Hi @walterlv and @lindexi,
We haven't been able to repro the exact issue from your repros yet, but the SIGSEGV for the EventPipeEventProvider callback looks eerily similar to https://github.com/dotnet/runtime/issues/80666#issuecomment-2249343314, where the _gchandle used in the callback had been freed before the callback completes.
If the dotnet debugger is hitting the same EventPipeEventProvider Callback issue, then there is a partial fix already merged through https://github.com/dotnet/runtime/pull/106040 and a second PR, https://github.com/dotnet/runtime/pull/106156 , that is still open.
@mdh1418 Thank you. Which Visual Studio version and dotnet version do you use? And do you debug the application while it runs on Linux?
Can I test with the daily dotnet build that includes https://github.com/dotnet/runtime/pull/106040 ?
> Which Visual Studio version and dotnet version do you use? And do you debug the application while it runs on Linux?
We used the latest version of the C# extension in VS Code.
> Can I test with the daily dotnet build that includes https://github.com/dotnet/runtime/pull/106040 ?
Yes - the daily builds from https://github.com/dotnet/sdk/blob/main/documentation/package-table.md contain the fix.
@tommcdon I tested again with https://aka.ms/dotnet/9.0.1xx/daily/dotnet-sdk-linux-x64.tar.gz .
There is no SIGSEGV now. The process still exits with SIGTRAP.
I debugged it with lldb. Here's the output:
Seems like the same problem I'm seeing here: https://github.com/microsoft/DockerTools/issues/444
@jwilliamsonveeam Sorry, the thread at https://github.com/microsoft/DockerTools/issues/444 is very long, and I'm afraid I'm missing important information.
@lindexi I updated my last comment with a small, self-contained example of a program that fails with a SIGTRAP in the native C code callback: https://github.com/microsoft/DockerTools/issues/444#issuecomment-2380066894 . A zip of the whole solution is in this thread, if you have access: https://developercommunity.visualstudio.com/t/dotnet-process-silently-crashes-when-deb/10740222?
I've run @walterlv's reproducer (Walterlv.Issues.TraceBreakpointTrap) and reproduced the issue as well.
I've been debugging a similar issue where the scenario is as follows:
- A C# callback (annotated with UnmanagedFunctionPointer) is sent to a C function through P/Invoke (annotated with DllImport).
Using @walterlv's reproducer as a base, I've modified it with these changes and managed to avoid the crash. The output from my execution is as follows:
$ ./artifacts/bin/Walterlv.Issues.TraceBreakpointTrap/debug/TraceBreakpointTrapDemo --skip-attach
### Trace/Breakpoint Trap issue on .NET debugger ###
Context state changed: 1
If you want to debug this demo using other debuggers (e.g. GDB, LLDB), you can use the following options:
--sleep <seconds> Sleep for a while before attaching debugger.
--skip-attach Skip attaching debugger and run directly.
Please attach a dotnet debugger and use 'Set next statement'.
Context state changed: 2
Context state changed: 3
Context state changed: 4
Context state changed: 5
Issue may not be reproduced. Exit.
In the output, changes 1 to 4 are from before the debugger is attached. Once the debugger is attached, change 5 is printed but there's no crash.
Additionally, in my own (non-shareable) projects, I've been able to use a C debugger (lldb or gdb) to manually call the callback (through a function pointer) directly from the debugger. This led to the C# application throwing the following error:
Fatal error. Invalid Program: attempted to call a UnmanagedCallersOnly method from managed code.
This error is seemingly thrown here, but I don't have a fine understanding of the dotnet runtime. However, it leads me to believe that the key is that there are two distinct threads.
- If the debugger is attached when the C# callback is executed for the first time, the application crashes with a SIGTRAP.
- If the debugger is attached after the C# callback has been executed once, the application works correctly.
I think this may have revealed the culprit. The .NET runtime only handles a signal when the thread it occurred on is known to the runtime, meaning the thread was either created by the runtime or has already called into the runtime. If the debugger sets the breakpoint on the UnmanagedCallersOnly marked method before the thread calls into the runtime and is registered as one that runs managed code, the SIGTRAP does not reach the handler in the runtime. Instead, the default signal action is invoked, which terminates the process.
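The default action mentioned above can be observed outside of .NET. The following is a minimal shell sketch, not taken from the runtime or from any repro in this thread: it assumes a POSIX sh on Linux (where SIGTRAP is signal 5), and the helper name run_and_report is hypothetical. A child shell that sends itself SIGTRAP with no handler installed is terminated, and the parent reports 128 + 5 = 133; a child that ignores the signal first keeps running and exits normally.

```shell
#!/bin/sh
# Hypothetical helper: run a command string in a child shell and print its exit status.
run_and_report() {
    sh -c "$1" >/dev/null 2>&1
    echo "$?"
}

# No handler installed: the default action for SIGTRAP terminates the child.
default_status=$(run_and_report 'kill -s TRAP $$')

# Signal ignored before it is raised: the child survives and exits normally.
ignored_status=$(run_and_report 'trap "" TRAP; kill -s TRAP $$; exit 0')

echo "default disposition: $default_status"
echo "ignored:             $ignored_status"
```

This only illustrates the signal disposition. It says nothing about where the debugger places its breakpoints or how the runtime registers threads.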
> This error is seemingly thrown here
That code is for NativeAOT. In coreclr, the error comes from here: https://github.com/dotnet/runtime/blob/008ee9f84f167cee8d07e086086e1cec724750d5/src/coreclr/vm/dllimportcallback.cpp#L187-L196
@janvorli Hello and thanks for your input!
I'll be reviewing the ReversePInvokeBadTransition function, as I think I already added a native breakpoint there (it's an extern "C" function) and was able to hit it once.
However, I'd like to point out that the yet-unregistered thread is receiving a SIGTRAP regardless of whether I had a .NET breakpoint or not. Is there anything relevant that the debugger could be doing on thread registration? Could you share some links to code?
https://github.com/jwilliamsonveeam/TimerCallBackDemo I created a repo with my failing case. I also do not need any breakpoints in order for this to fail with a SIGTRAP with the debugger attached.
The debugger can set some breakpoints on its own for its internal purposes. @tommcdon would most likely know if it can be the case here.
@janvorli If the debugger is setting its own breakpoint (e.g. on managed-to-unmanaged transitions) and then reaching it before the thread is properly registered with the .NET runtime (e.g. on the first .NET interaction of a thread), then the SIGTRAP and subsequent crash would make sense.
@tommcdon Could you please confirm if my assumption is correct?
Description
Note: Not all native callbacks cause this issue so I've written a minimal reproducible example below.
Reproduction Steps
Minimal reproducible example 1:
Reproducible example 2:
Expected behavior
The app should not crash when the dotnet debugger is attached.
Actual behavior
The app crashes with an output "Trace/Breakpoint Trap".
Regression?
I've only tested this on .NET 8.0.302
Known Workarounds
I've found several workarounds:
Note: The Debugger.IsAttached property cannot detect the native debugger, so I added the alternative options --sleep <seconds> and --skip-attach for the minimal reproducible example above.
Configuration
I didn't find any environment that doesn't have this issue.
Other information
dotnet tool install -g dotnet-sos
dotnet sos install
ulimit -c unlimited
echo "0x3F" > /proc/<pid>/coredump_filter
(after the process starts and the pid is known)
Trace/Breakpoint Trap (core dumped)
lldb --core core TraceBreakpointTrapDemo
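For context on the 0x3F value written to coredump_filter above: according to the Linux core(5) manual page, each bit of the mask selects a class of memory mappings to include in the dump. The sketch below decodes a mask into those classes. The function name describe_coredump_filter is hypothetical, and the bit descriptions paraphrase the manual page rather than anything stated in this issue.

```shell
#!/bin/sh
# Hypothetical helper: list the mapping classes selected by a coredump_filter mask.
describe_coredump_filter() {
    mask=$(( $1 ))
    bit=0
    # Names are listed in bit order, starting at bit 0 (per core(5)).
    for name in \
        "anonymous private mappings" \
        "anonymous shared mappings" \
        "file-backed private mappings" \
        "file-backed shared mappings" \
        "ELF headers" \
        "private huge pages" \
        "shared huge pages" \
        "private DAX pages" \
        "shared DAX pages"
    do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $name"
        fi
        bit=$(( bit + 1 ))
    done
}

describe_coredump_filter 0x3F
```

With 0x3F, bits 0 through 5 are set, so the dump includes file-backed mappings and ELF headers in addition to anonymous memory.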