Closed kirsan31 closed 4 years ago
cc @AaronRobinsonMSFT @jkoritzinsky
@kirsan31 This behavior isn't something we are likely to change since we are officially done with the .NET Framework porting project: dotnet/announcements#130. This is non-consistent behavior with .NET Framework and we can consider this in the future, but ensuring this kind of behavior in a fully xplat manner isn't something we are likely to give high priority.
Is there a specific scenario where this behavior is desired? Perhaps we can provide an alternate approach in addressing the issue.
/cc @jkotas
@AaronRobinsonMSFT
Is there a specific scenario where this behavior is desired? Perhaps we can provide an alternate approach in addressing the issue.
Brief description of our problem: do some action on any crash, before process exit. Or other ppl want to prevent some known crashes due to AV, for example from native call.
Quote from original post:
We are migrating our WinForm apps to core 3.0+. One of the main requirement to our apps is that we must to do some action on exit (regardless of exit type - normal, crash etc). On .net we use
legacyCorruptedStateExceptionsPolicy
andAppDomain.CurrentDomain.UnhandledException
for this. During migration we found that, this not working any more. For winforms bugs, for example.
And the answer was:
we have not introduced any API for reporting fatal crashes yet. So you are still left with the
COMPlus_legacyCorruptedStateExceptionsPolicy=1
as a temporary workaround.
And now we see, that even this approach is not working at all cases :(
Note that there is an internal contract violation in coreclr.dll, which is worth fixing anyway. Depending on how you fix it (e.g., by removing the IsIPInModule(g_pMSCorEE, ip)
condition in https://github.com/dotnet/coreclr/blob/ed5dc831b09a0bfed76ddad684008bebc86ab2f0/src/vm/excep.cpp#L7508-L7510 — speculating here), the original issue may be fixed automatically. It affects not only Marshal.StructureToPtr
, but everything that ends up calling memcpyNoGCRefs
or other VC Runtime helpers, which are now statically linked.
@AntonLapounov It would require more than IsIPInModule(g_pMSCorEE, ip)
because in non-local GC the GCHeapUtilities::GetGCModule()
call also returns coreclr.dll
. I am not sure how we should address this in general. As mentioned this really isn't interop specific, although it would seem to impact interop a great deal, but rather a contradiction/violation in the exception handling logic.
I don't have a strong opinion here about how the exception logic should or should not be updated. I will say that from an interop perspective catching any A/V should be discourage since the entire state of the runtime is in question.
I would like to hear from @janvorli or @jkotas about what they want here. I have a fix locally so it isn't too much of an issue other than the tactical decision to permit these kind of scenarios.
/cc @sergiy-k
The EH works with the assumption that hardware exceptions should be allowed to translate to managed exceptions (like NullReferenceException, etc) only in managed code or in a small number of specific places in the native parts of the runtime - in asm helpers that run on behalf of the managed code (like GC write barriers). For these helpers, we pretend the exception happened in the managed caller at the call site. On Windows only, these exceptions are also allowed to leak from external dlls.
There is also a holder AVInRuntimeImplOkayHolder which allows AVs in parts of code guarded by that holder to be handled in the native code and not shoot down the runtime. Besides the debugger code, there are only 26 occurences of that holder in the runtime at very specific places. And this holder has an effect only on Windows.
Hardware exceptions happening anywhere else in the runtime / gc lead to FailFast, since they mean a corrupted state where attempting to handle them could possibly lead to even worse consequences. We may be able to handle the specific case of problematic static linking of MoveSmall into coreclr by adding the AVInRuntimeImplOkayHolder into the memcpyNoGCRefs, but I am not sure about the performance consequences.
Hardware exceptions happening in external native dlls are supposed to be handled inside of those dlls, since the runtime has no idea whether such exceptions are allowed to happen there or not. But in case a hardware exception occurs in an external dll, it is not handled in there and leaks into our runtime, we try to handle it, but only on Windows.
One of the main requirement to our apps is that we must to do some action on exit (regardless of exit type - normal, crash etc).
The only reliable way to do this is to start a second watchdog process when you app starts, and does the action when it sees that the app exited.
And now we see, that even this approach is not working at all cases :(
It did not work in all cases in .NET Framework either.
@jkotas
The only reliable way to do this is to start a second watchdog process when you app starts, and does the action when it sees that the app exited.
Yea, we definitely will do this way.
It did not work in all cases in .NET Framework either.
Of course, but on practice for about 15 yeas we didn't see one. But I agree that this is not a substantive conversation already.
One question that remain, is the way to diagnose - get crash details? You mentioned it here. Can we hope for such a feature?
If you are on Windows and have the watchdog process, the easiest thing is to get the crash details from Event Log.
@jkotas
If you are on Windows and have the watchdog process, the easiest thing is to get the crash details from Event Log.
Yea... .net core 3.0 from Event Log:
Application: AccessViolationExceptionTest.exe
CoreCLR Version: 4.700.19.46205
.NET Core Version: 3.0.0
Description: The process was terminated due to an internal error in the .NET Runtime at IP 000007FEB3886BDA (000007FEB3740000) with exit code 80131506.
.net 4.7.2 from AppDomain.CurrentDomain.UnhandledException:
Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.Runtime.InteropServices.Marshal.StructureToPtr(Object structure, IntPtr ptr, Boolean fDeleteOld)
at AccessViolationExceptionTest.Form1.button1_Click(Object sender, EventArgs e) in D:\save\projects\core tests\AccessViolationExceptionWinForms\Form1.cs:line 35
I think difference is obvious...
The runtime tries to log the stacktrace when it seems safe to execute the additional code required to log the stacktrace. It is a best effort and the behavior will differ between runtimes and runtime versions.
There are other cases where both .NET Framework and .NET Core log the error without stacktrace, for example:
using System;
unsafe class Test
{
int _data;
static void Main()
{
Test o = new Test();
fixed (int * p = &o._data)
{
*(p - 1) = 0x12345678;
}
GC.Collect();
GC.KeepAlive(o);
}
}
Also, there are cases where .NET Core logs the error with stacktrace, but .NET Framework logs the error without the stacktrace.
If you would like to extract as many details as possible from the crash, you may want to look at analyzing the crash dumps using https://github.com/microsoft/clrmd.
When we set
COMPlus_legacyCorruptedStateExceptionsPolicy
environment variable to 1, we expect to catch AccessViolationException in .net core program. It's not working in case of StructureToPtr (some other cases?) - AccessViolationException not capturing.Test program:
```cs class Program { static void DoSomeAccessViolation() { Marshal.StructureToPtr(42, (IntPtr)42, true); } static void Main(string[] args) { try { DoSomeAccessViolation(); } catch (Exception ex) { Console.Error.WriteLine(ex); } } } ```Detailed explanation of the bug from @AntonLapounov. Original question on stackoverflow. Related? question on stackoverflow.
/cc @janvorli