Closed kant2002 closed 10 months ago
The way to debug crashes like this is with dumps: enable full dump collection for ilc.exe in the registry (https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps), then look at the crash dump using WinDbg.
Sigh. And thanks!
I would run ILC in a loop overnight waiting for a crash.
Given that I hit it again (it started working today), I think that's a good idea.
The way to debug crashes like this is with dumps:
That does not work. It seems the exception is intercepted in C# and does not trigger crash dump collection. Strangely, the application started working on the third attempt. It seems the ancient gods are still at work. Any ideas what I can do?
It seems the exception is intercepted in C# and does not trigger crash dump collection.
AccessViolationException should trigger the dump as expected. I think it is only a problem for SEHException on Windows. You can wrap the FinishObjWriter call with an exception filter that calls FailFast, which triggers the dump capture, like this:
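try
{
    FinishObjWriter(_nativeObjectWriter);
}
catch (Exception e) when (UnexpectedExceptionFilter(e))
{
    // Never reached: the filter below terminates the process.
}
...
static bool UnexpectedExceptionFilter(Exception e)
{
    // The filter runs before the stack is unwound, so the dump captures the original failure point.
    Environment.FailFast("Fatal error", e);
    return false;
}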
If it works, you can submit a PR that adds this code permanently. It is not the first time we are trying to chase down an intermittent crash in FinishObjWriter.
I seem to have been able to catch this in WinDbg.
Console output:
LLVM ERROR: out of memory
Allocation failed
The last relevant lines in WinDbg:
(5d04.3d60): Security check failure or stack buffer overrun - code c0000409 (!!! second chance !!!)
Subcode: 0x7 FAST_FAIL_FATAL_APP_EXIT
ChangeEngineState
*** WARNING: Unable to verify checksum for ...\nuget\packages\runtime.win-x64.microsoft.dotnet.ilcompiler\7.0.0-alpha.1.21572.3\tools\objwriter.DLL
objwriter!SwitchSection+0x13b04d:
00007ffa`7b67256d cd29 int 29h
It seems I did not use debug builds, or I made some other silly mistake.
Is there a location where I can grab the PDB files, so I can look at the stack trace more closely?
Sigh, we do not publish the PDB packages due to this workaround: https://github.com/dotnet/runtimelab/blob/feature/NativeAOT/src/installer/pkg/projects/Microsoft.DotNet.ILCompiler/Microsoft.DotNet.ILCompiler.pkgproj#L12
We will need to fix this workaround at some point. In the meantime, you can use locally built objwriter to investigate this crash.
I closed Visual Studio and Edge and ran ILC under WinDbg with a Debug build of ObjWriter.dll.
ILC gets stuck within ObjWriter::Finish.
Some numbers that indicate the size of the data and the performance characteristics of the code:
MCContext::Sections = { size= 186_791 }
MCContext::Symbols = { size=1_652_570 }
Inside MCContext::reset(), the call
COFFAllocator.DestroyAll();
has been running for more than 10 minutes of cleanup. I suspect an infinite loop in the following code (because execution is sitting here):
iterator iplist_impl::erase(iterator first, iterator last) {
  while (first != last)
    first = erase(first);
  return last;
}
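For reference, a loop of this shape does a bounded amount of work: it performs exactly one single-node erase per element in the range, so it is slow on huge lists rather than endless. A minimal sketch of that behavior, using std::list as a stand-in for LLVM's iplist (an assumption made for illustration; eraseRange is a hypothetical helper, not an LLVM function):

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Erase [first, last) one node at a time, mirroring the loop shape quoted above.
// Returns how many single-node erases were performed.
template <typename T>
std::size_t eraseRange(std::list<T> &items,
                       typename std::list<T>::iterator first,
                       typename std::list<T>::iterator last) {
  const std::size_t before = items.size();
  std::size_t steps = 0;
  while (first != last) {
    // erase() returns the iterator following the removed node,
    // so every iteration moves strictly closer to `last`.
    first = items.erase(first);
    ++steps;
  }
  // Each iteration removed exactly one node.
  assert(before - items.size() == steps);
  return steps;
}
```

With roughly 1.6 million symbols, that is roughly 1.6 million individual node destructions, which is finite but can take a long time when each one is expensive.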
Also, StringTableBuilder::multikeySort is slow on 2M strings: it takes a couple of minutes to sort them.
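For readers unfamiliar with the technique, a multikey sort is a three-way radix quicksort that compares one character per level instead of whole strings. The sketch below is a simplified illustration and not LLVM's code; it assumes characters are examined from the end of each string (as far as I know, LLVM does this so that strings sharing a tail end up adjacent), and the names charFromTail, multikeySortSketch, and sortByTail are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Character at distance `pos` from the END of `s`, or -1 once `s` is exhausted.
static int charFromTail(const std::string &s, std::size_t pos) {
  if (pos >= s.size())
    return -1;
  return static_cast<unsigned char>(s[s.size() - pos - 1]);
}

// Three-way radix quicksort over items[lo, hi), one tail character per level.
static void multikeySortSketch(std::vector<std::string> &items,
                               std::size_t lo, std::size_t hi,
                               std::size_t pos) {
  assert(lo <= hi && hi <= items.size());
  while (hi - lo > 1) {
    const int pivot = charFromTail(items[lo], pos);
    std::size_t i = lo; // items[lo, i) have a smaller character than the pivot
    std::size_t j = hi; // items[j, hi) have a larger character than the pivot
    for (std::size_t k = lo + 1; k < j;) {
      const int c = charFromTail(items[k], pos);
      if (c < pivot)
        std::swap(items[i++], items[k++]);
      else if (c > pivot)
        std::swap(items[--j], items[k]);
      else
        ++k;
    }
    // Smaller and larger groups still differ at this position.
    multikeySortSketch(items, lo, i, pos);
    multikeySortSketch(items, j, hi, pos);
    if (pivot == -1)
      return; // the equal group consists of identical, fully consumed strings
    // The equal group shares this character, so continue with the next one.
    lo = i;
    hi = j;
    ++pos;
  }
}

// Sorts strings in ascending order of their reversed contents.
void sortByTail(std::vector<std::string> &items) {
  multikeySortSketch(items, 0, items.size(), 0);
}
```

Every string is touched once per character level it takes part in, so millions of long, similar type-name strings give the sort a lot of work even though the algorithm itself is reasonable.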
Data from managed side: ILCompiler.CompilerTypeSystemContext
_arrayTypes.Count == 4_926
_validType.Count == 113_380
Questions:
You have too many types generated, you may experience slow compilation times
That's not an infinite loop. After maybe 30 minutes, or even an hour, COFFAllocator.DestroyAll() does its work. The second place where I wait a long time is CVContext.reset(); (more than an hour as of now), all inside MCContext::reset().
Let me check with Release. Maybe the SEHException I was seeing was an OOM all along.
I still have not been able to test with Release, but I managed to finish compiling the application in Debug mode. The same MCContext::reset() has two more problematic friends:
COFFUniquingMap.clear(); // 2 hours in debug.
NextID.clear(); // at least 3 hours, then after that finish overnight.
In short, cleanup takes at least 7 hours.
Below is the list of type declarations that are slow to deallocate. I am recording it in case this should go to LLVM and be perf tested there.
std::map<COFFSectionKey, MCSectionCOFF *> COFFUniquingMap
StringMap
SpecificBumpPtrAllocator
std::unique_ptr
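As a rough illustration of why clearing a container like this scales with the number of entries: a node-based container such as std::map releases every node individually, so clear() on N entries means N separate deallocations. The sketch below counts them with a custom allocator. CountingAllocator and deallocationsOnClear are hypothetical names introduced for this example, and the real LLVM containers use their own allocators, so this only models the std::map case.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Counts how many times the allocator was asked to free a block.
static std::size_t g_deallocations = 0;

// Minimal allocator that forwards to std::allocator and counts frees.
template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U> &) {}
  T *allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T *p, std::size_t n) {
    ++g_deallocations;
    std::allocator<T>().deallocate(p, n);
  }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) {
  return false;
}

// Fills a map with `count` entries and returns the number of
// deallocations performed by clear().
std::size_t deallocationsOnClear(std::size_t count) {
  using Entry = std::pair<const int, int>;
  std::map<int, int, std::less<int>, CountingAllocator<Entry>> sections;
  for (std::size_t i = 0; i < count; ++i)
    sections.emplace(static_cast<int>(i), 0);
  assert(sections.size() == count);
  g_deallocations = 0;
  sections.clear(); // frees one tree node per entry
  return g_deallocations;
}
```

With about 187 thousand sections, the map alone needs that many frees, and each free can be much slower in a Debug build than in Release.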
In addition, the produced obj file is 530 MB, and running the linker produces the following error:
VideoMonitoring.obj : error LNK2001: unresolved external symbol RhpGetThreadStaticBaseForType
Is this a mismatch between version 7.0.0-alpha.1.21572.3 and the version with custom marshalers (which was created a bit earlier)?
RhpGetThreadStaticBaseForType
Yes, this was added recently. If you are seeing this link error, it means you have mismatched bits.
Okay. It seems that in my case the issue happens when I hit OOM. I reproduced the issue with the Release version of ObjWriter.dll. If I run under WinDbg, the whole process is very slow: a couple of hours at least. If I run without a debugger, it takes 4 minutes.
Still, because it takes 9 GB to run, I do not hit OOM consistently, which is why I did not realize this earlier. I am not sure if this is actionable, or if there is another issue hidden somewhere.
Is there something which can be done about it?
I'm not sure that this is related, but it seems pretty close.
I was trying to compile https://github.com/space-wizards/space-station-14/tree/master/Content.Server and ILC ate up to 25 GB of memory and then crashed.
MSBUILD : error MSB4166: Child node "4" exited prematurely. Shutting down. Diagnostic information may be found in files in "C:\Users\kant\AppData\Local\Temp\" and will be named MSBuild_*.failure.txt. This location can be changed by setting the MSBUILDDEBUGPATH environment variable to a different directory.
ILC produces a lot of AOT warnings, and the project probably uses everything that is forbidden because of dynamic code generation, but memory usage jumping up to 25 GB still seems unexpected for an end user.
From what I saw previously, that comes from the type name strings. Maybe introducing something like name mangling, or some other approach, could address that?
Perhaps we can close this now with the new ObjWriter landing for .NET 9?
I think so. I would probably like to look at how the new ObjWriter behaves with 16 GB of RAM, because it is quite possible that we have just changed the SEH exception into an OOM.
The memory usage should be better, but obviously any testing is welcome, and there is still room for improvement.
Let's close this and open new issues if necessary.
I occasionally hit exceptions in ObjWriter. By occasionally I mean maybe after updating to nightlies, or after I make some modifications to the application.
or
My problem is that when I hit this error, I have no idea what's going on or how to intercept the SEH exception to at least understand which method is upsetting codegen. What worries me is that these two errors appear on the same inputs. And before that, I had an issue that resolved itself the next day.
Questions: