dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

dotnet-dump analyzer doesn't work on Tizen x86 emulator #846

Open viewizard opened 4 years ago

viewizard commented 4 years ago

I am testing the dotnet-dump analyzer on the Tizen emulator (x86, QEMU-based) in order to investigate x86 core dumps. According to the diagnostics and ClrMD docs, x86 is supported (ClrMD: "both x86 and amd64" https://github.com/microsoft/clrmd/blob/master/doc/FAQ.md#does-this-work-with-any-architecture-x86x64; diagnostics: "architectures (x64, x86, arm, arm64)" https://github.com/dotnet/diagnostics#net-core-diagnostics-repo), but when I try to analyze a core dump I see this output:

Loading core dump: /opt/usr/share/crash/dump/dotnet-launcher_7493_20200210194203/dotnet-launcher_7493_20200210194203.coredump ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
Invalid architecture EM_386

At the same time, I don't have any issues with the same core dump when using lldb with the SOS plugin:

bash-3.2# lldb /usr/bin/dotnet-launcher -c /opt/usr/share/crash/dump/dotnet-launcher_7493_20200210194203/dotnet-launcher_7493_20200210194203/dotnet-launcher_7493_20200210194203.coredump
(lldb) target create "/usr/bin/dotnet-launcher" --core "/opt/usr/share/crash/dump/dotnet-launcher_7493_20200210194203/dotnet-launcher_7493_20200210194203.coredump"
Core file '/opt/usr/share/crash/dump/dotnet-launcher_7493_20200210194203/dotnet-launcher_7493_20200210194203.coredump' (i386) was loaded.
(lldb) clrstack
OS Thread Id: 0x1d45 (1)
Child SP       IP Call Site
BFC0A57C b561cd4e [HelperMethodFrame: bfc0a57c]
BFC0A600 B2BE18B1 MITestExceptionBreakpoint.Test.TestAppException() [/home/viewizard/Desktop/netcoredbg/test-suite/MITestExceptionBreakpoint/Program.cs @ 232]
BFC0AEAC b5360592 [HelperMethodFrame: bfc0aeac]
BFC0AF30 B2BE19AB MITestExceptionBreakpoint.Test.ThrowInner() [/home/viewizard/Desktop/netcoredbg/test-suite/MITestExceptionBreakpoint/Program.cs @ 249]
BFC0AF50 B2BE18F0 MITestExceptionBreakpoint.Test.CatchInner() [/home/viewizard/Desktop/netcoredbg/test-suite/MITestExceptionBreakpoint/Program.cs @ 256]
BFC0AF70 B2BE17E4 MITestExceptionBreakpoint.Test.TestAppException() [/home/viewizard/Desktop/netcoredbg/test-suite/MITestExceptionBreakpoint/Program.cs @ 235]
BFC0AFB0 B35D9AEF MITestExceptionBreakpoint.Program.Main(System.String[]) [/home/viewizard/Desktop/netcoredbg/test-suite/MITestExceptionBreakpoint/Program.cs @ 501]
BFC0B114 b53606c6 [GCFrame: bfc0b114]
BFC0B4F4 b53606c6 [GCFrame: bfc0b4f4]

During a quick investigation I found this: https://github.com/microsoft/clrmd/blob/b477434904d85ead51686087ea7472dd15624ce3/src/Microsoft.Diagnostics.Runtime/src/Linux/ElfCoreFile.cs#L32-L35. It looks like x86 support isn't implemented in ClrMD.
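For readers unfamiliar with where a name like EM_386 comes from: it is the value of the e_machine field in the ELF header of the core file. The following is a minimal Python sketch of that check, not ClrMD's actual code. The field offsets and EM_* constants come from the ELF specification; the `SUPPORTED` allow-list and the synthetic header are assumptions made purely for illustration.

```python
import struct

# e_machine constants defined by the ELF specification.
EM_NAMES = {3: "EM_386", 40: "EM_ARM", 62: "EM_X86_64", 183: "EM_AARCH64"}

# Hypothetical allow-list, used only to illustrate a reader that rejects x86.
SUPPORTED = {"EM_X86_64", "EM_ARM", "EM_AARCH64"}


def elf_machine(header: bytes) -> str:
    """Return the e_machine name from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (offset 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return EM_NAMES.get(machine, f"unknown ({machine})")


def check_architecture(header: bytes) -> str:
    """Raise if the core file's architecture is not in the allow-list."""
    name = elf_machine(header)
    if name not in SUPPORTED:
        raise ValueError(f"Invalid architecture {name}")
    return name


# Synthetic 32-bit little-endian header: e_type = ET_CORE (4), e_machine = 3.
fake_header = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 4, 3)
print(elf_machine(fake_header))
```

On a real dump, the first 20 bytes of the file can be passed in place of `fake_header`.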

According to memory consumption tests on arm32, dotnet-dump needs half as much memory as lldb with the SOS plugin when analyzing a core dump, which makes it a good alternative for crash investigation on Tizen devices and the emulator.

@leculver @mikem8361 @NextTurn could you please share what the real x86 support status is? Do you have any plans for full x86 dotnet-dump support?

CC @swift-kim @gbalykov @alpencolt

swift-kim commented 4 years ago

I found a comment from the author. https://github.com/microsoft/clrmd/pull/280#discussion_r305953249

leculver commented 4 years ago

I can't comment on what .Net Core supports in terms of diagnostics. (I leave that to Mike and the rest of the diagnostics team.)

As for ClrMD, we are happy to support all SKUs and permutations of CLR that customers are using (even ones not officially supported)...but with the caveat that I'm only one person, and ClrMD isn't my only responsibility. I don't have the time to build everything.

In short, I would be happy to accept any pull request that implements x86 Linux support in ClrMD. I'd be open to someone filing an issue asking for x86 Linux support (it wouldn't be closed). If I had infinite time I'd build it myself, but I can't commit to working on this in the short-term. (All of this also applies to OS X support for ClrMD.)

Thanks!

hoyosjs commented 4 years ago

As for dotnet-dump, there's never been any testing performed on Linux x86 as it's not a supported architecture in dotnet. We currently don't build and package it, and our build system hasn't been properly extended to generate binaries. Are you running one you built yourself? Or is this an x64 system and you installed the tools using dotnet tool install? If it's the latter, you are using a dotnet-dump that's using the x64 SOS, so this wouldn't work. Also, what version of the runtime are you using?

viewizard commented 4 years ago

@leculver thank you for info!

@hoyosjs I tested it with the 3.0 release and with upstream coreclr and diagnostics/clrmd. I cross-built all of these binaries manually for x86 on an x64 host with the build scripts from the repos, and then copied the artifacts to the x86 target for testing. I also tested the SOS plugin independently via lldb to be sure that SOS is working.

mikem8361 commented 4 years ago

I'm also not sure when we can get to fixing dotnet-dump/SOS to work with x86 linux core dumps but it will depend on clrmd support.

nxtn commented 4 years ago

@viewizard Which exact commit did you cross build CoreCLR from? The latest master build behaved well in saying hello world, but when running ClrMD it produced endless assertion failures and segmentation faults.

How did you collect the core dumps? The createdump utility didn't support x86 platforms. And core dumps generated by x64 Linux didn't contain sufficient information for managed diagnostics - even clrstack would partially fail.

viewizard commented 4 years ago

@NextTurn ATM we use CoreCLR 3.0.0 release for Tizen, commit 922429db0144dd6f3b4324805464dae82857512a. Coredump was generated by x86 Linux.

nxtn commented 4 years ago

I've successfully built CoreCLR and native libraries, but I couldn't build even one managed library. The last try was to use ARM32 libraries instead, and then Debug.Fail itself crashed.

@viewizard If possible could you please send me a copy of .NET Core x86 runtime for testing since you've got a working one? Thanks.

viewizard commented 4 years ago

@NextTurn I am using Tizen x86 packages, which can also be downloaded from here:

Native packages (coreclr-3.0.0-16.1.i686.rpm, ...): http://download.tizen.org/releases/milestone/tizen/unified/tizen-unified_20191031.1/repos/standard/packages/i686/

Managed packages (corefx-managed-3.0.0-10.2.noarch.rpm, ...) http://download.tizen.org/releases/milestone/tizen/unified/tizen-unified_20191031.1/repos/standard/packages/noarch/

Some system packages: http://download.tizen.org/releases/milestone/tizen/base/tizen-base_20191011.2/repos/standard/packages/i686/

nxtn commented 4 years ago

Oh, I never knew that Tizen has packages of its own. Thanks a lot :)

nxtn commented 4 years ago

I couldn't start Tizen Emulator on any of my machines. And those packages apparently don't work on non-Tizen systems due to dotnet/runtime#33269.

Nevertheless I can offer some small patches if you have time for testing. You need to specify linux-x86 RID to select the correct calling convention when building. If you find them working then we can consider actually merging these changes.

https://github.com/microsoft/clrmd/compare/release/1.1...NextTurn:v1-x86 https://github.com/microsoft/clrmd/compare/release/1.1...NextTurn:v1-callconv

viewizard commented 4 years ago

@NextTurn I applied patches you provided, and got this log:

Loading core dump: /opt/usr/share/crash/dump/dotnet-launcher_7493_20200210194203/dotnet-launcher_7493_20200210194203.coredump ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
> clrstack
dotnet-dump Information: 0 :
SOS initialized: tempDirectory '' dacFilePath '' sosPath '/proc/self/fd/32/lib/linux-x86/libsos.so'
SOS does not support the current target architecture 0xaaef43e4

After that, stack smashing was detected and the process was aborted. Did I miss something?

nxtn commented 4 years ago

So there is another problem in SOS rather than ClrMD...

Besides, it just occurred to me today that the sign extension issues fixed in ClrMD v2 aren't backported to v1 so v1 definitely won't work even with these patches. @swift-kim may know more about this. Sorry for the inconvenience.
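The thread does not spell out what the sign extension issues were. As a hedged illustration of the general problem: on a 32-bit target, an address with the high bit set can arrive as a sign-extended 64-bit value, and it no longer compares equal to the zero-extended form of the same address. The sketch below uses an instruction pointer from the clrstack output earlier in this issue; the helper names are hypothetical and not part of ClrMD.

```python
MASK_32 = 0xFFFFFFFF
MASK_64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32(value: int) -> int:
    """Widen a 32-bit value to 64 bits by copying bit 31 into the upper half."""
    value &= MASK_32
    if value & 0x80000000:
        value |= MASK_64 ^ MASK_32
    return value


def zero_extend_32(value: int) -> int:
    """Widen a 32-bit value to 64 bits, leaving the upper half zero."""
    return value & MASK_32


def normalize_address(value: int, pointer_size: int = 4) -> int:
    """Hypothetical helper: drop any upper bits beyond the target pointer size."""
    return value & ((1 << (8 * pointer_size)) - 1)


ip = 0xB2BE18B1  # an IP from the clrstack output above; bit 31 is set
print(f"{sign_extend_32(ip):016X}")
print(f"{zero_extend_32(ip):016X}")
print(sign_extend_32(ip) == zero_extend_32(ip))
print(normalize_address(sign_extend_32(ip)) == zero_extend_32(ip))
```

Masking both sides to the target pointer size before comparing or using an address as a dictionary key is one way to make the two forms agree.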

FYI the corresponding patches for v2 are ready.

https://github.com/microsoft/clrmd/compare/master...NextTurn:x86 https://github.com/microsoft/clrmd/compare/master...NextTurn:callconv

viewizard commented 4 years ago

As I understand it, the current ClrMD master needs the CoreCLR 3.1 runtime with proper deps. For now I see only this output if I try to execute dotnet-dump:

 Unhandled exception:
 System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
  ---> System.IO.FileLoadException: Could not load file or assembly 'System.Collections.Immutable, Version=1.2.5.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The located assembly's manifest definition does not match the assembly reference. (0x80131040)
 File name: 'System.Collections.Immutable, Version=1.2.5.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
    at Microsoft.Diagnostics.Tools.Dump.Analyzer.Analyze(FileInfo dump_path, String[] command)
    at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
    at Microsoft.Diagnostics.Tools.Dump.Analyzer.Analyze(FileInfo dump_path, String[] command)
...

This is because we have CoreCLR 3.0 with an old System.Collections.Immutable version on Tizen.

nxtn commented 4 years ago

the current ClrMD master needs the CoreCLR 3.1 runtime with proper deps

It targets .NET Core 2.1 and .NET Core 3.1 since they are LTS versions. It's OK to change the TFM to .NET Core 3.0.

Downgrading the System.Collections.Immutable package version to 1.6.0 will solve the deps problem. The other dependencies are present in CoreFX.

nxtn commented 4 years ago

After a clean Windows reinstall, I finally set up the Tizen Emulator and had ClrMD v2 run on it.

Because a lot of functionality is broken in either the CLR or the DAC, it's not possible to make ClrMD generally available on x86. But it may be possible to work around a number of x86-specific issues to implement a subset of features.

Take your use case of clrstack as an example. ClrMD currently only has one stack trace implementation using the DAC API ISOSDacInterface::GetThreadStoreData which is broken on x86 (which is also used by clrstack -all so that command won't work either). This may be worked around if ClrMD adds a feature like parameterless clrstack using the same functions that clrstack uses.

Other common code paths also need special handling to avoid segmentation faults or exceptions:

-using TypeBuilder typeData = _typeBuilders.Rent();
+using TypeBuilder typeData = new TypeBuilder();
-if (typeData.ParentMethodTable != 0 && !_types.TryGetValue(typeData.ParentMethodTable, out baseType))
+if (typeData.ParentMethodTable != 0 && typeData.ParentMethodTable != 0xffffffff && !_types.TryGetValue(typeData.ParentMethodTable, out baseType))

You're lucky to have LLDB + SOS to use on Tizen. LLDB doesn't work at all on Ubuntu x86. It was a known issue years ago but wasn't fixed because of magic reasons.

leculver commented 4 years ago

the DAC API ISOSDacInterface::GetThreadStoreData which is broken on x86

ClrMD requires a working dac (at least to a reasonable degree) in order to function. GetThreadStoreData is the single dac function that we require to work for every dump and process. This is how we test that super tiny crash dumps, like triage dumps, work correctly. Getting basic dac functionality to work would be the #1 priority here, before I worked on getting ClrMD functional on that platform.

It is lucky to have lldb + sos working properly. =)

mikem8361 commented 4 years ago

Are you saying that lldb/SOS works properly (i.e. clrthreads displays the managed threads, etc.) on Tizen x86 but clrmd doesn't? SOS obviously needs the basic DAC functions to work too, so I don't understand how clrmd doesn't work.

nxtn commented 4 years ago

sh-3.2# rpm -iv lldb-5.0.2-18.9.i686.rpm
Preparing packages...
error: Can't write smack rules
error: Setting up smack rules for lldb failed
error: lldb-5.0.2-18.9.i686: install failed
error: Unable to write device security policy to /etc/device-sec-policy

(I'd better know how to bypass the security policy in Tizen.)

Are you saying that lldb/SOS works properly (i.e. clrthreads displays the managed threads, etc.)

I'm pretty sure clrthreads won't work either, because DacpThreadStoreData.firstThread is zero.

swift-kim commented 4 years ago

(I'd better know how to bypass the security policy in Tizen.)

@NextTurn The command mount -o remount,rw / will help in that case.

swift-kim commented 4 years ago

I was able to run dotnet-dump on Tizen x86 Emulator successfully by applying @NextTurn's patch in https://github.com/dotnet/diagnostics/issues/846#issuecomment-596173451 to ClrMD 1.1 and a similar workaround to SOS.Hosting. This is my working branch for ClrMD.

bash-3.2# /usr/share/dotnet.tizen/netcoreapp/corerun dotnet-dump.dll analyze coredump.4209
Loading core dump: /tmp/coredump.4001 ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
> clrstack -all
OS Thread Id: 0xfa1
Child SP       IP Call Site
BFCFF9D8 b7741a49 [InlinedCallFrame: bfcff9d8]
BFCFF9D8 a9452f6f [InlinedCallFrame: bfcff9d8] Interop+Application.Main(Int32, System.String[], UIAppLifecycleCallbacks ByRef, IntPtr)
BFCFF9C0 A9452F6F ILStubClass.IL_STUB_PInvoke(Int32, System.String[], UIAppLifecycleCallbacks ByRef, IntPtr)
BFCFFA70 A9452C17 Tizen.Applications.CoreBackend.UICoreBackend.Run(System.String[])
BFCFFB20 A9452622 Tizen.Applications.CoreApplication.Run(System.String[])
BFCFFBC0 A94518CA Tizen.Applications.CoreUIApplication.Run(System.String[])
BFCFFC00 A96E4682 /proc/self/fd/49/bin/Alarm.dll!Unknown
BFCFFD94 b184d6c6 [GCFrame: bfcffd94]
BFD00174 b184d6c6 [GCFrame: bfd00174]
OS Thread Id: 0xfa8
Child SP       IP Call Site
AECACF34 b7741a49 [DebuggerU2MCatchHandlerFrame: aecacf34]
OS Thread Id: 0xfaa
Child SP       IP Call Site
>

The new ClrMD binary didn't work with a simple test program (PrintStackTrace) however.

bash-3.2# /usr/share/dotnet.tizen/netcoreapp/corerun PrintStackTrace.dll coredump.23101
Segmentation fault (core dumped)

As @NextTurn has previously mentioned in https://github.com/dotnet/diagnostics/issues/846#issuecomment-598063467, the value returned by GetThreadStoreData().FirstThread was broken as below and later the program crashed at SOSDac.GetThreadData() -> ClrDataAccess::GetThreadData(). (Why doesn't dotnet-dump fail then?)

@NextTurn Do you plan to publish your changes (callconv) anyway?

nxtn commented 4 years ago

@swift-kim Good to hear that! 🎉

@leculver Would you like to merge those changes to v1 or v2? Although they can't make ClrMD completely work on x86 they could be helpful for those who are willing to play around with it.

leculver commented 4 years ago

Definitely prefer v2 patch over v1 (or both if there's time).

mikem8361 commented 4 years ago

For dotnet-dump it would have to be clrmd v1. It will be a while before dotnet-dump can use 2.0.

swift-kim commented 4 years ago

@NextTurn Can we use just CallingConvention.Winapi which relies on the default platform convention?

nxtn commented 4 years ago

@swift-kim I didn't know that CallingConvention.Winapi has been updated to use CDECL on Linux x86. (The doc was just updated 20 days ago dotnet/dotnet-api-docs#3967.) It's definitely better.

swift-kim commented 4 years ago

One of the issues that ClrMD/coreclr has on Linux x86 is that every unmanaged to managed marshalling of ulong appears to be broken. For example, this code in SosDac.cs (ClrMD/release/1.1)

public bool GetMethodDescData(ulong md, ulong ip, out MethodDescData data)
{
    InitDelegate(ref _getMethodDescData, VTable->GetMethodDescData);
    Console.WriteLine($"{md:X}");
    int hr = _getMethodDescData(Self, md, ip, out data, 0, null, out int needed);
    Console.WriteLine($"{data.MethodDesc:X}");
    return SUCCEEDED(hr);
}

will print

FFFFFFFFAAD7C574
AAD7C574FFFFFFFF

although data.MethodDesc is just set to md in ClrDataAccess::GetMethodDescData().
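The two printed values are related in a simple way: the second is the first with its upper and lower 32-bit halves exchanged. The thread does not identify the root cause, so the following Python sketch only demonstrates that relationship, using the values printed above; `swap_halves_64` is a hypothetical helper name.

```python
import struct

MASK_64 = 0xFFFFFFFFFFFFFFFF


def swap_halves_64(value: int) -> int:
    """Exchange the upper and lower 32-bit halves of a 64-bit value."""
    value &= MASK_64
    return ((value << 32) | (value >> 32)) & MASK_64


md_in = 0xFFFFFFFFAAD7C574   # value passed in as md
md_out = 0xAAD7C574FFFFFFFF  # value read back from data.MethodDesc

print(f"{swap_halves_64(md_in):016X}")
print(swap_halves_64(md_in) == md_out)

# The same effect seen at the byte level: split the little-endian 64-bit
# value into two 32-bit words, then reassemble them in the opposite order.
low, high = struct.unpack("<II", struct.pack("<Q", md_in))
(reassembled,) = struct.unpack("<Q", struct.pack("<II", high, low))
print(f"{reassembled:016X}")
```

Note that `md_in` is also the sign-extended form of the 32-bit address 0xAAD7C574, so the swapped value still carries the original pointer, just in the wrong half.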

Note: The Tizen Emulator is running .NET Core 3.1.

nxtn commented 4 years ago

@swift-kim FYI specifying CallingConvention.Winapi is equivalent to omitting the UnmanagedFunctionPointerAttribute. See my PRs last year: dotnet/dotnet-api-docs#3706, dotnet/dotnet-api-docs#3707.