dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[API Proposal]: support thread dump on `kill -3` #104425

Open rmannibucau opened 3 days ago

rmannibucau commented 3 days ago

Background and motivation

Getting a live thread dump is a common and very interesting way to start investigating perf issues, but using dotnet-dump is too heavy and not "live" enough (you need to capture a dump and then open it).

API Proposal

Wire signal 3 (SIGQUIT) in the dotnet process so that the process catches the signal and prints a thread dump (all actual OS threads and their stacks) to the console.

API Usage

kill -3 $pid

Alternative Designs

There are a lot of other possibilities, but the key is to be:

Risks

Today, kill -3 kills the process, so another signal may be needed.
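To make the requested behavior concrete, here is a minimal sketch of "catch a signal, print every thread's stack, keep running". It is written in Python purely as an analogy and is not a .NET API: it relies on the standard library's `faulthandler.register` (Unix only), and it uses SIGUSR1 rather than SIGQUIT as an assumed "unused" signal, in line with the risk noted above. The names `install_thread_dump_handler`, `worker`, and `demo` are hypothetical.

```python
import faulthandler
import os
import signal
import sys
import threading
import time


def install_thread_dump_handler(signum=signal.SIGUSR1, out=None):
    """On `signum`, write the stack of every thread to `out`; the process keeps running."""
    # `out` must be a real file (it needs a file descriptor); stderr is the default.
    faulthandler.register(signum, file=out or sys.stderr, all_threads=True)


def worker(stop):
    # Stand-in for application work; it appears in the dump as its own thread.
    while not stop.is_set():
        time.sleep(0.05)


def demo(out=None):
    """Start a worker thread, send the signal to ourselves, then shut down cleanly."""
    install_thread_dump_handler(out=out)
    stop = threading.Event()
    thread = threading.Thread(target=worker, args=(stop,), name="worker")
    thread.start()
    time.sleep(0.1)                       # let the worker get going
    os.kill(os.getpid(), signal.SIGUSR1)  # same effect as `kill -USR1 $pid` from a shell
    time.sleep(0.1)                       # the process is still alive after the dump
    stop.set()
    thread.join()
    faulthandler.unregister(signal.SIGUSR1)
```

Calling `demo()` writes one stack per thread to stderr and then returns normally, which is the "live" property the proposal asks for. Whether the .NET runtime could do the same for all managed and native threads is exactly what the rest of this thread discusses.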

dotnet-policy-service[bot] commented 3 days ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

huoyaoyuan commented 3 days ago

As far as I can tell, kill -3 (SIGQUIT) means terminate with a core dump. It isn't meant to be a live option that leaves the process running. Printing stack traces of threads seems to be JVM-specific behavior.

> Getting a live thread dump is a common and very interesting way to start investigating perf issues

I can't see how this applies to performance issues. Performance issues are usually investigated with profiling, which continuously collects stack traces and the methods that are running. I believe the debugger interface has corresponding functionality.

A new dotnet tool printing thread dumps may be introduced, though.
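The distinction drawn above, between a single thread dump and profiling that samples stacks continuously, can be sketched as follows. This is an illustrative Python analogy under stated assumptions, not how any .NET profiler is implemented: it uses CPython's `sys._current_frames()` to sample all threads at a fixed interval, and `sample_stacks` and `busy_loop` are hypothetical names.

```python
import collections
import sys
import threading
import time


def sample_stacks(duration_s=0.3, interval_s=0.01):
    """Sample every other thread's stack periodically; count samples per function name."""
    hits = collections.Counter()
    sampler_id = threading.get_ident()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for thread_id, frame in sys._current_frames().items():
            if thread_id == sampler_id:
                continue  # do not profile the sampler itself
            seen = set()
            while frame is not None:  # walk from the innermost frame outwards
                seen.add(frame.f_code.co_name)
                frame = frame.f_back
            hits.update(seen)         # each function counts once per sample
        time.sleep(interval_s)
    return hits


def busy_loop(stop):
    # Hypothetical hot function that the sampler should catch on the stack.
    counter = 0
    while not stop.is_set():
        counter += 1
```

A single dump corresponds to one iteration of the outer loop; a profile is the aggregate over many iterations, which is why the two answer different questions.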

jkotas commented 3 days ago

https://learn.microsoft.com/dotnet/core/diagnostics/dotnet-stack tool prints managed stacktraces of all threads.

rmannibucau commented 3 days ago

Hope it makes sense

huoyaoyuan commented 3 days ago

Note: I often see thread dumps in test failure logs. It seems that we do print thread dumps in certain situations.

rmannibucau commented 3 days ago

@huoyaoyuan the ones on timeouts? (https://github.com/microsoft/testfx/blob/main/src/Platform/Microsoft.Testing.Extensions.HangDump/HangDumpProcessLifetimeHandler.cs#L320)

@jkotas yes, this is exactly what I'd like to be able to call "out of the box" on an application that is already compiled/bundled. There are always workarounds (like uploading dotnet-stack into a running container, even if that is not trivial), but this feature is very few lines of code IMHO and is worth it (maybe I'm biased from having abused it in Java, I can acknowledge that).

huoyaoyuan commented 3 days ago

> the ones on timeouts?

No, in CI failures of native crashes.

rmannibucau commented 3 days ago

I see, the createdump call on SIGSEGV.

One challenge there is that the only way to get threads and their stacks together is to use the diagnostics package (which is not built in), so it would be neat to enable it in core and attach a dump callback to an unused process signal if possible.