dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.64k stars 4.57k forks source link

Improve event pipe thread time analysis by detecting blocking methods #68446

Open davidfowl opened 2 years ago

davidfowl commented 2 years ago

We should mark known blocking methods (call an that mutates the thread state) so that event pipe can distinguish between time on the stack because of blocking vs not. While this won't be perfect when calling foreign code (pinvokes etc) we should be able to have tool distinguish the very common know cases.

cc @noahfalk @brianrob

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

Issue Details
We should known blocking methods so that event pipe can distinguish between time on the stack because of blocking vs not. While this won't be perfect when calling foreign code (pinvokes etc) we should be able to have tool distinguish the very common know cases. cc @noahfalk @brianrob
Author: davidfowl
Assignees: -
Labels: `area-Diagnostics-coreclr`, `untriaged`
Milestone: -
noahfalk commented 2 years ago

Presumably this would only be useful when coupled with tooling that can recognize some hypothetical new blocking marker so I'd expect this to need work in TraceEvent/Perfview/VS before it has any end-user value. This will need prioritization as part of an E2E profiling scenario.

davidfowl commented 2 years ago

We should do the work in the runtime because of the schedule, we can update tools out of band

brianrob commented 2 years ago

This could be a very nice way to curate the "time spent" in a profile. Imagine having categories such as "blocked time", "locks", "external", etc. I like this, but I agree that we need to plan what the E2E looks like before we do the work in the runtime. Agreed that the work in out-of-band tools can wait - just want to make sure that we accurately plan the scenario so that what we do in the runtime ends up working. FWIW, we might be able to do some of this with existing events, such as Contention/Start and Contention/Stop. Same with GC time.

davidfowl commented 2 years ago

I'm guessing we can also accomplish a similar thing purely on the tools side by special casing blocking APIs and categorizing blocked time as such.

brianrob commented 2 years ago

There is likely a lot that can happen in the tools. It won't be perfect (e.g. inlining could impact our ability to see particular frames that represent blocked time), but we should come up with a list and then see what we'd have to do to support each one.

tommcdon commented 10 months ago

@lateralusX

lateralusX commented 10 months ago

I have been thinking to improve our sample profiler since I originally ported EventPipe to Mono, that could give us more options to do additional thread analyses in the future. The initial idea is to make the internals of EventPipe sample profiler runtime specifics, so each runtime implementation and platform can use different techniques on how to stop/sample/restart threads. In current sample profiler we use STW "Stop The World" to run this sequence on all platforms, but that adds a bunch of side effects and potential overhead (for example, CoreCLR will trigger GC events as a side effect). On Mono, we had an alternative approach in the past where we used pthread signals on platforms supporting it, meaning that the sample thread would raise async signals to threads that dumped samples, without bringing complete runtime to a halt on every sample interval. Since a thread in signal state can only call async signal safe API's (EventPipe is not async signal safe), we would need to store the data in a lock free queue that the sample thread would write to EventPipe on every sample interval. To make that accurate we would need a new V2 of our sample profiler event, since our current event uses the timestamp from EventPipe as the timestamp for the sample, but since timestamps can happen async we would need to let each thread sample its own time and that time will then be included as a value on the new V2 of sample profile event. Since we would need to extend the sample profiler event with additional fields, we could then take the opportunity to include optional additional data on the sample profiler event, like a threads current cpu time (user/kernel thread time) and if we include that data, tools could start to calculate how much CPU time a specific thread used over time and since we carry callstacks with the sample profile events it is possible to find blocking methods.

Each runtime could then implement the internal sampling mechanism differently based on runtime and platform support, but still making sure we get additional data on the new version of the sample profile event regardless if runtime uses STW, signals or some other method to capture that data on every sample interval. If the platform is not capable to retrieve things like user, kernel thread time, then these fields will be 0, so tooling need to view them as optional.