Profiling: How to get GC Metrics in-process

discostu105 commented 8 years ago

Let's say, you have a profiler which is loaded inside of the process via Profiling API. Let's say you want to capture the impact the GC has on your process (just the CLR in the process your profiler is loaded in). Stats that would be interesting are: Gen0,1,2,LOH collection count and duration. Which native API would I use to properly get such statistics periodically with low overhead?

Some thoughts:

In full framework, this would be very possible via CLR Memory Performance counter metrics ("% Time in GC", "gen X collections", ect). However, perfcounters have been removed for CoreCLR.
Yes, there are callbacks from the Profiling API (ICorProfilerCallback::RuntimeSuspendStarted) that indicate "suspensions". However, these callbacks don't give much details on the reason of suspension (e.g. no info about which generation is being collected). Also, I am unsure about background/concurrent GC. I guess those won't trigger runtime suspensions (in all cases).
Then there is ETW. At first, it does not give you nice statistics, but events instead. So, you can capture events such as GCStart, GCEnd and you have to calculate the timings you want yourself. So far so good, but it seems that efficiently capturing ETW events from within the process (via native API) is not an easy thing. One way seem to be "private sessions", but apparently, they only work via log-files that need to be parsed [0]. That seems much too complicated and too much overhead (file-IO) for just getting some GC-stats.
In ETW, there are also so called "real time sessions", which seem to be more what I want (no file IO, just in-memory buffers). However, there's a global limit of how many such sessions can be active, and also it requires admin-privileges to create them. Not optimal either.
Browsing through the sources, I've found COR_GC_STATS to be an interesting datastructure. It contains pretty much what I am looking. It's available via CorGCHost::GetStats. Is there any way for a profiler to get to a CorGCHost instance (if the CLR is not self-hosted)?

[0] https://social.msdn.microsoft.com/Forums/en-US/2664a9cf-172a-45c3-81cc-5a4b93b3be7d/etw-private-session-without-logfile?forum=clr

discostu105 commented 8 years ago

Btw: I'm not sure if it's better to ask such questions on StackOverflow or on Github. Please let me if I should use SO instead.

jkotas commented 8 years ago

cc @Maoni0

Maoni0 commented 8 years ago

If you look deeper in some of the categories you've mentioned, there's more info. perf counters is an exception as we made a decision to not invest more in perf counters when we did background GC in 4.0. So you get whatever support it offered before (which means not much support for background GC) when it's enabled. COR_GC_STATS works off of the perf counter info so is also not available if perf counter is not enabled.

For both profiling and ETW, you can certainly get more info -

For profiling, ICorProfilerCallback2::GarbageCollectionStarted/Finished tells you exactly when a GC starts and finishes and with a bunch of other info - the generation it collects, generation bounds and etc.

For ETW, it's a low level mechanism so by design it does not give you statistics (unless of course you are firing stats info in your events). That's why we have tools that build on top of this mechanism to give you stats. Take a look at TraceEvent and PerfView which let you capture/interpret the events (in managed code) and look at stats.

Let me know if these still do not solve your problems.

discostu105 commented 8 years ago

Although ICorProfilerCallback2::GarbageCollectionStarted/Finished would provide the information I'm looking for, I know for a fact that it causes too much overhead. It needs to have a certain profiler capability enabled (_COR_PRF_MONITORGC). Just by enabling it, Concurrent GC is disabled for example. Since I'm looking for a low-overhead solution, this is not a (good) option for me.

Too bad, _COR_GCSTATS does not fit, it looked so good :). Thanks for your info by the way. Also, I did not know that perfcounters don't provide good values when background GC is active, so thanks for that as well.

TraceEvent looks like a nice managed library for certain ETW event sources (also CLR event's I am looking for). Maybe we can extract what we need from there. Does it use private logger sessions? Does it apply to all other limitations of ETW (mentioned in my initial post)?

We did some fair research on ETW, and there is an in-memory mode (real-time mode), where events can be processed from memory and do not need to go to a file (which is great). But unfortunately such real-time mode session can only be made system-wide and there is a limitation of 64 sessions per system (what if I want to monitor more than 64 .net processes?). Unfortunately, that mode is not combineable with private logger sessions which on the other hand does not require special permissions, and where you can make isolated sessions for each process, and there is no 64 session limit. I'm starting to get the feeling that ETW was not exactly made for my requirements :).

vancem commented 8 years ago

@brianrob

We have some tentative plans to wire all current ETW events back into managed code so that they look like and EventSource. The idea is that events like GCStats would be available to you in-proc.

However this does not help you today. As you have discovered, ETW does not really work well for 'self-monitoring' scenarios.

Another possibility is that it is also reasonable for use to have a profiler API that gets called any time an ETW event happens. This does not exist, but is reasonably straightforward addition. With this you could use the profiler API to get at the information you want in-proc.

Both of these options requires new code to be written in coreclr, but that is what pull requests are about. If you want to pursue either of these there should be some discussion about the details of the design first.

Maoni0 commented 8 years ago

I have been advocating that we should have a low overhead GC profiling mode (but haven't succeeded :-) apparently this hasn't been an urgent thing to have for most of our customers who use the profiling API) - for some people it's enough to know some stats that are very cheap for the runtime to provide (eg, they don't need the root reference, or moved references and etc). And profiling has the very nice attribute of being real time. It shouldn't be much work to allow an additional mode that gives you a subset of things GC profiling currently offers (you don't have to disable concurrent GC - it was just work that hasn't been done to enable it). If you want, you are certainly welcome to open an issue for it to discuss if someone (could be yourself :)) wants work on it.

discostu105 commented 8 years ago

All in all, it seems like there are not perfect options which will help me short-term. I guess we'll need to go with ETW for now (with it's limitations). Looking forward, the ideas you mentioned are definitively interesting to explore for future versions.

@vancem

We have some tentative plans to wire all current ETW events back into managed code so that they look like and EventSource. The idea is that events like GCStats would be available to you in-proc.

That certainly sounds interesting. My profiler has native+managed parts, so I could also consume stuff managed. Though, I have a slight preference for consuming such an API natively, because pure native threads are not subject to GC suspensions themself, so they would be (a little bit) more deterministic.

Another possibility is that it is also reasonable for use to have a profiler API that gets called any time an ETW event happens. This does not exist, but is reasonably straightforward addition. With this you could use the profiler API to get at the information you want in-proc.

Sounds cool. In order to control overhead, some up-front filtering mechanism would be needed I guess. Given such a fine-grained filtering mechanism, this approach would be very flexible (and thus useful for my specific use-case). Also (not knowing the internals of ETW) my naive mind asks: where do the buffers live and how large will they be? Would this still go through Windows Kernel?

@Maoni0

I have been advocating that we should have a low overhead GC profiling mode (but haven't succeeded :-) apparently this hasn't been an urgent thing to have for most of our customers who use the profiling API) - for some people it's enough to know some stats that are very cheap for the runtime to provide (eg, they don't need the root reference, or moved references and etc). And profiling has the very nice attribute of being real time. It shouldn't be much work to allow an additional mode that gives you a subset of things GC profiling currently offers (you don't have to disable concurrent GC - it was just work that hasn't been done to enable it). If you want, you are certainly welcome to open an issue for it to discuss if someone (could be yourself :)) wants work on it.

We've actually used those detailed infos that the Profiling API provides (root references, moved references, ...), but for a very different use-case. In that use-case we would only enable COR_PRF_MONITOR_GC for a short period of time, then disable it again (which actually works since some .NET version - thanks for that :)).

But the use-case I am aiming for right now, is to know only the very basic metrics (gen x collection count, duration and as well size of all heaps). I just want to report these metrics periodically (every X seconds), that's it. So, some API which just gives me a struct with those values would be sufficient.

One question by the way: If any of those features discussed here would be implemented for CoreCLR, would it flow back to Full CLR and be released with some future version? Is it basically the same code with different ifdef definitions or is there a very separate repository for it?

Maoni0 commented 8 years ago

Whether a feature flows back to full clr is determined case by case - it's all based on how much value it can bring to full clr customers and how much work it takes to flow back.

mleenhardt commented 6 years ago

I wanted to bump this and ask whether anything had changed on this topic? Specifically, is there any way to get GC stats from within the process? My environment is .NET Core running on Linux.

I'm aware of GC.CollectionCount which does provide some info. I was however interested in collecting metrics on the amount of time and percentage of total CPU time spent in the GC. These seem to be really important data points that you'd want to be able to monitor over time when working on performance sensitive applications and I was surprised to not have this available.

My current plan is to have a side care process running a profiling session and reporting various metrics (using perf and LTTng) about the monitored process. Is there an easier way that I missed to go about doing this? Being used to simply querying Performance counters on Windows, I really wish there was.

brianrob commented 6 years ago

@lmeenhardt, using perf and/or LTTng is a reasonable way to go about this, especially if you want very detailed information. This has been done before by others as well. There are APIs to get the collection count as well as the amount of memory that has been allocated on the current thread, but if you want data per-GC then LTTng is likely your best bet. Note that if you don't care about collecting CPU and/or blocked time stacks then you just need LTTng. It can give you all of the runtime events as well as kernel events without stacks.

Also, as @vancem noted above, we are moving towards making all of the runtime events, including GC events, available in-process through EventListeners. There has been much work done on the internals of the runtime and logging infrastructure to make this possible, but we are not there yet. You can read some about the plans at https://github.com/dotnet/designs/blob/master/accepted/cross-platform-performance-monitoring.md.

cesna commented 6 years ago

@mleenhardt did your plan worked? (side care process with LTTng) Or maybe you found an easier solution? I am facing same challenge now. Looking for ways to get GC "perf counters" on Linux.

mleenhardt commented 6 years ago

Haven't gotten around working on this yet but unless there is now something better, that's still the plan.

brianrob commented 6 years ago

The work I described above around exposing runtime data through EventListener is in-progress and so hopefully by the time you're ready it will be as well. I don't have an ETA yet though.

ocoster-ff commented 6 years ago

I have a need for a similar feature as @mleenhardt - I will need to collect different GC stats in-proc in a cross platform manner (so a side-car LTTng process won't quite cut it...).

Maoni0 commented 6 years ago

can you tell me how you would use the CPU time spent in GC if it was available? obviously, some of this CPU time was consumed while the user threads were running while the rest wasn't. if you can tell me the usage scenario it would be helpful to understand if the CPU time spent in GC is the right thing to provide.

kalikin commented 4 years ago

In our native (ICorProfilerCallback) profiler we want to collect stats, like GC count and generation sizes, in real time. It seems, the easiest way to do that, is to enable COR_PRF_MONITOR_GC or COR_PRF_HIGH_BASIC_GC flags and call GetGenerationBounds in GarbageCollectionFinished callback.

I am concerned that permanently enabled COR_PRF_MONITOR_GC flag may introduce significant overhead on runtimes where the newer flag is not supported (https://github.com/dotnet/coreclr/pull/22866). After reading comments here I assumed, that ETW can be a good alternative. But I heard that some ETW events also have overhead.

My question is does enabling ETW session and listening events hurt CLR performance in any way, or it's overhead on the runtime is much lower than of COR_PRF_MONITOR_GC. I'm only interested in GCStart/Stop, GCHeapStats; real time mode.

brianrob commented 4 years ago

If you just capture GC events, then I would expect that the overhead of the ETW events is going to be much lower than using the profiler API, which is much more invasive in how it impacts runtime execution. The cost of ETW events on the process is directly proportional to how many events are emitted and the cost of producing the payload to be emitted. These GC events are quite low in verbosity, and the arguments don't cost much to produce. There are others, such as the GC heap survival and movement events that are much more verbose, and also extend GC pauses. This would be an example of an event that has a much higher impact. But for what you're trying to do, expect the overhead to be minimal.

kalikin commented 4 years ago

Is it possible, using native profiler API or ETW, get the measurement, which corresponds to the # Total committed Bytes performance counter?

Displays the amount of virtual memory, in bytes, currently committed by the garbage collector.
Committed memory is the physical memory for which space has been reserved in the disk paging file.

As I understand, this value represents heap memory allocated by the GC for all generations. That's what we would like to show in our memory chart together with used heap memory. It seems that neither GCHeapStats event in ETW nor GetGenerationBounds provides this information. Thanks in advance.

Maoni0 commented 4 years ago

there is something in the ETW event that tells you this info, it's called ExtraGen0Commit in the per heap history event. this is how much is committed on top of the heap size info you already get (ie, on the GCHeapStats). please note that in .NET 5 we may decommit memory while the user threads are running. so this was the data at the end of a GC, you can think of it as the peak committed usage before that GC happened.

if you are using .NET 5 there's another way to get this data which is the TotalCommittedBytes property of the GCMemoryInfo struct. You can get this by calling the GC.GetGCMemoryInfo API but only the 5.0 version provides this info.

mangod9 commented 4 years ago

@discostu105 @kalikin does the new property on GCMemoryInfo help with your requirements?

kalikin commented 4 years ago

@mangod9 I haven't researched this feature yet, because it is a managed API and, aside from a few instrumentations, our profiler is unmanaged. And I'm not sure how and when to start a managed thread from a profiler - is there an official documentation on that topic?

jonso4 commented 3 years ago

It looks like there is a now a way to get GC events in-process through the new runtime event source. However, how can I get the duration of a GC? I can't seem to find a way to correlate the GCEnd events and the GCStart events. I would like to get "percent time in GC" like the counter that is available in .NET Framework.

WeihanLi commented 3 years ago

Maybe the runtime GC events would help, see the docs for details https://docs.microsoft.com/en-us/dotnet/fundamentals/diagnostics/runtime-garbage-collection-events, and you can take this project as an example.

Maoni0 commented 3 years ago

I would like to get "percent time in GC" like the counter that is available in .NET Framework.

@jonso4 there are various ways you can get it -

TraceEvent calculates the GC pause duration for each GC. and you'll need to then calculate the % yourself.
GetGCMemoryInfo (the 5.0 version) has a PauseTimePercentage property. this is the closest to the "% time in GC" counter but more accurate for BGC.

if you can't use either you'll need to calculate it yourself - the event sequence is described here and you can look at TraceEvent src to see how it calculate it. for blocking GCs this is trivial; for BGCs not so much.

dotnet / runtime

Profiling: How to get GC Metrics in-process #5638