dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

Diagnostic support for NonGC heap #4156

Open jkotas opened 2 years ago

jkotas commented 2 years ago

[Jan had different text here before, but I rewrote it as the situation changed]

Background

Although the Frozen Object Heap has existed in the runtime for a long time, it has never been used for any prominent scenario, and it has received no investment to integrate well with diagnostic tools. Now that we want to use it automatically for string literals, we can't continue to ignore it for diagnostics. Context: https://github.com/dotnet/runtime/pull/49576#issuecomment-1250092961 Also, VM/JIT related work items are tracked in https://github.com/dotnet/runtime/issues/76151

The design

Several possible conceptual designs were discussed, and we landed on exposing the Frozen Object Heap as a new memory region that holds managed objects but is entirely distinct from the GC. It has also been proposed that we stop calling it the Frozen Object Heap, because it does support dynamic allocations at runtime. The current suggestion is to call it the Non-GC heap. This would give us a conceptual diagram like this:

                      .NET Managed Heap
                             |
                  -------------------------
                  |                       |
              GC heap                  Non-GC heap
                  |
         -------------------
         |        |        |        
        SOH      POH      LOH
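
The naming rule that accompanies this diagram (terms that mention 'GC' exclude the Non-GC heap; generic terms include it) can be written down as a tiny C++ sketch. The `Region`, `TopLevel`, `TopLevelOf`, and `TermCovers` names are hypothetical and exist only for illustration; they are not runtime, profiler, or SOS identifiers.

```cpp
// Hypothetical model of the diagram; these names are illustrative only and
// are not runtime, profiler, or SOS identifiers.
enum class Region { SOH, POH, LOH, NonGC };
enum class TopLevel { GCHeap, NonGCHeap };

// Every region sits under exactly one of the two top-level nodes.
constexpr TopLevel TopLevelOf(Region r) {
    return r == Region::NonGC ? TopLevel::NonGCHeap : TopLevel::GCHeap;
}

// Naming rule: a term that mentions "GC" ("GC heap") covers only regions
// under the GC heap node; generic terms ("managed heap", ".NET heap")
// cover every region, including the Non-GC heap.
constexpr bool TermCovers(bool term_mentions_gc, Region r) {
    return !term_mentions_gc || TopLevelOf(r) == TopLevel::GCHeap;
}
```
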

A consequence of this decision is that the terms 'GC heap', 'Managed heap', '.NET heap', and '.NET Managed heap', which previously all meant the same thing, no longer do. Names that reference 'GC' are intended to exclude the Non-GC heap, and all other names, which do not mention 'GC', are generic and include both. Some concrete implications:

There is a long history of treating the term "GC" as the heap containing all possible .NET objects, so users, docs, and some tools will likely continue to use the term that way even though it is no longer precise. We felt the potential confusion this causes is acceptable. In the vast majority of scenarios the Non-GC heap will be comparatively small, so the size of the GC heap remains a good approximation of the total managed heap size. We will encourage vendors of managed memory analysis tools to update their reporting so that developers who do care about the details get an accurate representation.
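
As a rough illustration of why the GC heap size remains a good approximation, the sketch below computes both totals and the fraction that a GC-heap-only figure misses. `HeapSizes` and the helper functions are hypothetical, not part of any runtime or tool API.

```cpp
#include <cstdint>

// Hypothetical per-region byte counts that a tool might collect; not a runtime type.
struct HeapSizes {
    std::uint64_t soh = 0;
    std::uint64_t loh = 0;
    std::uint64_t poh = 0;
    std::uint64_t non_gc = 0;
};

// "GC heap" size: SOH + LOH + POH, excluding the Non-GC heap.
constexpr std::uint64_t GcHeapBytes(const HeapSizes& s) {
    return s.soh + s.loh + s.poh;
}

// "Managed heap" size: GC heap plus Non-GC heap.
constexpr std::uint64_t ManagedHeapBytes(const HeapSizes& s) {
    return GcHeapBytes(s) + s.non_gc;
}

// Fraction of the managed heap that is missed when only the GC heap size is reported.
constexpr double GcOnlyUnderreport(const HeapSizes& s) {
    const std::uint64_t total = ManagedHeapBytes(s);
    return total == 0 ? 0.0
                      : static_cast<double>(s.non_gc) / static_cast<double>(total);
}
```

For example, with made-up sizes of 960 bytes across SOH, LOH, and POH (`HeapSizes{900, 50, 10, 40}`) and 40 bytes on the Non-GC heap, `GcOnlyUnderreport` returns 0.04, so the GC heap figure understates the managed heap by 4%.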

Work needed to support the Non-GC heap:

Must have - changes to runtime APIs and tools

Must have - Notify others in this area so that they can update tools and APIs if needed. Some of these may be no-ops.

Must have - Notify customers of the breaking change in GC.GetGeneration() and of the conceptual changes

Nice to have - additional diagnostic features

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
- Call the allocate-object callback when a frozen string literal is allocated
- Report frozen segments from the `GetGenerationBounds` profiler interface

Context: https://github.com/dotnet/runtime/pull/49576#issuecomment-1250092961
Author: jkotas
Assignees: -
Labels: `area-CodeGen-coreclr`
Milestone: -
noahfalk commented 1 year ago

@jkotas @Maoni0 - I understand that you both talked offline and agreed on a new direction: the Frozen Object Heap will not be conceptually included anywhere we have named things 'GC'. I updated the text above to capture that and updated the impact on diagnostics accordingly. If you have any concerns, I want to get them resolved.

jkotas commented 1 year ago

> GCSampledObjectAllocation masks referencing allocation (COR_PRF_MONITOR_OBJECT_ALLOCATED, COR_PRF_ENABLE_OBJECT_ALLOCATED) should apply to all heaps.

How is this expected to work for file-backed non-GC heap segments? I think it would be more future-proof not to produce fine-grained allocation events for non-GC heap segments, and to treat these allocations like other non-GC managed memory allocations, for which we do not provide fine-grained events either.

Maoni0 commented 1 year ago

This looks great, thanks for writing this up, @noahfalk!

!eeheap -gc does currently include non-GC heap segments, only because they are threaded onto gen2 (for GC bookkeeping purposes with segments; for regions I'm thinking of making that a no-op, so they won't show up in SOS by accident and will instead be tracked on the VM side, which has this knowledge anyway).

I haven't used WPA for any managed analysis so I'm unclear what will need to be updated there.

I'm fine with @jkotas's suggestion above of not producing fine-grained allocation events for non-GC heap segments.

noahfalk commented 1 year ago

> How is this expected to work for file-backed non-GC heap segments?

I was assuming that when the time came to support that scenario (which didn't appear to be now), we would add an alternate callback function that was more appropriate, and tools that cared about having complete information would need to register for that new callback as well. For example, that callback might be 'ModuleLoadBulkAllocation', identifying a range of memory or a set of ranges. Do you think that would be problematic? My goal was to avoid regressions in the overall scenario. In .NET 7 those string allocations land on the gen2 GC heap, and users have visibility into them. It would be disappointing (but not horrible) to say that diagnostic tooling in .NET 8 is less capable because of runtime implementation changes.
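
A tool consuming the kind of coarse-grained callback described here might look roughly like the following sketch. 'ModuleLoadBulkAllocation' is only the example name from this comment, not an existing API; `MemoryRange`, `BulkAllocationTracker`, and the `ModuleID` alias (a stand-in for the profiler's `ModuleID` type) are hypothetical. The tracker simply attributes the bytes in each reported range to the module whose load produced them.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins: neither type is part of the profiling API today.
using ModuleID = std::uintptr_t;
struct MemoryRange {
    std::uintptr_t start;
    std::uintptr_t end;  // exclusive
};

// What a tool might keep if the runtime raised a hypothetical
// 'ModuleLoadBulkAllocation' callback with one or more ranges per module load.
class BulkAllocationTracker {
public:
    void OnModuleLoadBulkAllocation(ModuleID module,
                                    const std::vector<MemoryRange>& ranges) {
        for (const MemoryRange& r : ranges) {
            bytes_by_module_[module] += r.end - r.start;
        }
    }

    // Total non-GC bytes attributed to a module load (0 if none reported).
    std::uint64_t BytesFor(ModuleID module) const {
        auto it = bytes_by_module_.find(module);
        return it == bytes_by_module_.end() ? 0 : it->second;
    }

private:
    std::map<ModuleID, std::uint64_t> bytes_by_module_;
};
```
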

> !eeheap -gc does include non-GC heap segments currently ... for regions I'm thinking of making that a no op so it wouldn't show up in sos by accident

Thanks! Will there be an option to run without regions in .NET 8? If so, I think we should add a work item to the list for SOS to explicitly filter them out.
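
If non-GC segments remain threaded onto gen2 in some configuration, the explicit filtering work item for SOS amounts to something like the sketch below. The `Segment` record, its `is_non_gc` flag, and `GcHeapSegments` are hypothetical; the real SOS/DAC data structures differ. The key point is that being found on the gen2 list cannot be trusted to mean "part of the GC heap".

```cpp
#include <cstdint>
#include <vector>

// Hypothetical segment record; the real SOS/DAC data structures differ.
struct Segment {
    std::uintptr_t start;
    std::uintptr_t end;   // exclusive
    int generation;       // generation list the segment is threaded onto
    bool is_non_gc;       // true for Non-GC heap segments
};

// Non-GC segments may be threaded onto gen2 for GC bookkeeping, so a listing
// that claims to show only the GC heap must drop them explicitly instead of
// relying on the generation list they were found on.
std::vector<Segment> GcHeapSegments(const std::vector<Segment>& all) {
    std::vector<Segment> result;
    for (const Segment& s : all) {
        if (!s.is_non_gc) {
            result.push_back(s);
        }
    }
    return result;
}
```
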

jkotas commented 1 year ago

> I was assuming that when the time came to support that scenario (which didn't appear to be now)

We do not have general support for it. We support it for selected customers.

> For example that callback might be 'ModuleLoadBulkAllocation' and it identifies a range of memory or a set of ranges.

I agree that we should have APIs/events that allow you to get the regions of memory where the non-GC managed objects are. Can the same API/events handle both dynamically created and file backed non-GC managed segments?

noahfalk commented 1 year ago

> Can the same API/events handle both dynamically created and file backed non-GC managed segments?

We could make a callback that told tools when segments were created regardless of their kind, but I don't view that as a substitute for the object allocation callbacks. I expect tool authors will want to help users reason about why the memory was allocated. For dynamic segments, that would mean individual object allocation callbacks; for file-backed segments, it would mean correlating the memory with a particular module load. That might end up with APIs like:

// This already exists
HRESULT ICorProfilerCallback::ObjectAllocated(ObjectID objectId, ClassID classId);

// New API I am not proposing now, but we could add it in the future.
// We could decide whether this callback fires for all segments, for non-GC segments only, or
// for file-backed non-GC segments only. (SegmentID is hypothetical.)
HRESULT ICorProfilerCallbackXX::SegmentAllocated(SegmentID segmentId, int size, ModuleID associatedModuleLoad);

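
The attribution idea in this comment (per-object allocation for dynamic segments, module load for file-backed segments) can be sketched as a lookup that a tool might perform once it knows the non-GC segment ranges, for example from something like the hypothetical `SegmentAllocated` callback above. `NonGcSegment`, `Attribution`, `Attribute`, and the `ModuleID` alias are illustrative assumptions, not profiler API.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

using ModuleID = std::uintptr_t;  // hypothetical stand-in for the profiler's ModuleID

// Hypothetical description of a non-GC segment, as a tool might build it
// from a future segment-level callback.
struct NonGcSegment {
    std::uintptr_t start;
    std::uintptr_t end;              // exclusive
    std::optional<ModuleID> module;  // set only for file-backed segments
};

enum class Attribution { NotNonGc, DynamicAllocation, ModuleLoad };

// Explains why an object's memory exists: objects in dynamic segments are
// attributed to their individual allocation (ObjectAllocated), and objects in
// file-backed segments to the module load that mapped them.
Attribution Attribute(std::uintptr_t address,
                      const std::vector<NonGcSegment>& segments) {
    for (const NonGcSegment& s : segments) {
        if (address >= s.start && address < s.end) {
            return s.module ? Attribution::ModuleLoad
                            : Attribution::DynamicAllocation;
        }
    }
    return Attribution::NotNonGc;
}
```
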
leculver commented 1 year ago

I checked the SOS and ClrMD boxes.

As of now, SOS (in the main branch of dotnet/diagnostics) should fully support frozen objects in all commands. Feel free to raise a specific issue there if a command is misbehaving. For example, !dumpheap will enumerate those objects, !eeheap reports frozen segments, !gcwhere will report frozen objects, etc.

EgorBo commented 1 year ago

Awesome! Thanks!

tommcdon commented 1 year ago

Transferring to the diagnostics repo to track the remaining documentation and communication work, which can be done out of band from the runtime work.