dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Higher Memory Usage for the EventSource Class #99816

Open johnwallsmsft opened 8 months ago

johnwallsmsft commented 8 months ago

Description

During a recent attempt to optimize the memory usage of an application, we noticed what seems to be unnecessary memory usage by the EventSource class. The EventSource class generates an array of EventMetadata elements (System.Diagnostics.Tracing.EventSource.EventMetadata) on startup. This array is as long as the largest EventId declared in the EventSource class, so we see much higher peak memory usage (~17 MB) when using an EventID of 60000 versus an EventID of 1.
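(Back-of-envelope, inferred from the numbers above rather than a measured struct size: if the metadata array alone accounts for the ~17 MB difference, that works out to roughly 17 MB / ~60,001 slots ≈ 280-300 bytes per EventMetadata slot.)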

If we have two events, peak memory usage also changes significantly depending on whether we define the event methods in the class in decreasing or increasing order by EventID.

Configuration

This is using .NET 8 on Windows 11, but it behaves the same with .NET 6 and .NET Framework 4.7.2.

Data

Here is a simple repro app that demonstrates the behavior:

```csharp

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello, World!");

        if (args.Length != 1)
        {
            Console.WriteLine("Usage: EventSourceMemUsage.exe <LowIDs/HighIDs/DecreasingIDs>");
            Environment.Exit(-1);
        }

        switch (args[0])
        {
            case "LowIDs":
                TestLowIDsEventSource.Log.EventWriteInfo("This is an informational message");
                TestLowIDsEventSource.Log.EventWriteError("This is an error message");
                break;
            case "HighIDs":
                TestHighIDsEventSource.Log.EventWriteInfo("This is an informational message");
                TestHighIDsEventSource.Log.EventWriteError("This is an error message");
                break;
            case "DecreasingIDs":
                TestDecreasingEventSource.Log.EventWriteInfo("This is an informational message");
                TestDecreasingEventSource.Log.EventWriteError("This is an error message");
                break;
            default:
                Console.WriteLine("Please enter one of LowIDs/HighIDs/DecreasingIDs");
                Environment.Exit(-1);
                break;
        }

        Console.ReadLine();
    }
}

[EventSource(Name = "Application Error", Guid = "a0e9b465-b939-57d7-b27d-95d8e925ff57")]
public sealed class TestLowIDsEventSource : EventSource
{
    public static TestLowIDsEventSource Log = new TestLowIDsEventSource();

    [Event(1, Level = EventLevel.Informational, Channel = EventChannel.Admin)]
    public void EventWriteInfo(string Message) => WriteEvent(1, Message);

    [Event(2, Level = EventLevel.Error, Channel = EventChannel.Admin)]
    public void EventWriteError(string Message) => WriteEvent(2, Message);
}

[EventSource(Name = "Application Error", Guid = "a0e9b465-b939-57d7-b27d-95d8e925ff57")]
public sealed class TestHighIDsEventSource : EventSource
{
    public static TestHighIDsEventSource Log = new TestHighIDsEventSource();

    [Event(60000)]
    public void EventWriteInfo(string Message) => WriteEvent(60000, Message);

    [Event(60001)]
    public void EventWriteError(string Message) => WriteEvent(60001, Message);
}

[EventSource(Name = "Application Error", Guid = "a0e9b465-b939-57d7-b27d-95d8e925ff57")]
public sealed class TestDecreasingEventSource : EventSource
{
    public static TestDecreasingEventSource Log = new TestDecreasingEventSource();

    [Event(60001)]
    public void EventWriteError(string Message) => WriteEvent(60001, Message);

    [Event(60000)]
    public void EventWriteInfo(string Message) => WriteEvent(60000, Message);
}

```

Using the first class (TestLowIDsEventSource), we see the following memory usage in ProcExp:

*(screenshot: LowIDs memory usage)*

Using the second class (TestHighIDsEventSource), we see the following much increased memory usage:

*(screenshot: HighIDs memory usage)*

Using the third class (TestDecreasingEventSource), we see much lower usage than TestHighIDsEventSource, but still noticeably higher than TestLowIDsEventSource:

*(screenshot: DecreasingIDs memory usage)*

Analysis

The higher memory usage when using EventID 60000 versus EventID 1 looks to come from here: https://github.com/dotnet/runtime/blob/689cf795574806929be152cdcb3aadcd632be544/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs#L3433.

I saw mention in another issue that the reason for allocating the whole table is to increase speed when actually writing events, which makes sense. But I would think this should not be an array of full-size EventMetadata objects when the vast majority of them go unused. Perhaps it could instead be an array of indices into a second, much smaller table of EventMetadata objects whose length is the actual number of events, as sketched below.
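To make that concrete, here is a minimal sketch of the layout being suggested. The type and member names (`EventMetadataSketch`, `IndexedEventTable`) are hypothetical and not the runtime's internals; `EventMetadataSketch` merely stands in for the internal EventMetadata struct.

```csharp
// Hypothetical sketch: a per-ID index table of small integers pointing into a
// compact metadata array sized to the real number of declared events.
struct EventMetadataSketch          // stand-in for the internal EventMetadata struct
{
    public int EventId;
    public string Name;
    // ... descriptor, parameter info, etc.
}

sealed class IndexedEventTable
{
    private readonly int[] _slotByEventId;            // length == maxEventId + 1, 4 bytes per slot
    private readonly EventMetadataSketch[] _metadata; // length == number of declared events

    public IndexedEventTable(EventMetadataSketch[] declaredEvents, int maxEventId)
    {
        _metadata = declaredEvents;
        _slotByEventId = new int[maxEventId + 1];
        for (int i = 0; i < _slotByEventId.Length; i++)
            _slotByEventId[i] = -1;                   // -1 marks IDs with no event defined
        for (int i = 0; i < declaredEvents.Length; i++)
            _slotByEventId[declaredEvents[i].EventId] = i;
    }

    public ref EventMetadataSketch Lookup(int eventId)
    {
        // Still O(1) per WriteEvent, but each unused ID costs 4 bytes instead of
        // a full EventMetadata entry.
        return ref _metadata[_slotByEventId[eventId]];
    }
}
```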

For the second issue, where the EventID ordering changes the memory usage, I believe it stems from here: https://github.com/dotnet/runtime/blob/689cf795574806929be152cdcb3aadcd632be544/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs#L3431. AddEventDescriptor is currently called sequentially down the method list, so we first allocate an array of size 60000, and then for the next method we need to allocate a new array of size 60001 and copy the first one over. It may be more efficient memory-wise to determine the maximum EventID before allocating any of these arrays (see the sketch below), although that could be slightly slower.
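A rough sketch of that two-pass idea, assuming the event methods are already available via reflection. The helper name `EventIdScanner` is hypothetical; `EventAttribute.EventId` is the public attribute property, not the internal code path.

```csharp
using System.Reflection;
using System.Diagnostics.Tracing;

static class EventIdScanner
{
    // First pass: find the largest [Event(...)] ID across the reflected event methods.
    public static int FindMaxEventId(MethodInfo[] eventMethods)
    {
        int max = 0;
        foreach (MethodInfo method in eventMethods)
        {
            var attr = method.GetCustomAttribute<EventAttribute>();
            if (attr != null && attr.EventId > max)
                max = attr.EventId;
        }
        return max;
    }
}

// Second pass (conceptually): allocate the table once at its final size, so each
// AddEventDescriptor call fills a slot instead of growing and copying the array:
//   metadataTable = new EventMetadata[EventIdScanner.FindMaxEventId(methods) + 1];
```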

dotnet-policy-service[bot] commented 8 months ago

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti. See info in area-owners.md if you want to be subscribed.

NickCraver commented 8 months ago

While digging into this, I came across #4458 along the way; adding it here for full context.

davmason commented 4 months ago

As Vance mentioned in https://github.com/dotnet/runtime/issues/4458#issuecomment-136402993, this is a known issue, and fixing it would cause a performance hit for all users of EventSource. Our advice is to not use sparse event IDs, and instead start from 1 and go up.

Closing this as won't fix. If there is a case where you have to use high event IDs, please let me know and we can revisit.

karlreingit commented 2 months ago

> Closing this as won't fix. If there is a case where you have to use high event IDs, please let me know and we can revisit.

I would describe our scenario as one where we are migrating 10+ year old legacy manifest-based event providers to EventSource. We have a back-compat requirement that the event IDs do not change, since they are used for monitoring/alerting in a large cloud service, and in some cases the event IDs are well known to external customers via an on-prem offering.

So our first issue is simply needing to preserve existing event IDs when migrating to EventSource.

I also suspect that we have arbitrary gaps in the event ID range (it is not a contiguous sequence of IDs) due to some events being removed over the years.

Hope this additional context is useful.

NickCraver commented 2 months ago

(Dang, Karl replied before I could - thanks for the reopen!)

Given the startup performance impact, the memory usage, and the relatively small part of the eventing pipeline this lookup involves, is a Dictionary worth considering here?
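For illustration, a dictionary-keyed table could look roughly like the sketch below (hypothetical type names, reusing the `EventMetadataSketch` stand-in from the earlier sketch). Whether the extra hash lookup per write is acceptable is exactly the tradeoff in question.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of a dictionary-backed event table: memory scales with the
// number of declared events regardless of how large or sparse the IDs are, at the
// cost of a hash lookup on each write.
sealed class DictionaryEventTable
{
    private readonly Dictionary<int, EventMetadataSketch> _byId;

    public DictionaryEventTable(EventMetadataSketch[] declaredEvents)
    {
        _byId = new Dictionary<int, EventMetadataSketch>(declaredEvents.Length);
        foreach (var e in declaredEvents)
            _byId[e.EventId] = e;
    }

    public bool TryLookup(int eventId, out EventMetadataSketch metadata)
        => _byId.TryGetValue(eventId, out metadata);
}
```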

@johnwallsmsft's example starting this issue is actually realistic: we run that case on approximately 2,662,539 workers at the moment, many of which have < 2 GB of memory. This might seem like an acceptable tradeoff if it were one executable on a machine, but these 10 MB+ usages really add up when several services are combined, contributing to more usage, less headroom, increased GC runs for customers, etc. I think this is worth trimming down.

davmason commented 2 months ago

Moving this to the 10.0 milestone now that it is reopened, and un-assigning myself so @tommcdon can route as appropriate.