dotnet / aspire

An opinionated, cloud ready stack for building observable, production ready, distributed applications in .NET
https://learn.microsoft.com/dotnet/aspire
MIT License
3.7k stars 421 forks source link

Memory Leak When Enabling Health Checks in Aspire.Confluent.Kafka #6035

Open duskembayev opened 4 hours ago

duskembayev commented 4 hours ago

Is there an existing issue for this?

Describe the bug

When enabling health checks in the Aspire.Confluent.Kafka component, a memory leak is observed. The issue seems to stem from the following:

This behavior leads to a continuous increase in memory usage, especially in environments with frequent health checks.

Expected Behavior

I believe that KafkaHealthCheck should be registered in ServiceCollection as a singleton. This change would allow instances of KafkaHealthCheck to be reused across multiple health check iterations, preventing the memory leak by avoiding unnecessary object creation and ensuring proper disposal.

Steps To Reproduce

  1. Enable health checks in Aspire.Confluent.Kafka.
  2. Trigger health checks repeatedly.
  3. Monitor memory usage.

Exceptions (if any)

No response

.NET Version info

.NET SDK: Version: 8.0.400 Commit: 36fe6dda56 Workload version: 8.0.400-manifests.f51a3a6b MSBuild version: 17.11.3+0c8610977

Runtime Environment: OS Name: Windows OS Version: 10.0.22631 OS Platform: Windows RID: win-x64 Base Path: C:\Program Files\dotnet\sdk\8.0.400\

.NET workloads installed: Configured to use loose manifests when installing new manifests. [aspire] Installation Source: SDK 8.0.400, VS 17.11.35303.130 Manifest Version: 8.2.0/8.0.100 Manifest Path: C:\Program Files\dotnet\sdk-manifests\8.0.100\microsoft.net.sdk.aspire\8.2.0\WorkloadManifest.json Install Type: FileBased

Host: Version: 8.0.8 Architecture: x64 Commit: 08338fcaa5

.NET SDKs installed: 8.0.206 [C:\Program Files\dotnet\sdk] 8.0.400 [C:\Program Files\dotnet\sdk]

.NET runtimes installed: Microsoft.AspNetCore.App 8.0.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 8.0.8 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 8.0.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 8.0.8 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.WindowsDesktop.App 8.0.6 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 8.0.8 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other architectures found: x86 [C:\Program Files (x86)\dotnet] registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation]

Environment variables: Not set

global.json file: Not found

Learn more: https://aka.ms/dotnet/info

Download .NET: https://aka.ms/dotnet/download

Anything else?

image

No response

davidfowl commented 3 hours ago

Do you have a memory dump or trace to verify some of the ideas you have around what is causing the leak?

duskembayev commented 3 hours ago

I can try to collect the dump, but I already implemented the workaround and it helped me:

    public static void AddKafkaHealthCheck(this IHostApplicationBuilder builder, string connectionName = "Kafka")
    {
        builder.Services
               .AddOptions<KafkaHealthCheckOptions>()
               .Configure<IConfiguration>(
                   (options, cfg) =>
                   {
                       IConfigurationSection configSection = cfg.GetSection("Aspire:Confluent:Kafka:Producer:Config");
                       ProducerConfig config = new();
                       configSection.Bind(config);

                       config.BootstrapServers = cfg.GetConnectionString(connectionName);

                       options.Configuration = config;
                       options.Configuration.SocketTimeoutMs = 1000;
                       options.Configuration.MessageTimeoutMs = 1000;
                       options.Configuration.StatisticsIntervalMs = 0;
                   });
        // KafkaHealthCheck directly depends on KafkaHealthCheckOptions, ignoring IOptions pattern.
        builder.Services
               .AddTransient<KafkaHealthCheckOptions>(provider => provider.GetRequiredService<IOptions<KafkaHealthCheckOptions>>().Value);

        builder.Services.TryAddSingleton<KafkaHealthCheck>();

        builder.Services.AddHealthChecks()
               .AddCheck<KafkaHealthCheck>($"Aspire.Confluent.Kafka.Producer_{connectionName}");
    }
davidfowl commented 3 hours ago

I can try to collect the dump, but I already implemented the workaround and it helped me:

Great, a dump would be great!

Also can you file this issue on https://github.com/Xabaril/AspNetCore.Diagnostics.HealthChecks?

duskembayev commented 3 hours ago

Interesting. In Xabaril implementation this issue will not happen. Because the healthcheck is registered as singleton and DI is used to instantiate it:

        builder.Services.AddSingleton(_ => new KafkaHealthCheck(new KafkaHealthCheckOptions { Configuration = config, Topic = topic }));

        return builder.Add(new HealthCheckRegistration(
            name ?? NAME,
            sp => sp.GetRequiredService<KafkaHealthCheck>() // <--- resolving health check from DI,
            failureStatus,
            tags,
            timeout));

So, it is an Aspire-specific issue.

davidfowl commented 3 hours ago

Got it!

duskembayev commented 3 hours ago

@davidfowl I want to prepare a PR to fix this issue tomorrow.