dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.04k stars 4.68k forks source link

`HttpMetricsEnrichmentContext.AddCallback` seems to sometimes add duplicate tags #108284

Open lindeberg opened 4 days ago

lindeberg commented 4 days ago

Description

When using HttpMetricsEnrichmentContext.AddCallback, it sometimes seems to add duplicate tags to a metric.

For a metric, I got multiple amount of different amount of duplicates. For example, with prometheus exposed metrics, notice the custom label gateway_api:

http_client_request_duration_seconds_bucket{gateway_api="crm",http_request_method="GET",http_response_status_code="200",network_protocol_version="1.1",server_address="whatever.com",url_scheme="https",le="0.1"} 4234 1727271043861

http_client_request_duration_seconds_bucket{gateway_api="crm",gateway_api="crm",http_request_method="GET",http_response_status_code="200",network_protocol_version="1.1",server_address="whatever.com",url_scheme="https",le="0.1"} 108 1727271043861

http_client_request_duration_seconds_bucket{gateway_api="crm",gateway_api="crm",gateway_api="crm",http_request_method="GET",http_response_status_code="200",network_protocol_version="1.1",server_address="whatever.com",url_scheme="https",le="0.1"} 8 1727271043861

http_client_request_duration_seconds_bucket{gateway_api="crm",gateway_api="crm",gateway_api="crm",gateway_api="crm",http_request_method="GET",http_response_status_code="200",network_protocol_version="1.1",server_address="whatever.com",url_scheme="https",le="0.1"} 1 1727271043861

Which causes Prometheus scraper to stop working with error message label name "gateway_api" is not unique: invalid sample.

Reproduction Steps

I'm unable to reproduce it locally, but it happened in the production environment which has a high request rate.

public sealed class HttpClientMetricsEnrichmentHandler : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        HttpMetricsEnrichmentContext.AddCallback(request, context => context.AddCustomTag("gateway-api", "crm"););

        return base.SendAsync(request, cancellationToken);
    }
}
// While writing this, I realized the scope may be the problem somehow? 🦆
services.AddScoped<HttpClientMetricsEnrichmentHandler>();
services.AddHttpClient<ICrmClient, CrmClient>().AddHttpMessageHandler<HttpClientMetricsEnrichmentHandler>(); //Etc.

Followed by usage of the ICrmClient in the API.

Expected behavior

The custom tag should only be added once, I don't know how exactly. But maybe a duplicate check should be included?

Actual behavior

The custom tag is added multiple times, creating duplicate tags, which causes an error in Prometheus.

Regression?

No response

Known Workarounds

No response

Configuration

mcr.microsoft.com/dotnet/aspnet:8.0.8-alpine

Other information

No response

dotnet-policy-service[bot] commented 4 days ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

CarnaViire commented 4 days ago

// While writing this, I realized the scope may be the problem somehow? 🦆

I can't yet comment on whether this could be the cause of the duplication, but note that the handler used in AddHttpMessageHandler<THandler> must be registered as a Transient, see the Type Parameters section in the API docs.

antonfirsov commented 4 days ago

Without a repro or some other source of actionable data, it is not possible to determine if the bug lives in System.Net.Http, the application code, or some library your app is using.

// While writing this, I realized the scope may be the problem somehow? 🦆

This looks suspicious, can you please share more context and all the relevant code, including service registration CrmClient etc.

Edit: Natalia replied faster 😄

lindeberg commented 4 days ago

Thank you! We can start by trying out registering the Message Handler correctly as transient.