App-vNext / Polly

Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. From version 6.0.1, Polly targets .NET Standard 1.1 and 2.0+.
https://www.thepollyproject.org
BSD 3-Clause "New" or "Revised" License
13.43k stars 1.23k forks source link

[Bug]: Execution attempt logged with information severity #2362

Open martintmk opened 2 weeks ago

martintmk commented 2 weeks ago

Describe the bug

Some of our service teams that switched to Polly reported increased CPU utilization. After digging for a while we found that this was caused by increased amount of logs being produced. They have set the minimum log level to Information to not overwhelm the logging infra.

What suprised me, is that Polly logs each execution attempt (Retry, Hedging) with information severity. Is there any reason doing so? Why can't we just use Debug for this by default?

Repro Code:

var services = new ServiceCollection();
services.AddLogging(builder => builder.AddConsole().SetMinimumLevel(LogLevel.Debug));

var pipeline = services
    .AddResiliencePipeline("my-pipeline", builder => builder.AddRetry(new()))
    .BuildServiceProvider()
    .GetRequiredKeyedService<ResiliencePipeline>("my-pipeline");

await pipeline.ExecuteAsync(async token => true);

Gives the following console output:

dbug: Polly[1]
      Resilience pipeline executing. Source: 'my-pipeline/(null)', Operation Key: '(null)'
info: Polly[3]
      Execution attempt. Source: 'my-pipeline/(null)/Retry', Operation Key: '', Result: 'True', Handled: 'False', Attempt: '0', Execution Time: 5.3998ms
dbug: Polly[2]
      Resilience pipeline executed. Source: 'my-pipeline/(null)', Operation Key: '', Result: 'True', Execution Time: 60.2843ms

As a workaround, they have used the SeverityProvider (#1859) to switch the default level for ExecutionAttempt to Debug.

peter-csala commented 2 weeks ago

IMHO if we want to change telemetry logs' severity then we should not focus only on the ExecutionAttempt rather we should re-evaluate for all logs.

martincostello commented 2 weeks ago

I assumed the levels were all as-intended as part of the telemetry work for v8.

I can see a case for it being either Info or Debug, so I don't really think this is a bug. I think this is more a case of an unforeseen side-effect of adding logging to Polly.

If we changed the default behaviour, we might break people who do want this information, and this is unlikely to be the sort of behaviour change people would find in their CI. It then also requires code changes to retain the previous behaviour, not just config.

Feels like the workaround is more using a feature by-design to change the behaviour.

I'd agree we should check all the logs and re-evaluate them if we do want to change, but doing so outside a major release feels wrong to me (but maybe I'm being over-cautious).

martintmk commented 2 weeks ago

Thanks folks, Martin you are right this might not really be bug.

My main point is that we should review what logs are we generating on the hot + happy path. I believe, for this category of logs, everything should be at debug level to prevent the problems mentioned in the issue info.

peter-csala commented 2 weeks ago

I think we could create an opt-in version for V8. We could extend the TelemetryOptions with a flag, like LowerLogSeverityForHighPerformance or EnablePerformanceOptimization or whatever name you like.

This flag would lower the log severity for a predefined list of telemetry log types (those that are usuallt on the hot path). The advantage of this approach compared to the direct usage of SeverityProvider is that user can easily turn this on to reduce the quantity of logs. Of course we should provide a proper documentation, but the Polly users don't have to know all the nitty gritty details.

martincostello commented 2 weeks ago

I think we could create an opt-in version for V8. We could extend the TelemetryOptions with a flag, like LowerLogSeverityForHighPerformance or EnablePerformanceOptimization or whatever name you like.

Having a "make go fast" option doesn't seem great to me, because it would inevitably attract "why isn't that the default then?" questions.

Maybe a "performance optimizations" section in the docs would be an alternative? It could cover things like customizing the severities as well as overriding the general Polly log level to Warning+ etc. I imagine most people are logging Polly at Information level via having "Default": "Information" in their application's logging configuration.

peter-csala commented 2 weeks ago

Naming is hard 😅 Less verbose or Important ones only might be better. My main idea here was to introduce a simple switch which lowers the severity of the chatty telemetry events.

Obviously it has a trade off: less logs, less visibility. I'm not sure ASP.NET Core has a similar feature, if so we could follow that pattern.

martincostello commented 2 weeks ago

I think the problem with that is that the severity is pretty subjective when it comes to info vs. debug and warn vs. error and we've already (rightly or wrongly) applied an opinion on what events are what level, so having a second opinion on what they all are baked-in feels a bit weird to me.

Maybe with hindsight we may have tweaked them slightly before shipping v8, but I think if you're really performance conscious, we've already provided ways for people to tweak that as much as they want.

If over time there's feedback from multiple users on this we could consider doing something, but I think docs is the "cheaper" solution at this stage.