Could you please share with us how you use Polly?
From the stack trace we can only see that you are using it with HttpClient.
@peter-csala thanks - see below:
public static void AddResilientVimeoClient(this IServiceCollection serviceCollection) =>
serviceCollection.AddHttpClient<IVimeoClient, VimeoClient>(client =>
{
client.BaseAddress = new Uri("https://api.vimeo.com");
client.DefaultRequestHeaders.Add("Accept", "application/vnd.vimeo.*+json;version=3.4");
})
.AddResilienceHandler(nameof(VimeoClient), (pipelineBuilder, context) =>
{
var logger = context.ServiceProvider.GetRequiredService<ILogger<IVimeoClient>>();
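// Each AddRetry call below adds a separate, chained retry strategy to this pipeline (the first one added is the outermost).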
pipelineBuilder
.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
{
MaxRetryAttempts = 5,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
.Handle<HttpRequestException>(exception =>
exception.Message.StartsWith("Resource temporarily unavailable")),
OnRetry = arguments =>
{
logger.LogWarning(
arguments.Outcome.Exception,
"An HttpRequestException was encountered, delaying for {Delay}ms before making retry attempt {RetryCount}",
arguments.RetryDelay.TotalMilliseconds,
arguments.AttemptNumber);
return ValueTask.CompletedTask;
}
})
.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 5,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
ShouldHandle = async args =>
{
if (args.Outcome.Result is null)
{
return false;
}
var httpResponseMessage = args.Outcome.Result;
var responseBody = await httpResponseMessage.Content.ReadAsStringAsync();
return httpResponseMessage.StatusCode == HttpStatusCode.InternalServerError &&
(responseBody.Contains("Whoops!", StringComparison.OrdinalIgnoreCase) ||
responseBody.Contains("Please try again", StringComparison.OrdinalIgnoreCase));
},
OnRetry = arguments =>
{
logger.LogWarning(
arguments.Outcome.Exception,
"A potentially retryable 500 Internal Server Error was encountered, delaying for {Delay}ms before making retry attempt {RetryCount}",
arguments.RetryDelay.TotalMilliseconds,
arguments.AttemptNumber);
return ValueTask.CompletedTask;
}
})
.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 5,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
ShouldHandle = async args =>
{
if (args.Outcome.Result is null)
{
return false;
}
var httpResponseMessage = args.Outcome.Result;
var responseBody = await httpResponseMessage.Content.ReadAsStringAsync();
return httpResponseMessage.StatusCode == HttpStatusCode.ServiceUnavailable &&
(responseBody.Contains("fastly", StringComparison.OrdinalIgnoreCase) ||
responseBody.Contains("Please try again soon", StringComparison.OrdinalIgnoreCase));
},
OnRetry = arguments =>
{
logger.LogWarning(
arguments.Outcome.Exception,
"A 503 Service Unavailable was encountered, delaying for {Delay}ms before making retry attempt {RetryCount}",
arguments.RetryDelay.TotalMilliseconds,
arguments.AttemptNumber);
return ValueTask.CompletedTask;
}
})
.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 5,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
ShouldHandle = args =>
{
if (args.Outcome.Result is null)
{
return ValueTask.FromResult(false);
}
var httpResponseMessage = args.Outcome.Result;
var isTextTrackEndpoint = httpResponseMessage
.RequestMessage?
.RequestUri?
.AbsoluteUri
.Contains("texttracks", StringComparison.OrdinalIgnoreCase) ?? false;
return ValueTask.FromResult(httpResponseMessage.StatusCode == HttpStatusCode.NotFound &&
isTextTrackEndpoint);
},
OnRetry = arguments =>
{
logger.LogWarning(
arguments.Outcome.Exception,
"A 404 Not Found on texttracks endpoint was encountered, delaying for {Delay}ms before making retry attempt {RetryCount}",
arguments.RetryDelay.TotalMilliseconds,
arguments.AttemptNumber);
return ValueTask.CompletedTask;
}
})
.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 5,
Delay = TimeSpan.FromSeconds(1),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
ShouldHandle = async args =>
{
if (args.Outcome.Result is null)
{
return false;
}
var httpResponseMessage = args.Outcome.Result;
var responseBody = await httpResponseMessage.Content.ReadAsStringAsync();
return httpResponseMessage.StatusCode == HttpStatusCode.BadGateway &&
responseBody.Contains("server encountered a temporary error",
StringComparison.OrdinalIgnoreCase);
},
OnRetry = arguments =>
{
logger.LogWarning(
arguments.Outcome.Exception,
"A 502 Bad Gateway was encountered, delaying for {Delay}ms before making retry attempt {RetryCount}",
arguments.RetryDelay.TotalMilliseconds,
arguments.AttemptNumber);
return ValueTask.CompletedTask;
}
})
;
});
@alexrosenfeld10 Thanks for sharing the code fragment. Just to confirm: is my understanding correct that these strategies can retry the request almost indefinitely because they are chained together?
For the sake of simplicity, let's assume that you have only two retry strategies.
Let me show you what can happen via a sequence diagram:
sequenceDiagram
autonumber
actor C as Caller
participant P as Pipeline
participant RO as Retry Outer
participant RI as Retry Inner
participant HC as HttpClient
C->>P: Calls ExecuteAsync
P->>RO: Calls ExecuteCore
Note over RO: Counter is 0
RO->>RI: Calls ExecuteCore
Note over RI: Counter is 0
RI->>HC: Calls SendAsync
HC->>RI: Returns Failure X
RI->>RI: Delays
Note over RI: Counter is 1
RI->>HC: Calls SendAsync
HC->>RI: Returns Failure X
RI->>RI: Delays
Note over RI: Counter is 2
RI->>HC: Calls SendAsync
HC->>RI: Returns Failure Y
RI->>RO: Returns Failure Y
RO->>RO: Delays
Note over RO: Counter is 1
RO->>RI: Calls ExecuteCore
Note over RI: Counter is 0
RI->>HC: Calls SendAsync
HC->>RI: Returns Failure X
RI->>RI: Delays
Note over RI: Counter is 1
Note over RI: ...
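To put a rough number on it (back-of-the-envelope math, assuming the worst case where every failure keeps matching some strategy): because the strategies are nested, the attempt counts multiply rather than add. Even with just the two strategies in the diagram, 5 retries each means up to 6 × 6 = 36 SendAsync calls for one logical request; with your five chained strategies it is up to 6 × 6 × 6 × 6 × 6 = 7,776.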
A better solution would be to combine all these strategies into a single one. Do you need my assistance there?
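For illustration, a single combined strategy could look roughly like this (an untested sketch that folds together the conditions from your snippet and reuses the same logger variable from your AddResilienceHandler callback):

pipelineBuilder.AddRetry(new HttpRetryStrategyOptions
{
    MaxRetryAttempts = 5,
    Delay = TimeSpan.FromSeconds(1),
    BackoffType = DelayBackoffType.Exponential,
    UseJitter = true,
    ShouldHandle = async args =>
    {
        // Socket-level failures surfaced as HttpRequestException
        if (args.Outcome.Exception is HttpRequestException httpRequestException &&
            httpRequestException.Message.StartsWith("Resource temporarily unavailable"))
        {
            return true;
        }

        if (args.Outcome.Result is null)
        {
            return false;
        }

        var response = args.Outcome.Result;

        // 404 on the texttracks endpoint is retried without inspecting the body
        if (response.StatusCode == HttpStatusCode.NotFound)
        {
            return response.RequestMessage?.RequestUri?.AbsoluteUri
                .Contains("texttracks", StringComparison.OrdinalIgnoreCase) ?? false;
        }

        // The remaining cases all inspect the response body
        var responseBody = await response.Content.ReadAsStringAsync();
        return response.StatusCode switch
        {
            HttpStatusCode.InternalServerError =>
                responseBody.Contains("Whoops!", StringComparison.OrdinalIgnoreCase) ||
                responseBody.Contains("Please try again", StringComparison.OrdinalIgnoreCase),
            HttpStatusCode.ServiceUnavailable =>
                responseBody.Contains("fastly", StringComparison.OrdinalIgnoreCase) ||
                responseBody.Contains("Please try again soon", StringComparison.OrdinalIgnoreCase),
            HttpStatusCode.BadGateway =>
                responseBody.Contains("server encountered a temporary error", StringComparison.OrdinalIgnoreCase),
            _ => false
        };
    },
    OnRetry = arguments =>
    {
        logger.LogWarning(
            arguments.Outcome.Exception,
            "Retryable failure encountered, delaying for {Delay}ms before making retry attempt {RetryCount}",
            arguments.RetryDelay.TotalMilliseconds,
            arguments.AttemptNumber);
        return ValueTask.CompletedTask;
    }
});

With a single strategy the retry counter is shared, so MaxRetryAttempts = 5 really means at most 5 retries per request in total.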
@peter-csala, thanks for your thoughtful reply. Your understanding is correct.
I understand what you're saying, but it's very unlikely that failure X would happen 0..n times and then, right after, failure Y would happen another 0..n times. I'm not sure I'm really worried about this case.
Would this be related to the pipeline being disposed and not usable anymore? I'm not sure how / why it would be.
Well, without sufficient context it is pretty hard to tell what is causing the problem.
So, let me walk through the stack trace (in reverse order) and see what we have:
System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
An HTTP request is about to be sent
Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.<SendCoreAsync>g__Core|5_0(HttpRequestMessage request, Boolean useAsync, CancellationToken cancellationToken)
A DelegatingHandler is invoked, which is responsible for request logging
Microsoft.Extensions.Http.Resilience.Internal.ResilienceHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
Yet another DelegatingHandler is invoked, which is responsible for invoking the ResiliencePipeline
Polly.ResiliencePipeline.ExecuteOutcomeAsync[TResult,TState](Func`3 callback, ResilienceContext context, TState state)
The pipeline is about to execute in a safe manner (no exception is thrown in case of failure; instead, it is captured in an Outcome object)
Polly.ResiliencePipeline.InitializeAsyncContext[TResult](ResilienceContext context)
It ensures that the pipeline is not disposed before saving some metadata about the execution context
Polly.Utils.Pipeline.ComponentDisposeHelper.EnsureNotDisposed()
It checks whether the pipeline is already disposed or not
The message handlers' disposal process is quite complex; I've recently tried to detail it in this SO topic. Long story short, you can change the default HandlerLifetime (which is 2 minutes) via SetHandlerLifetime.
It might make sense to increase that value, but as you said, you have waited only 10 seconds...
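For example (a sketch against the registration you shared; the 10-minute value is only a placeholder):

serviceCollection.AddHttpClient<IVimeoClient, VimeoClient>(client =>
{
    client.BaseAddress = new Uri("https://api.vimeo.com");
    client.DefaultRequestHeaders.Add("Accept", "application/vnd.vimeo.*+json;version=3.4");
})
// Keep each handler chain alive longer than the default 2 minutes
.SetHandlerLifetime(TimeSpan.FromMinutes(10))
.AddResilienceHandler(nameof(VimeoClient), (pipelineBuilder, context) =>
{
    // ... strategies as above ...
});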
@peter-csala thanks, not sure that'd be the issue; we shut down in 10 seconds. This really feels more like a bug in Polly / the Microsoft HTTP resilience extensions. I'm not sure what other context is needed here; that's my full resilience pipeline config, and it happens at shutdown. What else can I provide?
@alexrosenfeld10 Just to name a few things that we don't know:
How is the VimeoClient implemented
How do you use the VimeoClient
Which endpoint do you hit when you shut down the application
Are you hitting the test or the prod API of Vimeo
Is there any explicit or implicit Dispose call in your code
Is it happening all the time (deterministically) or just occasionally (non-deterministically)
Did it get stuck in a retry loop when the ObjectDisposedException is thrown
What is the success/failure rate of your requests
If you have a single simpler retry logic, do you experience the same
So, to perform triaging it would be great to have a minimal reproducible example or a unit test. Otherwise it is just educated guesses with assumptions about how things can go wrong.
We should take a look at the ComponentDisposeHelper implementation, as it might not correctly implement the DisposeAsync pattern.
According to the description of IAsyncDisposable.DisposeAsync:
If an object's DisposeAsync method is called more than once, the object must ignore all calls after the first one and synchronously return a successfully completed ValueTask. The object must not throw an exception if its DisposeAsync method is called multiple times. Instance methods other than DisposeAsync can throw an ObjectDisposedException when resources are already disposed.
@martintmk could you please comment on the current implementation of ComponentDisposeHelper: why is it necessary to throw ObjectDisposedException if the object is disposed?
Update: I'm wrong. EnsureNotDisposed is a supplementary method, not part of the dispose pattern, and it is called on callback execution.
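For reference, the quoted contract roughly translates to this shape (an illustrative class of my own, not Polly's actual implementation):

public sealed class ExampleResource : IAsyncDisposable
{
    private bool _disposed;

    public ValueTask DoWorkAsync()
    {
        // Instance methods other than DisposeAsync may throw once the object is disposed.
        ObjectDisposedException.ThrowIf(_disposed, this);
        return ValueTask.CompletedTask;
    }

    public ValueTask DisposeAsync()
    {
        // Repeated calls must be ignored and must not throw.
        if (_disposed)
        {
            return ValueTask.CompletedTask;
        }

        _disposed = true;
        // ... release resources here ...
        return ValueTask.CompletedTask;
    }
}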
@peter-csala I mean no disrespect, but how are most of those questions relevant? They have almost nothing to do with the resilience pipeline, polly, or how it's being used and disposed. Perhaps I'm misunderstanding something and they actually do. Regardless, answers below:
How is the VimeoClient implemented
Standard stuff, all methods end up doing what you'd expect (var response = await _httpClient.SendAsync(request);)
Which endpoint do you hit when you shut down the application
I don't understand the question. The application is shut down via Kubernetes sending a TERM signal to the pod, not by an endpoint. If you mean which Vimeo endpoint we call, we don't call a specific one on shutdown; we'd be calling endpoints related to whatever the application is currently doing.
Are you hitting the test or the prod API of Vimeo
There is no reason this would matter or be related to the problem at hand. Regardless, you can find the answer in my original post.
Is there any explicit or implicit Dispose call in your code
No - none related to http client usage, polly, or the resilience pipeline configuration.
Is it happening all the time (deterministically) or just occasionally (non-deterministically)
Occasionally, on shutdown. Sometimes it doesn't happen (I am guessing these cases are when the app shuts down and there happen to be no ongoing http requests from the VimeoClient during shutdown, but I don't know, it's a guess)
Did it get stuck in a retry loop when the ObjectDisposedException is thrown
I don't know how to answer that, all I see is the stacktrace, it's all I am given for debugging.
What is the success/failure rate of your requests
Shouldn't matter? Mostly success with some sporadic failures, which is why I'm using resilience here anyway.
If you have a single simpler retry logic then do you experience the same
Haven't tested this but again, shouldn't matter? If it does, that's clearly a design flaw in polly / resilience.
Yes, of course to some extent there is guesswork here; I don't know the inner workings of the library, which is why I'm seeking out help from maintainers. I can try to create a minimal reproducible example, but it's difficult; these kinds of problems are hard to reproduce and debug, so hopefully some knowledgeable maintainers / authors can spot the issue just from reading and thinking over the code.
@gintsk thanks for taking a look, even if you didn't find the bug, glad to have eyes on this
How the VimeoClient is implemented
From the ObjectDisposedException perspective it can be relevant whether you fetch strategies from the ResiliencePipelineProvider/ResiliencePipelineRegistry inside the typed client. It is called out on the documentation page that the disposal of the registry also disposes the strategies.
How do you use the VimeoClient
From the ObjectDisposedException perspective it can be relevant whether the DI subsystem injects a VimeoClient instance into the consuming class or you create one (or many) via ITypedHttpClientFactory. In the latter case, the lifecycle management of the client is in your hands.
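Roughly what I mean (the class names below are made up, and the registered client name is an assumption):

// (a) DI-injected typed client: the HttpClientFactory infrastructure manages its lifecycle.
public class VideoIngestionService(IVimeoClient vimeoClient)
{
    private readonly IVimeoClient _vimeoClient = vimeoClient;
}

// (b) Manually created typed client: the lifecycle of each instance is in your hands.
public class VimeoClientFactory(
    IHttpClientFactory httpClientFactory,
    ITypedHttpClientFactory<VimeoClient> typedClientFactory)
{
    public VimeoClient Create()
    {
        // The logical name to pass here depends on how the typed client was registered.
        var httpClient = httpClientFactory.CreateClient("IVimeoClient");
        return typedClientFactory.CreateClient(httpClient);
    }
}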
Which endpoint do you hit when you shut down the application
It is important to understand whether the exception is thrown all the time regardless of the Vimeo responses, or whether only specific communication patterns cause the issue. Because you have many different retry strategies, narrowing down the possible root causes is essential during triaging.
Are you hitting the test or the prod API of Vimeo
Integration testing environments usually run on less powerful resources than the production system. They might host an older version or even custom code (for example, via feature flags).
This could result in less reliable responses. You have defined many retry strategies, and they might be exercised more (even during shutdown) in an integration testing environment than in the production one.
What is the success/failure rate of your requests
If your requests are mostly successful then no retry attempts are required. Basically the requests are just passing through the ResiliencePipeline.
If the failure rate is high then there could be many outstanding/pending requests while you try to shut down the application.
Did it get stuck in a retry loop when the ObjectDisposedException is thrown
I don't know how to answer that, all I see is the stacktrace, it's all I am given for debugging.
Because all your retry strategies define an OnRetry delegate where you perform logging, the logs should indicate whether there was a retry loop before the exception was thrown.
Occasionally, on shutdown. Sometimes it doesn't happen (I am guessing these cases are when the app shuts down and there happen to be no ongoing http requests from the VimeoClient during shutdown, but I don't know, it's a guess)
It would be great to know under what circumstances it happens.
Yes, of course to some extent there is guesswork here; I don't know the inner workings of the library, which is why I'm seeking out help from maintainers. I can try to create a minimal reproducible example, but it's difficult; these kinds of problems are hard to reproduce and debug, so hopefully some knowledgeable maintainers / authors can spot the issue just from reading and thinking over the code.
If you did not find my triaging attempt useful then please let me know. I'll step back and let other contributors/maintainers help you.
How do you wait for the 10 seconds? Is it by using IHostApplicationLifetime.ApplicationStopping to do a Task.Delay() or similar?
I would have thought that, if you're doing that, the framework would not dispose of the DI container until it had stopped the app. If the host has disposed of the DI container, then anything currently in flight would have issues resolving services from DI if they are resolved lazily during the processing of an HTTP request.
The AddResilienceHandler() extension isn't part of Polly, but part of the dotnet/extensions repo that in turn builds on top of Polly.
Without something that can readily reproduce the issue to debug, I don't think it's clear whether this is "by design" because the app has shut down and the request is happening "too late", or whether this is a bug in one or more of the components involved.
@martincostello Thanks, and yes, here's what it looks like:
public static IApplicationBuilder ConfigureAppShutdown(this IApplicationBuilder app,
int millisecondsTimeout = 10_000)
{
var lifetime = app.ApplicationServices.GetRequiredService<IHostApplicationLifetime>();
var logger = app.ApplicationServices.GetService<ILogger>();
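// Block the ApplicationStopping callback so shutdown is delayed while in-flight work drains.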
lifetime.ApplicationStopping.Register(() =>
{
logger?.Warning("Stopping event SIGTERM received, waiting for 10 seconds");
Thread.Sleep(millisecondsTimeout);
logger?.Warning("Termination delay complete, continuing stopping process");
});
return app;
}
I realize Thread.Sleep is generally the wrong answer; however, I don't think it matters here (it will block the main thread, which isn't what hosted services or HTTP requests use in .NET).
I agree - this disposal should be the responsibility of the framework.
I gather you think I should open an issue on dotnet/extensions?
You could, but @martintmk works on both, so he can chime in on where the issue might be. If you did create a second issue, there's a good chance you'd still get asked for a repro that can be used to definitively answer whose problem it is to resolve.
I'm happy to invest in creating a repro (haven't tried yet, happy to, just been slammed with work and sick at the same time). Let's see what Martin (you're both Martin it appears, I meant @martintmk as you mentioned) has to say here and then go forward from there.
Hey @alexrosenfeld10
You can only get an ObjectDisposedException from the pipeline when its owner (the ResiliencePipelineRegistry) is disposed. The disposal of the registry can happen only when the DI container is disposed, unless you are doing something odd such as disposing it manually before that.
My theory on why this can be happening is that some service registered in the DI container is using the pipeline in its Dispose method. If the container disposes the registry before the service is disposed, the service's disposal will fail.
I tried to confirm my theory, but couldn't replicate it:
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddResiliencePipeline(
"my-pipeline",
pipelineBuilder => pipelineBuilder.AddTimeout(TimeSpan.FromSeconds(1)));
builder.Services.AddSingleton<MyService>();
var host = builder.Build();
await host.StartAsync();
var myService = host.Services.GetRequiredService<MyService>();
host.Dispose(); // This passes
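// A service that resolves a pipeline from the registry at construction and uses it while being disposed.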
class MyService(ResiliencePipelineProvider<string> provider) : IDisposable
{
private readonly ResiliencePipeline pipeline = provider.GetPipeline("my-pipeline");
public void Dispose()
=> pipeline.Execute(() => Console.WriteLine("Using the pipeline when disposing"));
}
At this point we need repro code to proceed. I suspect you might be doing something unusual that I am not able to track down. But maybe there is indeed a bug somewhere; if so, my bet is that it is outside of Polly.
@martintmk Thanks for the detailed reply. I'm using Lamar as my container, perhaps it's some interaction between the two.
btw, this was only happening when my application was restarting heavily due to some other performance issues. Since resolving those, I haven't seen it again. I'm guessing there was some "shut down immediately after starting up" or "shut down while gazillions of http requests are going" type issue that was causing this. Now that it's not happening at all anymore, I'm not planning on spending time digging into it further.
Again, thanks for the assistance here - I appreciate it, even though we didn't find the root cause, it was helpful to brainstorm a bit.
Describe the bug
When my application shuts down I wait for 10 seconds for ongoing requests, background jobs, etc. to finish up. During this time, I get an ObjectDisposedException from the resilience pipeline; the stack trace is the one walked through earlier in this thread.
Expected behavior
The pipeline shouldn't be disposed if the application is still using it
Actual behavior
The pipeline is disposed while the application is shutting down, and ongoing requests fail
Steps to reproduce
No response
Exception(s) (if any)
No response
Polly version
8.3.0, Microsoft.Extensions.Http.Resilience 8.1.0
.NET Version
mcr.microsoft.com/dotnet/aspnet:8.0
Anything else?
No response