rynowak opened this issue 5 years ago
@fabiocav - does WebJobs dispose the service provider for a case like this? What's causing the timeout?
Think of the WebJob as a console app with a host builder configuration.
The host does not complete (and the console app does not exit) until the HttpClientFactory completes its cleanup.
By default, the Azure App Service container running a triggered WebJob expects it to complete within 120 seconds of going idle; when it doesn't, the container terminates the WebJob process and records a Failure rather than the Success status it should report.
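For anyone following along, this is roughly the shape of host being described; a minimal sketch assuming the WebJobs SDK v3 HostBuilder pattern (the extension registrations and names here are illustrative, not taken from this thread):

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

internal class Program
{
    private static async Task Main(string[] args)
    {
        var builder = new HostBuilder()
            .ConfigureWebJobs(b =>
            {
                b.AddAzureStorageCoreServices();
                b.AddTimers(); // assumes the Timers extension package, for [TimerTrigger] functions
            });

        using (var host = builder.Build())
        {
            // The console process does not exit until the host finishes shutting down,
            // which is where the cleanup described above comes into play.
            await host.RunAsync();
        }
    }
}
```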
@acds are you referring to a timeout waiting for the shutdown/cleanup as a result of a function execution timeout? If so, it does look like WebJobs currently disposes the host, but there's a work item to address that.
Well, not an Azure Function but a WebJobs triggered function, but yes, any insights on the workaround? Given that the WebJobs v3 API is thoroughly undocumented just now... and is radically different from v2.
One workaround would be triggering a full GC a few times when your function is shutting down.
```csharp
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
```
Another workaround would be setting the handler expiry to Timeout.InfiniteTimeSpan, as documented here:
https://github.com/aspnet/Extensions/blob/557995ec322f1175d6d8a72a41713eec2d194871/src/HttpClientFactory/Http/src/DependencyInjection/HttpClientBuilderExtensions.cs#L502
If your job/service is short-lived, there should be no drawback to that. The reason we rotate handlers is that DNS updates don't work reliably on all existing platforms.
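Concretely, that workaround looks something like this; a minimal sketch, with the client name and base address as placeholder assumptions:

```csharp
using System;
using System.Threading;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Register a named client whose handler is never expired/rotated.
// Only appropriate for short-lived processes where stale DNS is not a concern.
services.AddHttpClient("graph", c =>
    {
        c.BaseAddress = new Uri("https://example.invalid/"); // placeholder
    })
    .SetHandlerLifetime(Timeout.InfiniteTimeSpan);
```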
Hi @rynowak, thanks... and apologies, I was away and am just getting back to this.
The GC.Collect() calls do not seem to make any difference.
I have already been experimenting with the handler expiry: setting it to infinite seems to have the effect that the job still does not exit/sleep, but it no longer raises the HttpClientFactory cleanup diagnostics.
I currently have it set up as a continuous job, but want to revert to "triggered". The job is short-lived and will run every few hours; having it deployed as "continuous" kind of defeats the object of having a triggered job.
Any other suggestions?
Thanks!
@fabiocav & @rynowak & @Eilon - any updates?
Not sure if this is any use: in Rider I can pull up a memory dump and see that the HttpClient instances are not disposed, even though they are out of scope.
My default code pattern is as follows:
```csharp
private async Task<System.Net.HttpStatusCode> PatchCredentials(MyType myType)
{
    HttpClient client = _httpClientFactory.CreateClient("graph"); // using a named client
    HttpContent content = new StringContent(SerializeCredentials.ToJson(myType), Encoding.UTF8, "application/json");
    using (var httpResponse = await client.PatchAsync("", content))
    {
        httpResponse.EnsureSuccessStatusCode();
        return httpResponse.StatusCode;
    }
}
```
the client factory is configured as follows:
```csharp
.ConfigureServices((context, services) =>
{
    // ...
    services.AddTransient<GraphAuthenticationHandler>();
    services.AddHttpClient("graph", c =>
        {
            c.BaseAddress = new Uri($"{myURI}");
        })
        .AddHttpMessageHandler<GraphAuthenticationHandler>()
        .SetHandlerLifetime(Timeout.InfiniteTimeSpan);
    // ...
})
```
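The GraphAuthenticationHandler itself isn't shown in the thread; purely as a hypothetical sketch, a delegating handler of that shape, assuming it acquires a token via AzureServiceTokenProvider (mentioned further down), might look like this:

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Services.AppAuthentication;

// Hypothetical sketch only; the real handler is not included in this issue.
public class GraphAuthenticationHandler : DelegatingHandler
{
    private readonly AzureServiceTokenProvider _tokenProvider = new AzureServiceTokenProvider();

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Acquire a token for the target resource and attach it as a bearer header.
        var token = await _tokenProvider.GetAccessTokenAsync("https://graph.microsoft.com/");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        return await base.SendAsync(request, cancellationToken);
    }
}
```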
what else should I be looking at...?
Edit:
Additional info:
AzureServiceTokenProvider is used in a pair of delegating handlers, which I believe makes HttpClient requests under the covers - is this somehow messing with things? Reviewing memory before I garbage collect, this is the number of instances:
After the first GC.Collect():
Subsequent calls to GC.Collect() and GC.WaitForPendingFinalizers() make no difference...
What is holding on to the HttpClient instances?
Thanks for any and all help!
@acds - sorry for the delay - I've been on vacation.
The memory dump should be able to show you why those objects are still live. https://www.jetbrains.com/help/dotmemory/Shortest_Paths_to_Roots.html You want to look for the "path to root".
@fabiocav - can you provide some background about what's keeping his function live?
> The host does not complete (and the console app does not exit) until the HttpClientFactory completes its cleanup.
What is the mechanism that does this? We don't have a feature for forcing client factory cleanup so how does this work?
Hi @rynowak & @fabiocav
Here is some output from dotMemory:
Here is an Expansion of each:
QuickPulseTelemetryModule
(2 instances)
Any thoughts? Looks like possible memory leaks (or, as you point out, "cleanup") in the runtime around Application Insights, the Key Vault configuration integration, and HttpClientFactory (?)
It's a chicken-and-egg situation: the WebJob's await host.RunAsync(); won't shut down until the references are cleared, but some of these references are being held by the host... (via the Configuration/Key Vault integration, the App Insights integration, and, it seems, the Lazy<> references in the HttpClientFactory)!
I was suspecting that AzureServiceTokenProvider, as part of Microsoft.Azure.Services.AppAuthentication, was involved, but it seems the .NET Core implementation is not leveraging the HttpClientFactory internally.
So that last entry points out that there are three HttpClient instances held in statics, including one in a Lazy<HttpClient>.
> It's a chicken-and-egg situation: the WebJob's await host.RunAsync(); won't shut down until the references are cleared, but some of these references are being held by the host... (via the Configuration/Key Vault integration, the App Insights integration, and, it seems, the Lazy<> references in the HttpClientFactory)!
@mathewc @fabiocav - is there any more information you can provide? I don't understand what WebJobs specific mechanism is causing problems here.
TBH: On reflection, there seems to be a conflict in using a TimerTrigger job with the expectation that it can be deployed as "triggered" rather than "continuous", as the TimerTrigger job does not end... it just idles until the next trigger time.
While idle between timer-triggered runs, the HttpClientFactory is trying to clean up any HttpClient/handler pairs. It can't, as the WebJobs runtime is holding onto at least two of them (in my case, the Key Vault integration and App Insights). So at least setting SetHandlerLifetime(Timeout.InfiniteTimeSpan) eliminates the logging noise.
There seem to be (undocumented) design decisions/direction to move away from "triggered" WebJob deployments in general: settings.job.
That being said, in hindsight it seems it may be better to move the functionality into an IHostedService running as a background task in the main Web/API app and eliminate the deployment complexity altogether.
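A rough sketch of that alternative, assuming a timer-style BackgroundService hosted in the existing app (the names, interval, and client usage here are illustrative):

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Illustrative sketch of moving the periodic work into the main app as a hosted service.
public class CredentialRefreshService : BackgroundService
{
    private readonly IHttpClientFactory _httpClientFactory;

    public CredentialRefreshService(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var client = _httpClientFactory.CreateClient("graph");
            // ... perform the work previously done by the triggered WebJob ...

            // Sleep until the next run; the app's host owns the lifetime, so there is
            // no triggered-job idle timeout to worry about.
            await Task.Delay(TimeSpan.FromHours(3), stoppingToken);
        }
    }
}

// Registered alongside the existing services:
// services.AddHostedService<CredentialRefreshService>();
```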
At the end of the day, the problems here all tie back to the lack of any real WebJobs 3.0 documentation, and there is likely no issue here...
One hopes that Microsoft does not intend to release future versions without the requisite documentation, to avoid churn as we learn the hard way.
@acds apologies for the delayed response. Somehow, I managed to miss notifications on this issue.
You're correct on what you've outlined above.
Adding @mathewc as he can also provide more context on the WebJobs infrastructure, if needed.
I am terribly sorry for barging in on a distantly connected issue, but I was trying to find a follow-up on aspnet/HttpClientFactory#165, which was created from dotnet/aspnetcore#3470, which in turn led to this issue; they are all closed for comments, and I really don't think it merits an issue in itself.
Now, I fully understand what @davidfowl says:
> The logs seem to indicate that things are never going out of scope and hence the supposed leak. What's happening is the timer that is used to expire the handler has a reference to a callback which seemingly keeps the HttpMessageHandler alive.
> When a GC happens things are cleaned up as expected. Until then, the background scanner will keep looping until that happens.
But I am still worried: if for some reason the GC doesn't clean things up for some time, I might end up with quite a few undisposed handlers linked there. Wouldn't that pose an issue with socket exhaustion, which was one of the original reasons for reusing the handlers? Especially since I am on .NET Framework 4.8, not Core, so I don't have access to SocketsHttpHandler.
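For what it's worth, a commonly cited mitigation on .NET Framework that does not depend on the factory at all is one shared HttpClient plus a ConnectionLeaseTimeout, so pooled connections are periodically recycled and DNS changes are eventually picked up; a hedged sketch (the endpoint is a placeholder):

```csharp
using System;
using System.Net;
using System.Net.Http;

// Sketch of a classic .NET Framework pattern (not from this thread): a single shared
// HttpClient, with the endpoint's ServicePoint connection lease limited so pooled
// connections are recycled periodically and DNS updates are eventually observed.
public static class SharedHttp
{
    private static readonly Uri BaseAddress = new Uri("https://example.invalid/"); // placeholder

    public static readonly HttpClient Client = CreateClient();

    private static HttpClient CreateClient()
    {
        // Recycle pooled connections to this endpoint every 60 seconds.
        ServicePointManager.FindServicePoint(BaseAddress).ConnectionLeaseTimeout = 60000;
        return new HttpClient { BaseAddress = BaseAddress };
    }
}
```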
Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.
See discussion here: https://github.com/aspnet/AspNetCore/issues/3470#issuecomment-455001937
@acds reported that there was an ongoing message handler cleanup cycle with only a few handlers. This is somewhat normal, since we can only collect a handler once it's eligible to be collected (because the HttpClient instances have been GC'ed). However, in this case the app didn't have any traffic going to it, and this was causing issues.
We should investigate what we can do for cases like this, where HttpClientFactory is being used in a short-running scenario.
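Until then, one way to combine the two workarounds discussed above in a short-running job looks roughly like this; a sketch with the client name and address as placeholders: never rotate handlers, and force a couple of GC passes before the process exits so nothing is left for the cleanup cycle.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

internal class Program
{
    private static async Task Main()
    {
        var services = new ServiceCollection();

        // Infinite handler lifetime: acceptable in a short-lived process where stale DNS is not a concern.
        services.AddHttpClient("graph", c => c.BaseAddress = new Uri("https://example.invalid/")) // placeholder
                .SetHandlerLifetime(Timeout.InfiniteTimeSpan);

        using (var provider = services.BuildServiceProvider())
        {
            var client = provider.GetRequiredService<IHttpClientFactory>().CreateClient("graph");
            // ... perform the short-lived work here ...
            await Task.CompletedTask;
        }

        // Encourage finalization of any dropped handlers before the process exits.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
}
```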