Closed anthonychu closed 5 years ago
@brettsam I seem to recall you investigated a similar issue regarding service bus connection management. Does this look related?
@anthonychu -- Are you using a Consumption plan? With a Consumption plan, you are limited to 250 connections (https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#per-sandbox-per-appper-site-numerical-limits). When that number is hit, you'll start seeing this error.
In the other cases I've come across, this was due to the function creating a new Service Bus `MessagingFactory`, using it, and never closing it. Every `MessagingFactory` creates a single connection and won't release it until `Close()` is called. Using the `IBinder` approach to programmatically create bindings hits this also: https://github.com/Azure/azure-webjobs-sdk/issues/881.
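The leak described above can be avoided by creating the factory once and reusing it across invocations. A minimal sketch, assuming the classic WindowsAzure.ServiceBus SDK (`Microsoft.ServiceBus.Messaging`); the `ServiceBusConnection` app setting name is hypothetical:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

public static class QueueSender
{
    // Creating a new MessagingFactory per invocation leaks one connection
    // each time, because the connection is held until Close()/CloseAsync().
    // Instead, create the factory once per host instance and share it.
    private static readonly MessagingFactory Factory =
        MessagingFactory.CreateFromConnectionString(
            Environment.GetEnvironmentVariable("ServiceBusConnection"));

    public static Task SendAsync(string queueName, BrokeredMessage message)
    {
        // Senders created from a shared factory reuse its single connection.
        var sender = Factory.CreateMessageSender(queueName);
        return sender.SendAsync(message);
    }
}
```

The static field is initialized once per process, so the function app holds a single Service Bus connection instead of one per trigger.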
Do you happen to be doing anything like this with your ServiceBus interactions? Would you mind sharing your code (even stripped down if you want)?
Using a static HttpClient should be alright, but those 10 connections firing with each function invocation are another place to look. HttpClient has a way to throttle web calls: it will only allow X connections to a specific host before it starts queuing requests. But I believe in Functions this number X is very high because it runs in the context of a Web app. If the Service Bus `MessagingFactory` isn't the culprit, we'll take a look here.
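When the built-in per-host limit is too high to protect you, the same queuing behavior can be approximated in user code. A minimal sketch (the `Throttler` helper is hypothetical, not part of any SDK) that caps how many requests are in flight at once with a `SemaphoreSlim`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttler
{
    // Runs all work items, but allows at most maxConcurrency of them to be
    // in flight at once; the rest wait their turn on the semaphore.
    public static async Task<T[]> WhenAllThrottled<T>(
        IEnumerable<Func<Task<T>>> work, int maxConcurrency)
    {
        var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = work.Select(async w =>
        {
            await gate.WaitAsync();
            try { return await w(); }
            finally { gate.Release(); }
        });
        return await Task.WhenAll(tasks);
    }
}
```

Wrapping each `HttpClient` call in a `Func<Task<T>>` and passing the batch through `WhenAllThrottled` keeps the concurrent connection count bounded regardless of how many invocations fan out.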
@brettsam I was using a Consumption plan. The 250 concurrent connections appears to jibe with what I was seeing. If I went above 16 `maxConcurrentCalls` in host.json (like 32 or 64), I started getting those errors. Each triggered call generated 9-10 HTTP calls, so the math roughly adds up.
I'm not using `MessagingFactory`.
Thanks for the explanation. Looks like this is expected behavior.
If you issue your HTTP calls sequentially, does the problem go away (while your function obviously takes longer)?
It may be expected, but I think there's some knobs we can expose to make this better if it is the HTTP calls that are hitting the limit. For example, if you were able to set ServicePointManager.DefaultConnectionLimit to something like 200, you'd be able to make as many calls as you could as quickly as possible -- some may be queued up, but it'd be better than trying to issue them sequentially.
If changing your HTTP calls fixes things, I'll rename this issue and we can look into a better solution.
I've updated the title as I've run into this elsewhere and verified that it is an issue. If you, for example, loop through a bunch of tasks (thousands) and try to insert messages into a queue, you'll hit the error above. This is because the default connection limit in ASP.NET apps is int.MaxValue, while you can only have 250 connections in Consumption mode. You'll run into issues in other areas as well if you try to create this many connections.
The best solution to this would be to set your DefaultConnectionLimit to ~200 (or even lower) and let the system queue up the requests for you. But that's not possible today.
Sounds great! Is it worth making this user configurable? Perhaps it's better to have the runtime check `WEBSITE_SKU` or `WEBSITE_COMPUTE_MODE` to see if it's running in dynamic mode and set the value automatically.
Long term, it might be worth getting Kudu to expose the value of this limit as an environment variable (if it's not there already).
Part of the selling point for the dynamic tier is:

> Don’t worry about the infrastructure and provisioning of servers, especially when your Functions call rate scales up. (From https://azure.microsoft.com/en-us/services/functions/)
I used Functions so I could take advantage of this. But because of all the SocketExceptions, I had to switch to a Standard tier, in which case, I wish I would have just stuck with WebJobs so I could use real C# and not C# script.
If it's supposed to be dynamic, developers shouldn't face a limit at all; Azure Functions should quietly take care of that in the background.
Request filed.
Yesterday, during the Live Build Functions segment, I asked why Functions has issues with running out of sockets in the dynamic tier. I was told that a) I'm probably not disposing my connections and b) I should be using a factory to create my connections. Follow-up questions: a) how can I see how many connections I have open, and b) how would I use a factory to create new instances using EF Core and IBinder (if necessary)?
@MisinformedDNA -- Would you mind sharing your scenario with us? I'm curious whether you're making HTTP calls to hit your limit (in which case this fix would help).
I'm clearing this milestone so we can re-evaluate this.
My suggestion is that we lower this value automatically in the Consumption plan, which would prevent the Socket errors that people are seeing. I believe we're now allowed 300 connections per sandbox -- so setting this to something like 250 would automatically protect people from exhausting their connections (but cause requests to queue up -- likely a better scenario which would allow us to scale out). Exposing it via host.json would be nice as well for those that want to increase or decrease it.
I'm using BlobTrigger to process thousands of files, which then get sent to Azure Tables and/or queues and QueueTrigger is then used to continue the process, reading and writing to SQL as needed.
I do not make any direct HTTP calls. I use IBinder to access Azure Storage. And I use Entity Framework to access Azure SQL. I always try to wrap my code in a using block to close/dispose the resources.
Still looking for feedback on this.
Would you mind letting us see your code so I can try to figure out where the connections are leaking? I wouldn't think that the usage of Azure Storage would do this as we use a single client behind-the-scenes. It may be from EF, but I'd like to see how you're using it. If you don't want to post the code here, you can send it to the email listed in my GitHub profile (with no secrets included).
@MisinformedDNA quick ping on this, see Brett's question above.
Sent.
I just ran into this today while sending documents to Azure Search to index. I think I'm going back to WebJobs.
Is there a fix for this yet? I can't even do a basic Function like set up a QueueTrigger function to write a single document to Azure Search, without running into this socket exhaustion problem.
@BowserKingKoopa -- a few questions:
Yes, I'm on a Consumption plan. That's where the socket exhaustion happens. The same code in an App Service plan, or in WebJobs, doesn't fail.
My function is a queue trigger that reads a single blob from Azure Storage (this part never fails) and writes one document to Azure Search (this is the part that fails). So it's not a lot of network activity. All the queue properties (batch size etc.) are left at their default.
I'm using the Azure Search SDK so I don't know for sure how they're using HttpClient, but I imagine this is at the root of the issue. The Azure Storage SDKs never fail, and the Azure Search SDK fails quickly. Maybe the difference is how they're using HttpClient?
The HttpClient issue is an unfortunate one. I was always taught if it's IDisposable, dispose of it. Now it seems like the best thing to do is not dispose of HttpClient. But now you have to think through what problems does never disposing of it cause. There are some problems with not disposing of it too: https://byterot.blogspot.com/2016/07/singleton-httpclient-dns.html?showComment=1508380828048
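The DNS problem described in the linked post has a known workaround: keep the single shared `HttpClient`, but set a connection lease timeout on the endpoint's `ServicePoint` so pooled connections are recycled and fresh DNS lookups happen eventually. A minimal sketch (the `Http` class and one-minute value are illustrative choices, not a prescribed API):

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class Http
{
    // One shared client for the process lifetime avoids socket exhaustion
    // from repeatedly disposing clients and leaving sockets in TIME_WAIT.
    public static readonly HttpClient Client = new HttpClient();

    // A long-lived client never re-resolves DNS for an endpoint by default.
    // ConnectionLeaseTimeout (milliseconds) forces connections to be retired
    // periodically, so DNS changes are eventually picked up.
    public static void ConfigureEndpoint(Uri endpoint)
    {
        var sp = ServicePointManager.FindServicePoint(endpoint);
        sp.ConnectionLeaseTimeout = (int)TimeSpan.FromMinutes(1).TotalMilliseconds;
    }
}
```

Call `ConfigureEndpoint` once per external host at startup; the shared `Client` can then be used everywhere without the "stale DNS" drawback of never disposing it.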
Some more questions --
I'm creating a new SearchServiceClient each invocation and disposing of it when I'm done (it is IDisposable, after all). It's probably using HttpClient behind the scenes.
Same applies, I think:

> The SearchServiceClient class manages connections to your search service. In order to avoid opening too many connections, you should try to share a single instance of SearchServiceClient in your application if possible. Its methods are thread-safe to enable such sharing.
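Per that guidance, the per-invocation `new`/dispose pattern can be replaced with a single shared instance. A minimal sketch, assuming the Microsoft.Azure.Search SDK; the `SearchServiceName` and `SearchApiKey` app setting names are hypothetical:

```csharp
using System;
using Microsoft.Azure.Search;

public static class SearchClients
{
    // SearchServiceClient is thread-safe and intended to be shared; creating
    // and disposing one per invocation opens fresh connections each time.
    // Lazy<T> defers construction until first use and is thread-safe itself.
    private static readonly Lazy<SearchServiceClient> Client =
        new Lazy<SearchServiceClient>(() => new SearchServiceClient(
            Environment.GetEnvironmentVariable("SearchServiceName"),
            new SearchCredentials(Environment.GetEnvironmentVariable("SearchApiKey"))));

    public static SearchServiceClient Instance => Client.Value;
}
```

Every function invocation then uses `SearchClients.Instance` instead of constructing its own client, which keeps the connection count flat no matter how fast the queue trigger fans out.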
I hit a similar issue: I tried increasing `ServicePointManager.DefaultConnectionLimit` as early as I could in my C# Azure Function, but testing locally I only see 2 HTTP requests completing concurrently. I have 1300 requests to send, so 2 at a time is very slow.
```csharp
public static class Settings
{
    static Settings()
    {
        ServicePointManager.DefaultConnectionLimit = 200;
    }
}
```
@brettsam , @paulbatum the CLI version 1.0.9 started to provide the following warning when running locally:
> ServicePointManager.DefaultConnectionLimit is set to the default value of 2. This can limit the connection throughput to services like Azure Storage. For more information, see https://aka.ms/webjobs-connections.
I was assuming until now that this was completely managed by Azure Functions, considering that it is related to the WebJobs SDK (ref: https://github.com/Azure/azure-webjobs-sdk/issues/909).
I see that this issue is still open, so no way for us to tweak it, but I would expect Functions to follow this recommendation: https://aka.ms/webjobs-connections
It is managed by Functions in production as the Web host increases this limit. But the CLI doesn't when you're running locally. This has already been fixed in the CLI; we just haven't pushed out a release yet: https://github.com/Azure/azure-functions-core-tools/pull/384.
So our next release won't show this warning anymore. Sorry for the confusion.
@brettsam I don't believe any additional follow up is required here. Can this be closed?
Yeah, we've handled this in different ways and no longer will expose this setting (as it doesn't help in .NET Core anyway).
Repro steps
I have a C# function that triggers off of a Service Bus topic subscription and then makes ~10 HTTP calls to external services using HttpClient. The calls are made concurrently (starting the Tasks and then doing an `await WhenAll()` on them). Some HTTP calls would fail with this error:
I also started receiving this toast message in the function app portal:
Known workarounds
Turning down the `maxConcurrentCalls` in `host.json` to 16 appears to help (but doesn't eliminate the `System.Net.Http.HttpRequestException` entirely). More of these exceptions and the Microsoft.ServiceBus errors happen when it is turned up to 64 or higher.
Related information
Perhaps this is related to how I use HttpClient. It is newed up as a static variable at the top of the function and not disposed. Is there a recommended way to use HttpClient in a function?
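The pattern described in the repro (a static, never-disposed HttpClient with the calls gathered by `Task.WhenAll`) can be sketched as follows; the `Run` signature and URL list are illustrative, not the author's actual function:

```csharp
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class Function
{
    // Created once per host instance and deliberately never disposed:
    // HttpClient is designed to be shared, and disposing it per invocation
    // leaves sockets in TIME_WAIT, which exhausts the sandbox limit faster.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string[]> Run(string[] urls)
    {
        // Start all ~10 requests concurrently, then await them together.
        var tasks = urls.Select(u => Client.GetStringAsync(u));
        return await Task.WhenAll(tasks);
    }
}
```

This is the generally recommended usage; the exhaustion discussed in this thread comes from the total connection count across concurrent invocations, not from the static client itself.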