Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

Expose ServicePointManager.DefaultConnectionLimit as host setting #850

Closed anthonychu closed 5 years ago

anthonychu commented 8 years ago

Repro steps

I have a C# function that triggers off of a Service Bus topic subscription and then makes ~10 HTTP calls to external services using HttpClient. The calls are made concurrently (starting the Tasks and then doing an await WhenAll() on them).
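The pattern is roughly the following (a simplified sketch; the class name, endpoint URLs, and call count stand in for my real code):

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class Function
{
    // Created once and shared across invocations; never disposed.
    private static readonly HttpClient _client = new HttpClient();

    public static async Task Run(string message)
    {
        var tasks = new List<Task<HttpResponseMessage>>();
        for (int i = 0; i < 10; i++)
        {
            // ~10 calls to external services, all started concurrently.
            tasks.Add(_client.GetAsync($"https://example.com/api/{i}"));
        }

        // Await them together rather than one at a time.
        await Task.WhenAll(tasks);
    }
}
```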

Some HTTP calls would fail with this error:

System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: An attempt was made to access a socket in a way forbidden by its access permissions
   at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.Sockets.Socket.InternalBind(EndPoint localEP)
   at System.Net.Sockets.Socket.BeginConnectEx(EndPoint remoteEP, Boolean flowContext, AsyncCallback callback, Object state)
   at System.Net.Sockets.Socket.UnsafeBeginConnect(EndPoint remoteEP, AsyncCallback callback, Object state)
   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetRequestStream(IAsyncResult asyncResult, TransportContext& context)
   at System.Net.Http.HttpClientHandler.GetRequestStreamCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Submission#0.<UpdateFoo>d__5.MoveNext() in :line 39

I also started receiving this toast message in the function app portal:

Microsoft.ServiceBus: Could not connect to net.tcp://****.servicebus.windows.net:9354/. The connection attempt lasted for a time span of 00:00:00. TCP error code 10013: An attempt was made to access a socket in a way forbidden by its access permissions. . System: An attempt was made to access a socket in a way forbidden by its access permissions. Session Id: 8a9791233343447d9e0b04ba22e21412

Timestamp: 2016-10-31T05:44:21.084Z

Known workarounds

Turning maxConcurrentCalls down to 16 in host.json appears to help (but doesn't eliminate the System.Net.Http.HttpRequestException entirely). More of these exceptions and the Microsoft.ServiceBus errors occur when it is turned up to 64 or higher.
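For reference, the workaround corresponds to a host.json (v1 schema) along these lines, with 16 being the value from my testing:

```json
{
  "serviceBus": {
    "maxConcurrentCalls": 16
  }
}
```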

Related information

Perhaps this is related to how I use HttpClient. It is newed up as a static variable at the top of the function and not disposed. Is there a recommended way to use HttpClient in a function?

paulbatum commented 7 years ago

@brettsam I seem to recall you investigated a similar issue regarding service bus connection management. Does this look related?

brettsam commented 7 years ago

@anthonychu -- Are you using a Consumption plan? With a Consumption plan, you are limited to 250 connections (https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#per-sandbox-per-appper-site-numerical-limits). When that number is hit, you'll start seeing this error.

In the other cases I've come across, this was due to the function creating a new Service Bus MessagingFactory, using it, and never closing it. Every MessagingFactory creates a single connection and won't release it until Close() is called. Using the IBinder approach to programmatically create bindings hits this also: https://github.com/Azure/azure-webjobs-sdk/issues/881.
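To illustrate the leak described above: each MessagingFactory holds one connection until it is closed, so a factory created per invocation and never closed leaks a connection every time. A sketch of the problem and the fix (the queue name and connection string are placeholders; the old Microsoft.ServiceBus SDK is assumed):

```csharp
using Microsoft.ServiceBus.Messaging;

// Each call to CreateFromConnectionString opens a connection that is
// held until the factory is closed.
var factory = MessagingFactory.CreateFromConnectionString(connectionString);
var sender = await factory.CreateMessageSenderAsync("my-queue");
await sender.SendAsync(new BrokeredMessage("payload"));

// Without this, the connection is leaked. Better still: create one
// factory and share it across invocations.
await factory.CloseAsync();
```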

Do you happen to be doing anything like this with your ServiceBus interactions? Would you mind sharing your code (even stripped down if you want)?

Using a static HttpClient should be alright but those 10 connections firing with each function invocation is another place to look. HttpClient has a way to throttle web calls -- where it will only allow X number of connections to a specific host before it starts queuing the requests. But I believe in Functions this number X is very high because it runs in the context of a Web app. If the ServiceBus MessagingFactory isn't the culprit, we'll take a look here.

anthonychu commented 7 years ago

@brettsam I was using a consumption plan. The 250-connection limit appears to jibe with what I was seeing. If I went above 16 maxConcurrentCalls in host.json (like 32 or 64), I started getting those errors. Each triggered call generated 9-10 HTTP calls. So the math roughly adds up.

I'm not using MessagingFactory.

Thanks for the explanation. Looks like this is expected behavior.

brettsam commented 7 years ago

If you issue your HTTP calls sequentially, does the problem go away (while your function obviously takes longer)?

It may be expected, but I think there's some knobs we can expose to make this better if it is the HTTP calls that are hitting the limit. For example, if you were able to set ServicePointManager.DefaultConnectionLimit to something like 200, you'd be able to make as many calls as you could as quickly as possible -- some may be queued up, but it'd be better than trying to issue them sequentially.

If changing your HTTP calls fixes things, I'll rename this issue and we can look into a better solution.

brettsam commented 7 years ago

I've updated the title as I've run into this elsewhere and verified that it is an issue. If you, for example, loop through a bunch of tasks (thousands) and try to insert messages into a queue, you'll hit the error above. This is because the default connection limit in ASP.NET apps is int.MaxValue, while you can only have 250 connections in Consumption mode. You'll run into issues in other areas as well if you try to create this many connections.

The best solution to this would be to set your DefaultConnectionLimit to ~200 (or even lower) and let the system queue up the requests for you. But that's not possible today.

anthonychu commented 7 years ago

Sounds great! Is it worth making this user configurable? Perhaps it's better to have the runtime check WEBSITE_SKU or WEBSITE_COMPUTE_MODE to see if it's running in dynamic mode and set the value automatically.

Long term, it might be worth getting Kudu to expose the value of this limit as an environment variable (if it's not there already).

MisinformedDNA commented 7 years ago

Part of the selling point for the dynamic tier is

Don’t worry about the infrastructure and provisioning of servers, especially when your Functions call rate scales up. (From https://azure.microsoft.com/en-us/services/functions/)

I used Functions so I could take advantage of this. But because of all the SocketExceptions, I had to switch to a Standard tier, in which case I wish I had just stuck with WebJobs so I could use real C# and not C# script.

If it's supposed to be dynamic, there shouldn't be a limit to developers at all. Azure Functions should quietly take care of that in the background.

MisinformedDNA commented 7 years ago

Request filed.

MisinformedDNA commented 7 years ago

Yesterday, during the Live Build Functions segment, I asked why Functions has issues with running out of sockets in the dynamic tier. I was told that a) I'm probably not disposing my connections, and b) I should be using a factory to create my connections. Follow-up questions: a) how can I see how many connections I have open, and b) how would I use a factory to create new instances using EF Core and IBinder (if necessary)?

brettsam commented 7 years ago

@MisinformedDNA -- Would you mind sharing your scenario with us? I'm curious whether you're making HTTP calls to hit your limit (in which case this fix would help).

brettsam commented 7 years ago

I'm clearing this milestone so we can re-evaluate this.

My suggestion is that we lower this value automatically in the Consumption plan, which would prevent the Socket errors that people are seeing. I believe we're now allowed 300 connections per sandbox -- so setting this to something like 250 would automatically protect people from exhausting their connections (but cause requests to queue up -- likely a better scenario which would allow us to scale out). Exposing it via host.json would be nice as well for those that want to increase or decrease it.

MisinformedDNA commented 7 years ago

I'm using BlobTrigger to process thousands of files, which then get sent to Azure Tables and/or queues and QueueTrigger is then used to continue the process, reading and writing to SQL as needed.

I do not make any direct HTTP calls. I use IBinder to access Azure Storage. And I use Entity Framework to access Azure SQL. I always try to wrap my code in a using block to close/dispose the resources.

MisinformedDNA commented 7 years ago

Still looking for feedback on this.

brettsam commented 7 years ago

Would you mind letting us see your code so I can try to figure out where the connections are leaking? I wouldn't think that the usage of Azure Storage would do this as we use a single client behind-the-scenes. It may be from EF, but I'd like to see how you're using it. If you don't want to post the code here, you can send it to the email listed in my GitHub profile (with no secrets included).

paulbatum commented 7 years ago

@MisinformedDNA quick ping on this, see Brett's question above.

MisinformedDNA commented 7 years ago

Sent.

BowserKingKoopa commented 7 years ago

I just ran into this today while sending documents to Azure Search to index. I think I'm going back to WebJobs.

BowserKingKoopa commented 7 years ago

Is there a fix for this yet? I can't even do a basic Function like set up a QueueTrigger function to write a single document to Azure Search, without running into this socket exhaustion problem.

brettsam commented 7 years ago

@BowserKingKoopa -- a few questions:

BowserKingKoopa commented 7 years ago

Yes, I'm on a Consumption plan. That's where the socket exhaustion happens. The same code in an App Service plan, or in WebJobs, doesn't fail.

My function is a queue trigger that reads a single blob from Azure Storage (this part never fails) and writes one document to Azure Search (this is the part that fails). So it's not a lot of network activity. All the queue properties (batch size etc.) are left at their default.

I'm using the Azure Search SDK so I don't know for sure how they're using HttpClient, but I imagine this is at the root of the issue. The Azure Storage SDKs never fail, and the Azure Search SDK fails quickly. Maybe the difference is how they're using HttpClient?

The HttpClient issue is an unfortunate one. I was always taught that if it's IDisposable, you dispose of it. Now it seems the best thing to do is not dispose of HttpClient. But then you have to think through what problems never disposing of it causes. There are some problems with not disposing of it too: https://byterot.blogspot.com/2016/07/singleton-httpclient-dns.html?showComment=1508380828048
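One mitigation the linked post discusses for the stale-DNS problem with a singleton HttpClient is to cap how long connections to a given endpoint are kept alive. A sketch (the endpoint URL and timeout are placeholders):

```csharp
using System;
using System.Net;

var endpoint = new Uri("https://example.search.windows.net");

// Recycle connections to this endpoint every 60 seconds so that
// DNS changes are eventually picked up, even with a long-lived client.
ServicePointManager.FindServicePoint(endpoint).ConnectionLeaseTimeout = 60 * 1000;
```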

brettsam commented 7 years ago

Some more questions --

BowserKingKoopa commented 7 years ago

I'm creating a new SearchServiceClient each invocation and disposing of it when I'm done (it is IDisposable, after all). It's probably using HttpClient behind the scenes.

dcarr42 commented 7 years ago

I think the same applies here. From the SearchServiceClient documentation:

> The SearchServiceClient class manages connections to your search service. In order to avoid opening too many connections, you should try to share a single instance of SearchServiceClient in your application if possible. Its methods are thread-safe to enable such sharing.
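Applied to the scenario above, that guidance amounts to something like this sketch (the service name and key are placeholders, assuming the Microsoft.Azure.Search SDK):

```csharp
using Microsoft.Azure.Search;

public static class SearchClients
{
    // One shared client for the whole function app,
    // instead of a new client (and connection pool) per invocation.
    public static readonly SearchServiceClient Service =
        new SearchServiceClient("my-search-service", new SearchCredentials("<admin-key>"));
}
```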

mmaitre314 commented 6 years ago

I hit a similar issue: I tried increasing ServicePointManager.DefaultConnectionLimit as early as I could in my C# Azure Function, but testing locally I only see 2 HTTP requests completing concurrently. I have 1300 requests to send, so 2 at a time is very slow.

    public static class Settings
    {
        static Settings()
        {
            ServicePointManager.DefaultConnectionLimit = 200;
        }
    }

SimonLuckenuik commented 6 years ago

@brettsam , @paulbatum the CLI version 1.0.9 started to provide the following warning when running locally:

ServicePointManager.DefaultConnectionLimit is set to the default value of 2. This can limit the connection throughput to services like Azure Storage. For more information, see https://aka.ms/webjobs-connections.

I was assuming until now that this was completely managed by Azure Functions considering that it is related to Webjob SDK (ref: https://github.com/Azure/azure-webjobs-sdk/issues/909).

I see that this issue is still open, so no way for us to tweak it, but I would expect Functions to follow this recommendation: https://aka.ms/webjobs-connections

brettsam commented 6 years ago

It is managed by Functions in production as the Web host increases this limit. But the CLI doesn't when you're running locally. This has already been fixed in the CLI; we just haven't pushed out a release yet: https://github.com/Azure/azure-functions-core-tools/pull/384.

So our next release won't show this warning anymore. Sorry for the confusion.

fabiocav commented 5 years ago

@brettsam I don't believe any additional follow up is required here. Can this be closed?

brettsam commented 5 years ago

Yeah, we've handled this in different ways and will no longer expose this setting (it doesn't help in .NET Core anyway).