Azure / azure-functions-servicebus-extension

Service Bus extension for Azure Functions
MIT License

[BUG] Something is causing Port Exhaustion on an hourly basis (on Linux ASP) #176

Open jsquire opened 2 years ago

jsquire commented 2 years ago

Issue Transfer

This issue has been transferred from the Azure SDK for .NET repository, #25833.

Please be aware that @jenspettersson is the author of the original issue and include them for any questions or replies.

Details

Library name and version

Microsoft.Azure.WebJobs.Extensions.ServiceBus 4.3.0

Describe the bug

First, we're not sure this is caused by the ServiceBus package listed above under "Library name and version", but we suspect it. It might be something else...

We're seeing spikes of "New SNAT connections established" every hour on a Linux App Service Plan P1v2. We first saw it on the service plan where our real system is running and immediately began looking for things in our code that might cause this (no reuse of HttpClient etc.). We found nothing.

Then I created a brand new Azure Function App, hosted on a brand new App Service Plan P1v2 (nothing else on that ASP). The function app has:

  1. One ServiceBusTrigger that consumes messages from a queue and writes them to the standard log output
  2. One TimerTrigger that produces a message on the queue every five minutes
  3. An HttpTrigger that produces a message on the queue when invoked

I deployed this yesterday evening and left it alone overnight:

image

The reports show that new SNAT connections have been established every hour, and sometimes there are so many new connections that they cause port exhaustion and result in failing connections.

image

In this image you can see a huge spike at 01:20 and then new, albeit somewhat smaller, spikes at 02:20, 03:20, 04:20 and 05:20.

This is the exact same behavior we're seeing on our real system. Even with little to no load at all (we're not yet in production) we keep seeing the Port Exhaustion errors in the diagnostics tool. We're not quite sure if this is actually impacting the performance of our system, but as we fixed port exhaustion issues in other systems last year (these were caused by HttpClient being used incorrectly) we're always on our toes when we see these kinds of error reports.
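For context, the HttpClient misuse mentioned above usually means creating a new client per call, which opens a new outbound connection each time and leaves the old sockets lingering. A minimal sketch of the usual remedy follows, assuming a single shared instance is acceptable for the app. The class and method names (`SharedHttp`, `GetAsync`) are hypothetical and are not taken from the affected systems.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: illustrates reusing one HttpClient for the whole process.
public static class SharedHttp
{
    // A single static instance lets outbound connections be pooled and reused,
    // instead of a new connection (and SNAT port) being opened per request.
    private static readonly HttpClient Client = new HttpClient();

    // Every caller goes through the same client and therefore the same connection pool.
    public static Task<string> GetAsync(string url) => Client.GetStringAsync(url);
}
```

The repro app in this issue makes no HttpClient calls of its own, so this pattern is only the fix for the earlier incidents, not an explanation of the hourly spikes.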

Expected behavior

Not seeing any Port Exhaustion errors in an app with no load at all and almost no code at all.

Actual behavior

Port Exhaustions on an almost hourly basis on an Azure Function App doing almost nothing at all.

Reproduction Steps

  1. Create an Azure Service Bus Namespace
  2. Create a queue called snat on that namespace
  3. Create a new App Service Plan: Linux P1v2
  4. Create a new Azure Function App
  5. Create the following Functions:

[FunctionName("QueueMessageProducer")]
public static async Task RunAsync(
    [TimerTrigger("0 */5 * * * *")] TimerInfo myTimer,
    [ServiceBus("snat", Connection = "AzureServiceBus")] IAsyncCollector<string> messages,
    ILogger log)
{
    var now = DateTime.UtcNow;
    log.LogInformation($"C# Timer trigger function executed at: {now}");

    await messages.AddAsync($"Message from QueueMessageProducer {now}");
}

[FunctionName("QueueTrigger")]
public static Task RunAsync(
    [ServiceBusTrigger("snat", Connection = "AzureServiceBus")] string myQueueItem,
    ILogger log)
{
    log.LogInformation($"C# ServiceBus queue trigger function processed message: {myQueueItem}");

    return Task.CompletedTask;
}
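
The HttpTrigger function mentioned in the description is not included in the listing above. A minimal sketch of what it might look like is given here, assuming the in-process Functions v3 model and the same `snat` queue and `AzureServiceBus` connection setting as the other two functions. The function name, class name, authorization level and HTTP method are assumptions, not taken from the original repro.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

// Hypothetical reconstruction of the third function; names are illustrative only.
public static class HttpMessageProducer
{
    [FunctionName("HttpMessageProducer")]
    public static async Task<IActionResult> RunAsync(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
        [ServiceBus("snat", Connection = "AzureServiceBus")] IAsyncCollector<string> messages,
        ILogger log)
    {
        var now = DateTime.UtcNow;
        log.LogInformation($"C# HTTP trigger function executed at: {now}");

        // Enqueue one message on the snat queue through the Service Bus output binding.
        await messages.AddAsync($"Message from HttpMessageProducer {now}");

        return new OkResult();
    }
}
```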

Deploy it, let it run, and then look at the reports in Troubleshoot under Diagnose and solve problems. There's quite a long delay on the SNAT reports, so you might have to let it run for a while...

Environment

Hosting: Azure Function App (running on App Service Plan Linux P1v2)
Azure Function runtime: dotnet
Azure Function version: v3
Target Framework: netcoreapp3.1

The following package references were used:

<PackageReference Include="Microsoft.Azure.WebJobs" Version="3.0.29" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Http" Version="3.0.12" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="4.3.0" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Storage" Version="3.0.4" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="3.0.13" />
<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0" />

hossam-nasr commented 1 year ago

@jsquire I've tried to repro this issue, following the exact steps you provided, and I'm not seeing any SNAT port exhaustion warnings, or any SNAT port spikes at all:

image

I've had this running for about a week now. Is this still an issue that you face? I see in the original issue here: https://github.com/Azure/azure-sdk-for-net/issues/25833, that it was determined the issue was most likely unrelated to the ServiceBus extension, and the recommendation was to open a support ticket. I wonder if the underlying issue was ultimately addressed and we can go ahead and close this issue?

jsquire commented 1 year ago

@hossam-nasr: My role in this was only to triage from the original repro. The person that you'll want to ask is @jenspettersson.

hossam-nasr commented 1 year ago

Hi @jenspettersson and @JoshLove-msft (both were included in the discussion in the original issue here: https://github.com/Azure/azure-sdk-for-net/issues/25833), same question as above: was the underlying issue ultimately resolved? Should we also close this issue or is this still a problem that requires investigation?