Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License
5.46k stars 4.8k forks source link

ServiceBusClient.AcceptNextSessionAsync logs ServiceBusException as "exception" (not as verbose) after ServiceTimeout #46096

Closed ulluoink closed 2 weeks ago

ulluoink commented 1 month ago
          @johankson the fix was to reduce the severity of this log message. Are you seeing this logged as Verbose?

Originally posted by @JoshLove-msft in https://github.com/Azure/azure-sdk-for-net/issues/27976#issuecomment-1258513410

in our case this is still logged as exception. how can we provide the logTimeoutResourceAsVerbose parameter?

whenever we call ServiceBusClient.AcceptNextSessionAsync the expected ServiceBusException with reason ServiceBusFailureReason.ServiceTimeout is logged as an exception to console and application insights.

how can we change this behaviour?

jsquire commented 1 month ago

Hi @ulluoink. Thanks for reaching out and we regret that you're experiencing difficulties. There's not enough context available in the issue to assist at the moment. Please help us understand:

github-actions[bot] commented 1 month ago

Hi @ulluoink. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

ulluoink commented 1 month ago

we of course expect a ServiceBusException with reason ServiceBusFailureReason.ServiceTimeout if no messages/sessions are available.

but this exception is logged as a trace with severity level "error" which we do not want because this is an expected exception.

we configured the logLevel as "Error" so traces should not be logged.

ulluoink commented 1 month ago

The problem seems to be in ServiceBusSessionReceiver.cs in CreateSessionReceiverAsync.

https://github.com/Azure/azure-sdk-for-net/blob/c8ebad5166df3b460b5297fb0cb3193c437f2632/sdk/servicebus/Azure.Messaging.ServiceBus/src/Receiver/ServiceBusSessionReceiver.cs#L63

when await receiver.OpenLinkAsync(cancellationToken).ConfigureAwait(false); throws the ServiceBusException because of the ServiceTimeout this is not catched because the "isProcessor" parameter of CreateSessionReceiverAsync is false.

so only the common Exception is catched catch (Exception ex) { await receiver.CloseAsync(CancellationToken.None).ConfigureAwait(false); receiver.Logger.ClientCreateException(typeof(ServiceBusSessionReceiver), receiver.FullyQualifiedNamespace, entityPath, ex); throw; }

and the Logger.ClientCreateException is called which is bad and simply wrong because the client has been succesfully created just before the try catch block

instead this line fails await receiver.OpenLinkAsync(cancellationToken).ConfigureAwait(false);

with the expected ServiceBusException and the reason ServiceBusFailureReason.ServiceTimeout.

ulluoink commented 1 month ago

@jsquire did you have the opportunity to look into this?

jsquire commented 1 month ago

@ulluoink : The client logic is correct in the block that you're highlighting. For a session receiver to be successfully created, the link must be established. Otherwise, the client is not viable. Because you're using the client directly rather than via the trigger or a processor, the error is correct as is the logic path.

That said, one can make the case that a known timeout attempting to create a session receiver should be represented differently in the logs and appear as a known "no available session" at an informational or verbose level rather than a client creation error log. Before doing so, there's still some additional investigation needed to understand why it was designed this way, which I'm still following-up on.

In the meantime, can you help me understand more about your application? The timer trigger that you're using is not an Azure SDK client-based trigger. As such, it does not import the Azure SDK infrastructure or start the process that forwards our ETW logs to the Functions ILogger instance. There's missing context to explain why logs are being captured, but it would imply that either you're using another WebJobs extension package for an Azure SDK client or you're using the Azure SDK extensions package and starting the forwarder directly.

Can you share more details and, ideally, the packages section of your .csproj file?

github-actions[bot] commented 1 month ago

Hi @ulluoink. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

ulluoink commented 1 month ago

Hi @jsquire ,

concerning the codeblock I shared:

I think it is just not correct that you handle the ServiceBusException in a different way here - you already catch the ServiceBusException with reason ServiceBusFailureReason.ServiceTimeout but only if isProcessor is true. In our case when using AcceptNextSessionAsync the isProcessor is false. And then only the common Exception is caught.


Here are the requested details about our FunctionApp and the PackageReferences:

We use an Azure FunctionApp which contains a timer triggered endpoint. The FunctionApp is dotnet-isolated and .net 8.

The function then uses the Azure.Messaging.ServiceBus package to create a ServiceBusClient and then receive the ServiceBus sessions of a topic/subscription.

We then receive the messages of each session. This is all done because we need all messages within a session to work with them together.

The problem occurs if no session is present (we are already catching the ServiceBusException with reason ServiceBusFailureReason.ServiceTimeout).

The problem now is that the ServiceBus sdk logs a Trace in ApplicationInsights which mainly says: "An exception occurred while creating a ServiceBusSessionReceiver (Namespace ..." "Error Message: 'Azure.Messaging.ServiceBus.ServiceBusException: The operation did not complete within the allotted timeout of 00:00:02.9720000. The time allotted to this operation may have been a portion of a longer timeout..."

The EventName of the Trace is "ClientCreateExceptionCore", the LogLevel is "Error".

As the AzureFunction is triggered every 10 seconds we got tons of those Trace entrioes in our AppInsights.

As we use several .csproj files (nugerts, dlls etc) I tried to come with a simplified version and narrowed it to these PackageReferences:

  <ItemGroup>
    <FrameworkReference Include="Microsoft.AspNetCore.App" />
    <PackageReference Include="Azure.Messaging.ServiceBus" Version="7.18.1" />
    <PackageReference Include="Bof.EventFrost" Version="13.0.4" />
    <PackageReference Include="Bof.EventFrost.Azure" Version="13.0.4" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.23.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http" Version="3.2.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http.AspNetCore" Version="1.3.2" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Timer" Version="4.3.1" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.17.4" />
    <PackageReference Include="Microsoft.ApplicationInsights.WorkerService" Version="2.22.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.ApplicationInsights" Version="1.4.0" />
    <PackageReference Include="Microsoft.Extensions.Azure" Version="1.7.5" />
    <PackageReference Include="Microsoft.AspNetCore.Mvc.NewtonsoftJson" Version="8.0.8" />
    <PackageReference Include="Microsoft.Azure.Core.NewtonsoftJson" Version="2.0.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Core" Version="1.19.0" />
  </ItemGroup>
jsquire commented 1 month ago

Thanks, @ulluoink. I spent some time running down history here and can't find any rationale or justification for why we're treating the processor scenario differently. After some internal discussions, we're in agreement that the logs are inaccurate in this scenario. We're going to add an additional timeout clause with a non-processor focused message. We'll do our best to get this included in the October releases, assuming that we don't encounter any unexpected issues during testing.

I'm still not 100% sure which packages is auto-starting the forwarder, as that will normally only happen when an Azure SDK Functions extension package is providing a trigger. However, that's orthogonal and more a curiosity than impactful here, other than having the option to turn off all Azure SDK log forwarding in your application as a short-term mitigation.

jsquire commented 1 month ago

fixed by #46388

github-actions[bot] commented 1 month ago

Hi @ulluoink. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

ulluoink commented 1 month ago

Thank you @jsquire for your commitment and the quick solution

jsquire commented 4 weeks ago

This change was included in the 7.18.2 release.

github-actions[bot] commented 2 weeks ago

Hi @ulluoink, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.