Closed sbfrancies-onyx closed 1 year ago
Thank you for the details @sbfrancies-onyx. Typically that error message means that authentication is not happening properly, due to changes in either the identity or the network. Are you seeing this problem in any other environments or just this one?
I would also encourage you to update servicebus to the latest version, as it now uses the new Python-based AMQP library, which is more performant and stable.
Thanks for your response @kashifkhan - we were only seeing this issue in one environment and it hasn't reoccurred since. Upgrading sounds like a sensible idea even if we can't tell if it would have resolved this issue.
@sbfrancies-onyx so is the error no longer happening in the environment? I'm wondering if there was some transient error that occurred, or if somewhere in your environment things were being re-created or taken down. The fact that this was happening in only one environment is a little encouraging, because at least we can compare and see if there are any differences in settings etc.
As for upgrading we will be releasing a new version of the servicebus SDK next week, so it would be best to be on that version.
@kashifkhan - the error started showing up at 2023-10-26 00:41:47 and stopped appearing at 2023-10-27 13:29:07 (UTC). We didn't make any changes before, during or after that time period, except to try restarting the app service (which made no difference). Thanks again for your assistance.
@sbfrancies-onyx are there any debug logs that can be shared from that time frame (if there is sensitive information I can share my email address)? They might help in providing some more pointers around what went wrong. Going from that stack trace alone makes things a bit harder to pin down :)
Hi @kashifkhan - sorry I don't think we have anything else from that time period. Apologies that I can't be more helpful.
@sbfrancies-onyx no worries about that :) If the problem does happen again, please feel free to open an issue and share the logs etc. and we can dig deeper into it. I'm hoping it was just a transient issue and that upgrading next week will further reduce the chances of it appearing.
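For anyone who needs to capture such logs if the problem returns, here is a minimal sketch using only the standard `logging` module. It assumes the SDK emits its records under the `azure` logger namespace, as the Azure SDK for Python libraries generally do. The helper name `enable_azure_debug_logging` is made up for illustration and is not part of any SDK.

```python
import logging
import sys


def enable_azure_debug_logging(stream=sys.stderr):
    """Send DEBUG-level records from loggers under "azure" to the given stream.

    Hypothetical helper for illustration; not part of the Service Bus SDK.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    # Assumption: SDK loggers such as "azure.servicebus" are children of
    # "azure", so they inherit this level and propagate to this handler.
    azure_logger = logging.getLogger("azure")
    azure_logger.setLevel(logging.DEBUG)
    azure_logger.addHandler(handler)
    return azure_logger


if __name__ == "__main__":
    enable_azure_debug_logging()
    # Stand-in record; in a real app the SDK itself would emit these.
    logging.getLogger("azure.servicebus").debug("debug logging is enabled")
```

The SDK documentation also describes a `logging_enable=True` keyword argument on `ServiceBusClient` for network trace output. Such logs can contain sensitive details, so they are worth reviewing before being shared.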
Hi @sbfrancies-onyx. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.
Hi @sbfrancies-onyx, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.
@kashifkhan just wondering if that release is delayed? Thanks.
@jdudleyie we released a new version of the library on PyPI on Nov 13.
Describe the bug
For a period of around two days we received the following exception-level error in our Application Insights logs, originating in the package. The application code/infrastructure and Python/package versions did not change when the issue started, while it was occurring, or when it stopped. Microsoft Azure support have been unable to help so far and suggested the issue could be in the package.
The first exception occurred at 2023-10-26T00:41:47.481099Z. It happened approximately 1300 times. The final exception occurred at 2023-10-27T13:29:07.315539Z.
To Reproduce
Steps to reproduce the behaviour:
Expected behaviour
No exceptions to be thrown when connecting to Service Bus.
Additional context
We run a Python app in an Azure App Service with multiple instances. It spins up multiple workers which subscribe to a Service Bus topic. Authentication uses an Azure managed identity, which did not change over the period in which the exceptions occurred.
The app is built to be resilient and retries on failure. We know this problem was not happening on every connection in every worker, or there would have been a continual flow of exceptions every few milliseconds rather than one every minute or two.
Due to the retry mechanisms we built and the seemingly intermittent nature of the issue, our app continued to function despite the 1000+ unexplained failures.
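The retry code itself is not shown in this issue, so the following is only a generic sketch of a retry-on-failure wrapper with exponential backoff and jitter, not the app's actual mechanism. The name `call_with_retry` and all of its parameters are hypothetical.

```python
import random
import time


def call_with_retry(operation, max_attempts=5, base_delay=1.0,
                    max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying on any exception with exponential backoff.

    Hypothetical illustration; the app described above may work differently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Double the delay each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so multiple workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

A worker could wrap its connect-and-receive step in `call_with_retry`. In practice one would likely catch only the specific connection or authentication exceptions rather than every `Exception`, and log each failure so that intermittent errors like the ones above stay visible.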