Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License
4.63k stars 2.84k forks source link

Intermittent error connecting to service bus from Azure App Service: ValueError: Unable to open CBS link; AMQPConnectionError: Unable to open authentication session on connection #32941

Closed sbfrancies-onyx closed 1 year ago

sbfrancies-onyx commented 1 year ago

Describe the bug For a period of around two days we received the following exception level error in our Application Insight logs, originating in the package. The application code/infrastructure and python/package version did not change during the time the issue started, was occurring or stopped. Microsoft Azure support have been unable to help so far and suggested the issue could be in the package.

The first exception occurred at 2023-10-26T00:41:47.481099Z It happened approximately 1300 times The final exception occurred at 2023-10-27T13:29:07.315539Z

Blotter Trades Topic Receiver: Exception in receiver thread 1. Traceback (most recent call last): File "/opt/venv/lib/python3.10/site-packages/uamqp/authentication/cbs_auth.py", line 66, in create_authenticator self._cbs_auth = c_uamqp.CBSTokenAuth( File "./src/cbs.pyx", line 75, in uamqp.c_uamqp.CBSTokenAuth.cinit ValueError: Unable to open CBS link.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/opt/venv/lib/python3.10/site-packages/azure/servicebus/_base_handler.py", line 402, in _do_retryable_operation return operation(kwargs) File "/opt/venv/lib/python3.10/site-packages/azure/servicebus/_servicebus_receiver.py", line 378, in _open self._handler.open(connection=self._connection) File "/opt/venv/lib/python3.10/site-packages/uamqp/client.py", line 260, in open self._build_session() File "/opt/venv/lib/python3.10/site-packages/uamqp/client.py", line 209, in _build_session self._connection._cbs = self._auth.create_authenticator( File "/opt/venv/lib/python3.10/site-packages/uamqp/authentication/cbs_auth.py", line 446, in create_authenticator return super(JWTTokenAuth, self).create_authenticator(connection, debug, kwargs) File "/opt/venv/lib/python3.10/site-packages/uamqp/authentication/cbs_auth.py", line 79, in create_authenticator raise errors.AMQPConnectionError( uamqp.errors.AMQPConnectionError: Unable to open authentication session on connection b'SBReceiver-a341f222-41f8-43a4-9c1a-ad5871ee3658'. Please confirm target hostname exists: b'***.servicebus.windows.net'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/app/./service_bus/topic_receiver_thread_manager.py", line 61, in continuously_receive_messages topic_receiver.setup_receiver_to_receive_messages() File "/app/./service_bus/topic_receiver.py", line 37, in setup_receiver_to_receive_messages with get_service_bus_receiver_for_next_session(self.service_bus_client, self.topic, self.subscription) as receiver: File "/opt/venv/lib/python3.10/site-packages/azure/servicebus/_base_handler.py", line 327, in enter self._open_with_retry() File "/opt/venv/lib/python3.10/site-packages/azure/servicebus/_base_handler.py", line 533, in _open_with_retry return self._do_retryable_operation(self._open) File "/opt/venv/lib/python3.10/site-packages/azure/servicebus/_base_handler.py", line 416, in _do_retryable_operation raise last_exception azure.servicebus.exceptions.ServiceBusConnectionError: Unable to open authentication session on connection b'SBReceiver-a341f222-41f8-43a4-9c1a-ad5871ee3658'. Please confirm target hostname exists: b'***.servicebus.windows.net'

To Reproduce Steps to reproduce the behaviour:

  1. Cannot be reproduced at this time

Expected behaviour No exceptions to be thrown when connecting to service bus

Additional context We run a python app in an Azure App Service with multiple instances. It spins up multiple workers which subscribe to a service bus topic. Authentication uses Azure managed Identity which did not change over the period that the exceptions occurred.

The app is build to be resilient and retries on failure. We know this problem was not happening on every connection in every worker or there would have been a continual flow of exceptions every few milliseconds rather than one every minute or two.

Due to the retry mechanisms we built and the seemingly intermittent nature of the issue our app continued to function despite the 1000+ unexplained failures.

kashifkhan commented 1 year ago

Thank you for the details @sbfrancies-onyx . Typically that error message means that authentication is not happening properly either due to some changes in the identity or network. Are you seeing this problem in any other environments or just this one?

I would also encourage you to update servicebus to the latest version as it is now using the new python based amqp library that is more performant and stable.

sbfrancies-onyx commented 1 year ago

Thanks for your response @kashifkhan - we were only seeing this issue in one environment and it hasn't reoccurred since. Upgrading sounds like a sensible idea even if we can't tell if it would have resolved this issue.

kashifkhan commented 1 year ago

@sbfrancies-onyx so is the error no longer happening in the environment ? Im wondering if there was some transient error that occured or somewhere in your environment things were being re-created or taken down. Given that this was happening only in one environment is a little encouraging because at least we can compare and see if there are any differences in settings etc.

As for upgrading we will be releasing a new version of the servicebus SDK next week, so it would be best to be on that version.

sbfrancies-onyx commented 1 year ago

@kashifkhan - the error started showing up at 2023-10-26 00:41:47 and stopped appearing at 2023-10-27 13:29:07 (UTC). We didn't make any changes before, during or after that time period, except to try restarting the app service (which made no difference). Thanks again for your assistance.

kashifkhan commented 1 year ago

@sbfrancies-onyx are there any debug logs that can be shared from that time frame ( if there is sensitive information I can share my email address), that might help in providing some more pointers around what went wrong? Simply going from that stack trace makes things a bit harder to pin down :)

sbfrancies-onyx commented 1 year ago

Hi @kashifkhan - sorry I don't think we have anything else from that time period. Apologies that I can't be more helpful.

kashifkhan commented 1 year ago

@sbfrancies-onyx no worries about that :) If the problem does happen again, please feel free to open an issue and share the logs etc. and we can dig deeper into it. Im hoping it was just a transient issue and that upgrading next week will further reduce the chances of it appearing

github-actions[bot] commented 1 year ago

Hi @sbfrancies-onyx. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

github-actions[bot] commented 1 year ago

Hi @sbfrancies-onyx, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

jdudleyie commented 1 year ago

@kashifkhan just wondering if that release is delayed? Thanks.

kashifkhan commented 1 year ago

@jdudleyie we have released a new version of the library On Nov 13 on pypi.