Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Error amqp:link:detach-forced causes send_messages to timeout for 7s. #38627

Open cabal-daniel opened 1 day ago

cabal-daniel commented 1 day ago

Describe the bug We are experiencing a bug where sender.send_messages takes longer than 7s to send a message every 10 minutes.

After careful investigation we determined that this line in servicebus.py raises an exception:

    @trace_function
    def _send(
        self,
        message: Union[ServiceBusMessage, ServiceBusMessageBatch],
        timeout: Optional[float] = None,
        last_exception: Optional[Exception] = None,  # pylint: disable=unused-argument
    ) -> None:
        self._amqp_transport.send_messages(self, message, _LOGGER, timeout=timeout, last_exception=last_exception)

The exception is:

Error condition: amqp:link:detach-forced
Error Description: The link 'G0:38070458:85065d4d-9298-4e43-b9cc-6aba5173f804' is force detached. Code: publisher(link88573). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00.

    File /venv/lib/python3.12/site-packages/azure/servicebus/_transport/_pyamqp_transport.py, line 498, in send_messages
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 701, in send_message
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 294, in _do_retryable_operation
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 273, in _do_retryable_operation
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 671, in _send_message_impl
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 406, in client_ready
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 603, in _client_ready
        File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/link.py, line 112, in get_state
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 273, in _do_retryable_operation
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 671, in _send_message_impl
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 406, in client_ready
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 603, in _client_ready
        File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/link.py, line 112, in get_state
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 273, in _do_retryable_operation
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 671, in _send_message_impl
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 406, in client_ready
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 603, in _client_ready
        File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/link.py, line 112, in get_state
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 273, in _do_retryable_operation
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 671, in _send_message_impl
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 406, in client_ready
      File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/client.py, line 603, in _client_ready
        File /venv/lib/python3.12/site-packages/azure/servicebus/_pyamqp/link.py, line 112, in get_state

This error is documented at https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-amqp-troubleshoot, but that page does not explain how to mitigate it.

Our question: can we prevent this exception from being raised? If not, can we prevent the 7s delays from occurring?

To Reproduce Steps to reproduce the behavior:

  1. Call sender.send_messages with a credential, wait 20 minutes, then call it again.
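The reproduction step above can be measured with a small timing harness. This is only a sketch: `time_calls` and `send_once` are hypothetical names, and `send_once` stands in for whatever wraps `sender.send_messages(...)` in your code.

```python
import time
from typing import Callable, List


def time_calls(
    send_once: Callable[[], None],
    idle_seconds: float,
    calls: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Call send_once() `calls` times, idling in between, and return each call's duration in seconds."""
    durations: List[float] = []
    for i in range(calls):
        if i > 0:
            sleep(idle_seconds)  # leave the sender unused between sends
        start = time.perf_counter()
        send_once()
        durations.append(time.perf_counter() - start)
    return durations
```

With a real sender this might be called as `time_calls(lambda: sender.send_messages(message), idle_seconds=20 * 60)`, then the first and second durations compared.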

Expected behavior The second call should take about the same time as the first call, that is, less than 1s.

kashifkhan commented 1 day ago

Thank you for the feedback @cabal-daniel, we will investigate this and get back to you asap.

kashifkhan commented 1 day ago

Hi @cabal-daniel ,

The link detaches because of the 10-minute wait. As there is no activity over the connection, the service detaches the link, and the next send enters our retry loop. The retry loop applies a small amount of exponential backoff between attempts.
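To get a feel for how much waiting exponential backoff can add, here is a rough arithmetic sketch. It assumes the delay before retry `k` is `factor * 2**k`, capped at a maximum. The values 3, 0.8 and 120 are what I recall as the documented defaults for `retry_total`, `retry_backoff_factor` and `retry_backoff_max`. Both the schedule and the values are assumptions, and the formula actually used inside the SDK may differ.

```python
from typing import List


def backoff_delays(retries: int, factor: float, cap: float) -> List[float]:
    """Delay before each retry under an ASSUMED schedule of factor * 2**attempt, capped at `cap`."""
    return [min(cap, factor * (2 ** attempt)) for attempt in range(retries)]


# Assumed values, not confirmed by this thread.
delays = backoff_delays(retries=3, factor=0.8, cap=120.0)
total = sum(delays)
print([round(d, 1) for d in delays], round(total, 1))
```

Under these assumptions the waits add up to about 5.6s before any network time is counted, which is in the same range as the 7s reported. That is suggestive only, not a confirmation of where the time goes.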

You see this error when the AMQP connection and link are active but no calls (for example, send or receive) are made using the link for 10 minutes. So, the link is closed. The connection is still open.

To work around this, you can reduce the number of retries or keep some traffic flowing over the sender.
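For the first workaround (fewer retries), a minimal sketch of passing retry settings when building the client is shown below. The keyword names `retry_total`, `retry_backoff_factor`, `retry_backoff_max` and `retry_mode` are `ServiceBusClient` options. The specific values, `RETRY_OPTIONS` and `make_client` are illustrative assumptions rather than recommendations from the maintainers, and lowering retries trades away some resiliency.

```python
from typing import Any, Dict

# Illustrative values only (assumptions); tune against your own resiliency needs.
RETRY_OPTIONS: Dict[str, Any] = {
    "retry_total": 1,             # fewer retry attempts
    "retry_backoff_factor": 0.2,  # shorter base delay between attempts, in seconds
    "retry_backoff_max": 2,       # upper bound on a single delay, in seconds
    "retry_mode": "fixed",        # "fixed" or "exponential"
}


def make_client(connection_string: str):
    """Build a ServiceBusClient that uses the reduced retry settings above."""
    # Imported lazily so this module still loads where azure-servicebus is not installed.
    from azure.servicebus import ServiceBusClient

    return ServiceBusClient.from_connection_string(connection_string, **RETRY_OPTIONS)
```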

jonathan-fileread commented 12 hours ago

Hey @kashifkhan - thanks for this. Would you recommend reverting to uamqp_transport=True? Reducing the number of retries reduces resiliency, and having something sent over the sender feels like an anti-pattern (e.g. a cronjob or some wake message before send_messages() is triggered in the actual workflow).

cabal-daniel commented 12 hours ago

Is there a way to check whether the connection has been detached and, if so, force a re-open of the connection? I see that there's a thread that opens up whenever the detach happens. Can we keep a thread to always keep it open?
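A client-side approximation of that check, as a minimal sketch: rather than inspecting link state, track how long the sender has been unused and recreate it once it has been idle for close to the 10-minute limit. `IdleAwareSender`, `factory` and the 540-second threshold are hypothetical. The sketch only assumes the wrapped object has `send_messages()` and `close()`, and reopening a sender has its own cost, so this may or may not end up faster than the retry path.

```python
import time
from typing import Any, Callable, Optional


class IdleAwareSender:
    """Recreate the underlying sender when it has been unused for too long."""

    def __init__(
        self,
        factory: Callable[[], Any],
        max_idle_seconds: float = 540.0,  # assumed margin below the 10-minute idle timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._sender: Optional[Any] = None
        self._last_used = 0.0

    def _fresh_sender(self) -> Any:
        now = self._clock()
        # Idle time is measured on the client; the actual link state is not inspected.
        if self._sender is not None and now - self._last_used > self._max_idle:
            self._sender.close()
            self._sender = None
        if self._sender is None:
            self._sender = self._factory()
        self._last_used = now
        return self._sender

    def send_messages(self, message: Any) -> None:
        self._fresh_sender().send_messages(message)
```

With the SDK, the factory could be something like `lambda: client.get_queue_sender(queue_name="my-queue")`, where the queue name is a placeholder.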

kashifkhan commented 12 hours ago

@jonathan-fileread I would suggest that only as a very temporary workaround. uAMQP is only supported for legacy users and will not get the latest features or bug fixes.

@cabal-daniel in this case it is the link that is detached, not the connection. As the link has not transferred any data, the service detaches it, while the connection stays open. Have you disabled keep-alive?

We might have to investigate this a bit further, as AMQP can be quite picky about how and in what order things happen, along with managing link credits, etc.

cabal-daniel commented 10 hours ago

We did not disable keep alive. Is it turned off by default? I cannot find documentation on that. How do we make sure keep-alive is on?