Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License
2.08k stars 1.2k forks source link

Service-bus suddenly reports a lot of warnings #30910

Open marcus13371337 opened 2 months ago

marcus13371337 commented 2 months ago

Describe the bug We have a service which produces messages. Suddenly a lot of warnings appear in the log output of the service. Here is an example:

azure:service-bus:sender:warning [connection-1|sender:topic-u-u-i-d] 'sender_error' event occurred on the sender 'topic-u-u-i-d with address 'topic'. The associated error : c {
  value: [
    Typed { type: [TypeDesc], value: 'amqp:link:detach-forced' },
    Typed {
      type: [TypeDesc],
      value: "The link 'G4:99362848:topic-u-u-i-d' is force detached. Code: publisher(link15395778). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00."
    },
    Typed { type: [TypeDesc], value: null }
  ]
}

To Reproduce Steps to reproduce the behavior:

  1. Send a message
  2. Wait 10 minutes
  3. A warning is shown in the logs

Expected behavior The client can handle closed connections and recover from them without logging a warning

github-actions[bot] commented 2 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @EldertGrootenboer.

jeremymeng commented 2 months ago

@marcus13371337 it is expected that the SDK log a warning when it encounter an error but can recover from it. In your case, did your service continue to work or not?

marcus13371337 commented 2 months ago

@jeremymeng Yes, the service is still running, and no events seem to have been dropped. Seeing that warning made me as a developer worried that we weren't able to emit all our events. If it's expected to log whenever a warning is received, I guess there is an issue in the SDK with abandoning idle connections before the "IdleTimeout" kicks in?

marcus13371337 commented 2 months ago

Another note, we are also seeing the same warning (there are a lot of them) in another service, so shouldn't be related to any of our own changes

jeremymeng commented 2 months ago

@marcus13371337 the behavior hasn't been changed in the SDK for quite a while. We receive this particular error from the service when the link has not been used for 10 minutes https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-amqp-troubleshoot#link-is-closed

The SDK should re-create the link when messages are sent later.

You mentioned that you notice more warnings recently. Are those warnings logged once or maybe twice each time the link is detached, or more than expected?

marcus13371337 commented 2 months ago

@jeremymeng That's basically the only log we see in our logs and there are a LOT of them, it seems as soon as we haven't sent a message on a topic (we have >100 of them) in 10 minutes a warning will be logged. I mean if it's expected that a link is closed after Idle and the SDK can create a new one once needed, why log a warning for it?

marcus13371337 commented 2 months ago

Could it be that the service-bus-service has changed? Maybe in a way that now an IdleTimeout results in an error? Or perhaps the default IdleTimeout has been changed to 10 minutes?

jeremymeng commented 2 months ago

I mean if it's expected that a link is closed after Idle and the SDK can create a new one once needed, why log a warning for it?

@marcus13371337 That's a fair point! We could possibly log an info entry instead when the error is recoverable. Currently it logs warnings for all errors https://github.com/Azure/azure-sdk-for-js/blob/62597f19f9508617fd203dd78be3669960474a56/sdk/servicebus/service-bus/src/core/messageSender.ts#L78

Could it be that the service-bus-service has changed? Maybe in a way that now an IdleTimeout results in an error? Or perhaps the default IdleTimeout has been changed to 10 minutes?

As far as I know, there hasn't been any changes in this area for quite a while, at least not in recent two or three years.