Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License
2.36k stars 2k forks source link

[BUG] spring-cloud-azure-starter-servicebus-jms - connection to Azure Service Bus lost for 10 minutes every 24h due to Expired Token (problem with token refresh?) #39099

Open jarekhr opened 9 months ago

jarekhr commented 9 months ago

Describe the bug Every exactly 24h our JMS client disconnects from Azure Service Bus with ExpiredToken error message. Error messages are then reported in logs for 10 minutes. After 10 minutes app reconnects (apparently somehow library manages to get refreshed access token) and the connectivity works fine for the next 24h. Problem cycle takes 24h and 10minutes, each time connection is active for 24h, then disconnection for 10min, then connection restored, and so on.

Exception or Stack Trace WARN messages reported by qpid

org.apache.qpid.jms.provider.amqp.builders.AmqpResourceBuilder WARN Open of resource:(JmsConsumerInfo: { ID:2a15d2b5-1aa4-4d4e-957e-e557e58c4d5a:1:1:79, destination = dps.prd.pub.v1.some_name }) failed: ExpiredToken: The token is expired. TrackingId:fe7d14cf-9e04-4b19-aae1-716910d66950_G0S2, SystemTracker:NoSystemTracker, Timestamp:2024-03-06T12:53:40 TrackingId:d3514c71af494f87b2e567be34538bd3_G0S2, SystemTracker:gateway5, Timestamp:2024-03-06T12:53:40 [condition = com.microsoft:auth-failed]

After 10 minutes we observed also the stack trace, logged under WARN from thread QpidJMS Connection Executor: ID:2a15d2b5-1aa4-4d4e-957e-e557e58c4d5a:1:

org.apache.qpid.jms.provider.exceptions.ProviderConnectionRemotelyClosedException: The connection was closed by container 'd3514c71af494f87b2e567be34538bd3_G0S2' because it did not have any active links in the past 300000 milliseconds. TrackingId:d3514c71af494f87b2e567be34538bd3_G0S2, SystemTracker:gateway5, Timestamp:2024-03-06T12:53:43 [condition = amqp:connection:forced]
at org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToConnectionClosedException(AmqpSupport.java:128)
at org.apache.qpid.jms.provider.amqp.AmqpConnection.processRemoteClose(AmqpConnection.java:148)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.processUpdates(AmqpProvider.java:968)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.onData(AmqpProvider.java:878)
at org.apache.qpid.jms.transports.netty.NettyTcpTransport$NettyTcpTransportHandler.channelRead0(NettyTcpTransport.java:548)
at org.apache.qpid.jms.transports.netty.NettyTcpTransport$NettyTcpTransportHandler.channelRead0(NettyTcpTransport.java:541)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)

To Reproduce Steps to reproduce the behavior:

  1. listen on Azure Service Bus topic (with named subscription) using spring-cloud-azure-starter-servicebus-jms for over 24h. App is configured exactly as explained in documentation
  2. observe connection is lost exactly after 24h of listening. Seems that access token is expired
  3. observe same WARN errors logged by spring-cloud-azure-starter-servicebus-jms library, complaining about expired token
  4. observe connection is restored after 10 mins of printing errors (prev. point)
  5. Same will happen after next 24h (token gets expired again)

Code Snippet Nothing, connection related - listening on topic as per documentation.

Expected behavior I expect connection to Azure Service Bus to be operational at all times (no outages for 10 minutes every 24h). Access tokens used underneath to implement the passwordless connection should be refreshed automatically and ahead of token expiration - to ensure there's no disruption on the client app side.

Screenshots If applicable, add screenshots to help explain your problem.

Setup (please complete the following information):

Additional context We're using spring-cloud-azure-starter-servicebus-jms to listen on Azure Service Bus topic, using JMS APIs. Our app is a Spring Boot server app. We use "passwordless" connections (AAD access tokens used underneath), app is configured exactly as explained in documentation. Aps is running on Azure Spring Apps - we don't have control over the system clock (so clocks misalignment can be ruled out). This seems to be a problem with AAD token refresh - it does not happen on time or is not applied correctly?

anuchandy commented 9 months ago

Hi @Netyyyy can someone from your team take a look?

Netyyyy commented 8 months ago

Hi @jarekhr ,Thank you for reporting this issue. We have received your submission and will take a look. We appreciate your input and will review this matter as soon as possible. Please feel free to provide any additional information or context that you think may be helpful. We'll keep you updated on the progress of our review.

Netyyyy commented 8 months ago

seems like https://github.com/Azure/azure-sdk-for-java/issues/31966#issuecomment-1924599040