Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License
2.25k stars 1.93k forks source link

Microsoft Azure Service Bus Session closed and not restored #37042

Open pavan-chakka opened 9 months ago

pavan-chakka commented 9 months ago

Library name and version

spring-cloud-azure-starter-servicebus-jms 4.7.0

Query/Question

Hello,

I am running into an issue where, listener session closing abruptly and connection not restored and no error code. Didn't find a good solution. below is exception log

javax.jms.IllegalStateException: The Session is closed
    at org.apache.qpid.jms.JmsSession.checkClosed(JmsSession.java:1101) ~[qpid-jms-client-0.53.0.jar!/:na]
    at org.apache.qpid.jms.JmsSession.rollback(JmsSession.java:263) ~[qpid-jms-client-0.53.0.jar!/:na]
    at org.messaginghub.pooled.jms.JmsPoolSession.close(JmsPoolSession.java:112) ~[pooled-jms-1.2.4.jar!/:na]
    at org.springframework.jms.support.JmsUtils.closeSession(JmsUtils.java:109) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.clearResources(DefaultMessageListenerContainer.java:1289) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1135) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]

2023-10-03 21:41:40.521  WARN 1 --- [ontainer#0-4019] o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener invoker failed for destination 'carton-status-update' - trying to recover. Cause: The MessageConsumer is closed
2023-10-03 21:41:40.525  WARN 1 --- [ontainer#0-4019] o.m.pooled.jms.JmsPoolSession            : Caught exception trying rollback() when putting session back into the pool, will invalidate. javax.jms.IllegalStateException: The Session is closed

javax.jms.IllegalStateException: The Session is closed
    at org.apache.qpid.jms.JmsSession.checkClosed(JmsSession.java:1101) ~[qpid-jms-client-0.53.0.jar!/:na]
    at org.apache.qpid.jms.JmsSession.rollback(JmsSession.java:263) ~[qpid-jms-client-0.53.0.jar!/:na]
    at org.messaginghub.pooled.jms.JmsPoolSession.close(JmsPoolSession.java:112) ~[pooled-jms-1.2.4.jar!/:na]
    at org.messaginghub.pooled.jms.JmsPoolConnection.cleanupAllLoanedSessions(JmsPoolConnection.java:269) ~[pooled-jms-1.2.4.jar!/:na]
    at org.messaginghub.pooled.jms.JmsPoolConnection.close(JmsPoolConnection.java:80) ~[pooled-jms-1.2.4.jar!/:na]
    at org.springframework.jms.connection.ConnectionFactoryUtils.releaseConnection(ConnectionFactoryUtils.java:81) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at org.springframework.jms.listener.AbstractJmsListeningContainer.refreshSharedConnection(AbstractJmsListeningContainer.java:394) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at org.springframework.jms.listener.DefaultMessageListenerContainer.refreshConnectionUntilSuccessful(DefaultMessageListenerContainer.java:985) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at org.springframework.jms.listener.DefaultMessageListenerContainer.recoverAfterListenerSetupFailure(DefaultMessageListenerContainer.java:959) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1146) ~[spring-jms-5.3.27.jar!/:5.3.27]
    at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]`

Below is my JMS listener annotation in my spring boot service

@JmsListener(destination = TOPIC_NAME, containerFactory = "topicJmsListenerContainerFactory", subscription = SUBSCRIPTION_NAME, concurrency="15-20")

Session Restore is not happening only where concurrency parameter is configured

Please help if there is something to do to automatically restore the session when concurrency is configured.

Environment

Application is written in Java and is running in Azure Kubernetes Service Bus is in Azure

anuchandy commented 9 months ago

Hi @saragluna, could you please help with the triaging? the error seems to come from qpid-jms-client when concurrency > 1, unsure if this is Az SB Spring lib or the Apache client.

pavan-chakka commented 9 months ago

Thank you @anuchandy . Will look for an answer from @saragluna

saragluna commented 9 months ago

Hi @pavan-chakka, could you help provide the configuration files as well as the Service Bus settings?

pavan-chakka commented 8 months ago

Below is spring jms configuration from yaml files. I know in here I have max-concurrency as 2, but my '@JMSListner' annotation also have concurrency set 10-15

spring:
  jms:
    listener:
      max-concurrency: 2
    servicebus:
      connection-string: "Endpoint=${SERVICEBUS_ENDPOINT};SharedAccessKeyName=${SERVICEBUS_SHAREDACCESS_KEY_NAME};SharedAccessKey=${SERVICEBUS_SHAREDACCESS_KEY}"
      pricing-tier: premium
      listener:
        subscription-durable: true
        subscription-shared: true
      pool:
        enabled: true
        time-between-expiration-check: 5m

Service bus is configured pretty straight forward in azure portal. Please let me know what settings you want to specifically look from service bus

saragluna commented 8 months ago

@pavan-chakka thanks for the update, I will look into it.

moarychan commented 8 months ago

Hi @pavan-chakka , I tested with dependencies Spring Boot 2.7.9 + Spring Cloud Azure Service Bus JMS 4.7.0, and can't reproduce this exception, could you try to reproduce it locally? Are there any other conditions required? Better to provide a minimal project to easily reproduce the exception.

pavan-chakka commented 8 months ago

HI @moarychan , from local am unable to reproduce it, but when I deploy this application to Kubernetes with 2 instances - then this issue is happening after few days. When I deploy it to AKS, both instances will be successfully listening to messages, and process them, but after few days one of the pod will stop listening to messages with above exception, and then again after few days 2nd pod will also stop listening with above exception. There is no pattern that I have seen on timelines when it will stop listening. Again, when 1 pod throws that exception and stops listening for messages, then all traffic will be listened on 2nd pod and messages processed successfully, but when 2nd pod throws that exception and stops listening, then messages will get piled up in service bus.

moarychan commented 8 months ago

Hi @pavan-chakka , no actual update here, just a proposal for you to upgrade to using Spring Cloud Azure Service Bus JMS 5.5.0 + Pooled JMS 3.1.4, it may fix the exception since I found a similar fix on the latest Jakarta version 3.1.4.

pavan-chakka commented 8 months ago

ok, @moarychan , will try that. in near future, we are upgrading our spring versions. will keep you informed after upgrading and verifying it

hos-p commented 8 months ago

Hi @pavan-chakka , no actual update here, just a proposal for you to upgrade to using Spring Cloud Azure Service Bus JMS 5.5.0 + Pooled JMS 3.1.4, it may fix the exception since I found a similar fix on the latest Jakarta version 3.1.4.

Hi, we have probably same issue using Spring Cloud Azure Service Bus JMS 5.5.0 + Pooled JMS 3.1.4. So I think upgrade to pooled-jms 3.1.4 is not solution.

2023-10-24T00:45:53.190Z  WARN 1 --- [Container#0-111] o.s.j.l.DefaultMessageListenerContainer  : Execution of JMS message listener failed, and no ErrorHandler has been set.

jakarta.jms.IllegalStateException: The Session is closed
    at org.apache.qpid.jms.JmsSession.checkClosed(JmsSession.java:1113) ~[qpid-jms-client-2.4.0.jar!/:na]
    at org.apache.qpid.jms.JmsSession.getTransacted(JmsSession.java:213) ~[qpid-jms-client-2.4.0.jar!/:na]
    at org.messaginghub.pooled.jms.JmsPoolSession.getTransacted(JmsPoolSession.java:256) ~[pooled-jms-3.1.4.jar!/:na]
    at org.springframework.jms.listener.AbstractMessageListenerContainer.commitIfNecessary(AbstractMessageListenerContainer.java:772) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:680) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:330) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:269) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1239) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1229) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1122) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

2023-10-24T00:45:53.191Z  WARN 1 --- [Container#0-111] o.m.pooled.jms.JmsPoolSession            : Caught exception trying rollback() when putting session back into the pool, will invalidate. jakarta.jms.IllegalStateException: The Session is closed

jakarta.jms.IllegalStateException: The Session is closed
    at org.apache.qpid.jms.JmsSession.checkClosed(JmsSession.java:1113) ~[qpid-jms-client-2.4.0.jar!/:na]
    at org.apache.qpid.jms.JmsSession.rollback(JmsSession.java:264) ~[qpid-jms-client-2.4.0.jar!/:na]
    at org.messaginghub.pooled.jms.JmsPoolSession.close(JmsPoolSession.java:112) ~[pooled-jms-3.1.4.jar!/:na]
    at org.springframework.jms.support.JmsUtils.closeSession(JmsUtils.java:109) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.clearResources(DefaultMessageListenerContainer.java:1291) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1137) ~[spring-jms-6.0.12.jar!/:6.0.12]
    at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

Threads:

"org.springframework.jms.JmsListenerEndpointContainer#0-111" - Thread t@178284
   java.lang.Thread.State: WAITING
    at java.base@17.0.2/java.lang.Object.wait(Native Method)
    - waiting on <6bb36940> (a org.apache.qpid.jms.provider.ConservativeProviderFuture)
    at java.base@17.0.2/java.lang.Object.wait(Object.java:338)
    at org.apache.qpid.jms.provider.ConservativeProviderFuture.sync(ConservativeProviderFuture.java:116)
    at org.apache.qpid.jms.JmsConnection.destroyResource(JmsConnection.java:756)
    at org.apache.qpid.jms.JmsConnection.destroyResource(JmsConnection.java:745)
    at org.apache.qpid.jms.JmsMessageConsumer.doClose(JmsMessageConsumer.java:173)
    at org.apache.qpid.jms.JmsMessageConsumer.close(JmsMessageConsumer.java:160)
    at org.messaginghub.pooled.jms.JmsPoolMessageConsumer.close(JmsPoolMessageConsumer.java:54)
    at org.messaginghub.pooled.jms.JmsPoolSession.close(JmsPoolSession.java:95)
    at org.messaginghub.pooled.jms.JmsPoolConnection.cleanupAllLoanedSessions(JmsPoolConnection.java:269)
    at org.messaginghub.pooled.jms.JmsPoolConnection.close(JmsPoolConnection.java:80)
    at org.springframework.jms.connection.ConnectionFactoryUtils.releaseConnection(ConnectionFactoryUtils.java:80)
    at org.springframework.jms.listener.AbstractJmsListeningContainer.refreshSharedConnection(AbstractJmsListeningContainer.java:395)
    - locked <dae254c> (a java.lang.Object)
    at org.springframework.jms.listener.DefaultMessageListenerContainer.refreshConnectionUntilSuccessful(DefaultMessageListenerContainer.java:986)
    at org.springframework.jms.listener.DefaultMessageListenerContainer.recoverAfterListenerSetupFailure(DefaultMessageListenerContainer.java:960)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1148)
    - locked <3b93348d> (a java.lang.Object)
    at java.base@17.0.2/java.lang.Thread.run(Thread.java:833)

"org.springframework.jms.JmsListenerEndpointContainer#0-110" - Thread t@178282
   java.lang.Thread.State: BLOCKED
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.clearResources(DefaultMessageListenerContainer.java:1290)
    - waiting to lock <dae254c> (a java.lang.Object) owned by "org.springframework.jms.JmsListenerEndpointContainer#0-111" t@178284
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1137)
    at java.base@17.0.2/java.lang.Thread.run(Thread.java:833)

"org.springframework.jms.JmsListenerEndpointContainer#0-113" - Thread t@178286
   java.lang.Thread.State: BLOCKED
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.clearResources(DefaultMessageListenerContainer.java:1290)
    - waiting to lock <dae254c> (a java.lang.Object) owned by "org.springframework.jms.JmsListenerEndpointContainer#0-111" t@178284
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1137)
    at java.base@17.0.2/java.lang.Thread.run(Thread.java:833)

   Locked ownable synchronizers:
    - None

"org.springframework.jms.JmsListenerEndpointContainer#0-114" - Thread t@178287
   java.lang.Thread.State: BLOCKED
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.clearResources(DefaultMessageListenerContainer.java:1290)
    - waiting to lock <dae254c> (a java.lang.Object) owned by "org.springframework.jms.JmsListenerEndpointContainer#0-111" t@178284
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1137)
    at java.base@17.0.2/java.lang.Thread.run(Thread.java:833)

   Locked ownable synchronizers:
    - None
moarychan commented 8 months ago

Hi @pavan-chakka , because I haven't reproduced it yet, I need your help to confirm the following questions:

  1. If using the default concurrency configuration of the default JMS listener, the session-closed-and-not-restored problems will not occur.
  2. If the concurrency-related configurations of the JMS listener are configured, session-closed-and-not-restored problems will occur.
  3. Which tier is used for Service Bus and what queue level configuration changes are there?
  4. Are there any special network-related settings in K8S?
petercooperjr-spls commented 7 months ago

Hello! I'm one of @pavan-chakka's colleagues (and he's currently out on vacation). Thanks for looking into this. I do think that what we're running into is probably the issue noted earlier in the pooled-jms library. We're working on updating our systems to Spring Boot 2.7.17 or 3.1.5, which includes the patched pooled-jms version, and will let you know if the issue recurs.

To try to answer your questions, yes I think it's only the services that have a max-concurrency set, but the service runs for quite some time before no longer listening to messages so it's hard to know for sure exactly what's triggering it. We're using the Premium tier of Service Bus. And the network is probably locked down somewhat (that's part of why we needed to go with the Premium tier, so that we could do the networking restrictions we needed), but should be open between the AKS cluster and Service Bus as far as I understand it. (But I'm not the person who configures that, so if you need more information I might need to dig in a bit and contact other people at the company.)

Thank you!

raissi-oa commented 5 months ago

hello, any updates on this, we are facing the same issues, with latest 3.1.5 version of pooled-jms library

Netyyyy commented 4 months ago

hello, any updates on this, we are facing the same issues, with latest 3.1.5 version of pooled-jms library

Hi @raissi-oa , could you help provide more info about your situation? such as environment, when and how?

raissi-oa commented 4 months ago

Hello, For others facing the same issue, he what turns out to be on our case. When we enabled all logs for qpid, especially this class: org.apache.qpid.jms.provider.amqp.AmqpFixedProducer we found TRACE logs saying:

Outcome of delivery was rejected .... outcome=Rejected{error=Error{condition=amqp:resource-limit-exceeded, description='Cannot send more than 100 messages in a single transaction ....

Also this class: org.apache.qpid.jms.JmsLocalTransactionContext is logging (DEBUG) that it started a transaction 'New TX started: TX:ID:.....', and multiple messages: saying 'TX:TX:ID:.... has performed a send'. And at last there is a rollback of that transaction. In our case, this is because we use the same connection factory to consume messages and then send them others to ASB. Even though, we do not use explicit transactions, for some reason (under investigation) there is a transaction that gets opened, and used for the sending of all messages (can have more than a hundred). In our Spring config, we are using spring.jms.listener.acknowledge-mode=AUTO. As a temporary solution, we are creating separate connection factories for cosnuming and producing. But the principal way to investigate is to enable qpid trace logs.

Netyyyy commented 1 month ago

Hi @pavan-chakka , ServiceBus server side has a related fix recently for premium tier. So if you still meet this error in premium tier, please add environment property PN_TRACE_FRM=1 and provide the log info here, thanks.