Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License
2.31k stars 1.97k forks source link

[BUG] Non recoverable JMS connection caused by: java.lang.NullPointerException: null #29881

Closed adelinor closed 1 year ago

adelinor commented 2 years ago

Describe the bug Consumer implemented with spring-cloud-azure-starter-servicebus-jms loses the JMS connection and is not able to recover it.

Exception or Stack Trace The stack trace below shows that the connection is lost after or while processing the 31st message. Attempts to recover the connection can be seen with the message: Encountered a JMSException - resetting the underlying JMS Connection.

run_10011log.txt

On the 10th attempt the connection is not recovered and the consumer becomes stale. The only way to resume the consumption of messages is to stop the app and start it again.

To Reproduce 1) Create a premium Azure Service Bus 2) Load more than a thousand messages to a queue 3) Consume messages with a spring boot app which connects with the version 4.3.0 of the spring-cloud-azure-starter-servicebus-jms deployed in a network outside Azure.

Code Snippet This is not identifiable as logs only report a cause of null.

Expected behavior The expected behaviour is that the process for resetting the underlying JMS connection eventually succeeds.

Screenshots If applicable, add screenshots to help explain your problem.

Setup (please complete the following information):

I do not suspect a dependency version mismatch. I nevertheless provide the effective pom which includes all transitive dependencies: effective-pom.xml.txt

Additional context The capacity was increased at the Azure Service Bus level to 8 MU but it did not affect the behaviour of not recovering the JMS connection. It still failed after a few hundred messages were received.

Information Checklist Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

yiliuTo commented 2 years ago

@adelinor Thanks for reporting this. I tested your case with only spring-cloud-azure-starter-servicebus-jms and it can work. Given you are using Apache camel in your project, could you share a minimum project to show how you send/receive messages with Azure Service Bus via Camel, as well if you have any custom configuration/usage for Spring JMS?

adelinor commented 2 years ago

Dear @yiliuTo , Many thanks for coming back on this issue. The project I use is the following: camel-jms-calculator.zip

yiliuTo commented 2 years ago

Hi @adelinor , thanks for sharing your project. Could you also share your Service Bus service configuration? like below or something more if you have other custom configurations. image

And I noticed that you are using Weblogic as well as the WebLogicJtaTransactionManager for JMS transactions, however after reading your log information, I suppose there should be something wrong when a REMOTE_LINK_OPEN package gets recieved from Service Bus, which seems not realted with either Weblogic or transactions. Thus I tried to reproduce your issue with only camel jms (copied from your project) but no spring transactions, and running on Tomcat. But I haven't reproduced it now. Thouands of messages can be successfully received and consumed.

Thus besides for the Service Bus service configuration, could you help to test if you can get this error without Weblogic or transaction usage? This can help us to narrow down the scope.

adelinor commented 2 years ago

Hi @yiliuTo ,

The service bus was setup as follows:

The sample code attached earlier was updated to use the default transaction manager (data source based one):

1) JmsConfig.java

@Configuration
@EnableJms
@Slf4j
public class JmsConfig {

    /**
     * @param connectionFactory   Auto configured by spring boot
     * @param transactionManager  Required parameter
     * @return Camel JMS configuration
     */
    @Bean
    JmsConfiguration jmsConfiguration(
            ConnectionFactory connectionFactory,
            PlatformTransactionManager transactionManager) {
        JmsConfiguration result = new JmsConfiguration();
        result.setConnectionFactory(connectionFactory);

        if (transactionManager != null) {
            LOGGER.info("Setting transacted=true for the JMS configuration");
            result.setTransactionManager(transactionManager);
            result.setTransacted(true);
            // Default is CACHE_AUTO which should resolve to
            // CACHE_NONE in a transaction enabled environment but
            // defaulting it to avoid any doubt
            result.setCacheLevelName("CACHE_NONE");
        }
        return result;
    }

    /**
     * @param jmsConfiguration Required parameter
     * @return Camel JMS endpoint
     */
    @Bean
    JmsComponent jmsComponent(JmsConfiguration jmsConfiguration) {
        return JmsComponent.jmsComponent(jmsConfiguration);
    }
}

2) application.properties to disable jta:

spring.jta.enabled=false

5000 messages were pushed to the calculator/inbox queue. The same behaviour can be seen as per the last run where at some point the app becomes stale. This time the last exception is more explicit:

Caused by: org.apache.qpid.jms.provider.ProviderException: Local transactions cannot span multiple top-level entities such as queue or topic. Reference:c20722d3-b551-4ec5-8bde-43c598768983, TrackingId:c89f587b0000043a000522f362d11d74_G9S1_B5S2, SystemTracker:myqueues:Queue:calculator/inbox, Timestamp:2022-07-15T07:55:44 [condition = amqp:not-allowed]
    at org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToNonFatalException(AmqpSupport.java:181) ~[qpid-jms-client-0.53.0.jar:na]
    at org.apache.qpid.jms.provider.amqp.AmqpFixedProducer.applyDeliveryStateUpdate(AmqpFixedProducer.java:252) ~[qpid-jms-client-0.53.0.jar:na]
    ... 28 common frames omitted

Please find logs for the entire execution: run_5000log.txt

yiliuTo commented 2 years ago

Hi @adelinor sorry for the late response, given the log of

Local transactions cannot span multiple top-level entities such as queue or topic.

Service Bus has some specific rules for cross-entity transactions, like it will create a session with specified name crossentity-coordinator for cross-entity transaction operations. Thus for other common transaction managers, it cannot meet this requirement of Service Bus.

However, I have tested with the native JMS transacted session with your sample code, and find that can work. I am connecting to MS Service Bus team internally to confirm this and will update to you if any. Would you like to try using the transacted session for camel JMS instead of other transaction managers?

adelinor commented 2 years ago

Hi @yiliuTo , as it seems difficult to reproduce the issue, it would be beneficial to add more logging in the SDK so that the cause of the problem can be better located.

This sample app consumes a message to perform operations in the database so the transaction manager needs to encompass database operations.

Kind regards

yiliuTo commented 2 years ago

Hi @adelinor ,

it would be beneficial to add more logging in the SDK so that the cause of the problem can be better located.

While considering that what the library spring-cloud-azure-starter-servicebus-jms does is to provide auto-configuration of caching/pooled ConnectionFactory as well as JmsListenerContainerFactory for org.apache.qpid:qpid-jms-client, but those factory implementation are used from the third-party library, like org.springframework:spring-jms and org.messaginghub:pooled-jms, thus we cannot modify the logs for those.

And as for the cause of the cross-entity transaction error, as mentioned in https://github.com/jmstoolbox/jmstoolbox/issues/137#issuecomment-1176209185, the support from Service Bus service side on cross-entity transaction of JMS specs is only for transacted session (locally transacted JMS session). And according to the underlying Spring JMS code, see JmsAccessor and DefaultJmsListenerContainerFactoryConfigurer, to set the transacted JMS session or transaction managers are mutually exclusive, thus I think that's the reason.

ghost commented 2 years ago

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

adelinor commented 2 years ago

Hi @yiliuTo ,

The JMSConfiguration is set as transacted. If the transaction configuration was wrong, no message would get processed. The observed behaviour is that the processing works as expected for a small number of messages (less than 100) and fails with the error reported in this issue with larger volumes.

The error reported, java.lang.NullPointerException: null, does not allow you to find the cause of the problem. Could you share the test suite you are using to validate the behaviour? I could try to execute from my machine and see how it behaves.

Kind regards

adelinor commented 2 years ago

Hi @yiliuTo,

I would like to share with you some further tests I have done on this issue. I did two runs where both start with 1000 messages in the inbox queue and with empty tables. The first run is done by setting the environment variable SPRING_JTA_ENABLED=false (which corresponds to the spring boot configuration spring.jta.enabled=false). When the app starts, the logs show that spring boot configures the JPA transaction manager the one configured from the datasource. Despite 35 occurrences of the exception reported in this issue, the processing goes on and ends when all 1000 messages are consumed. The problem is that only 997 records get saved in the database.

The second run is done by setting the environment variable SPRING_JTA_ENABLED=true . When the app starts, the logs show that spring boot configures the Weblogic JTA transaction manager. The app hangs after processing 174 messages. The database has 173 messages: i.e. data for one item is lost.

In both scenarios above I come to the conclusion that we do not have reliability with this setup: i.e. use of Azure Service Bus via JMS.

yiliuTo commented 2 years ago

Hi @adelinor , thanks for the detailed information. The test I used is a pure spring-jms application without camel jms since both camel-jms and Azure Service Bus JMS starter relys on the connection /listener container/transaction-management implementation from spring-jms, and here is the application I use. And for spring-jms, it by default configure the transacted session when no transaction manager is found, thus in my test I didn't specifically configure the transaction.

However, I found that when using different ways for cross-entity transaction of spring-jms, it behaves differently of using JmsTemplate or @SendTo annotation, when using @SendTo, it will encounter the same error of Local transactions cannot span multiple top-level entities such as queue or topic. as what you have observed with camel, while when using JmsTemplate, it works normally. Thus I am reading the spring-jms code to understand the reason, and also plan to compare spring-jms with pure Qpid JMS implemenation about the transacted-session to see if can get more insights. And communication about spring-jms transacted session is welcome.

adelinor commented 2 years ago

The article https://www.infoworld.com/article/2077963/distributed-transactions-in-spring--with-and-without-xa.html describes very well the options that for managing transactions with various transactional resources. That is the case of the camel-jms-calculator.

The option of not using JTA, i.e. using the default JPA transaction manager, is referred as the Wing-and-a-Prayer pattern:

Using JTA would guarantee the reliability but the Azure Service Bus may not support XA and this might explain why the applications hangs.

The XA and the Last Resource Gambit would be a fit for this scenario:

ezYakaEagle442 commented 2 years ago

@adelinor could you please try to reproduce the issue with Java 11? https://camel.apache.org/manual/what-are-the-dependencies.html#_java_support

https://camel.apache.org/manual/camel-3-migration-guide.html

Also your POM does not include camel-azure-servicebus , could you please try to reproduce with Camel latest version below ?: https://camel.apache.org/components/3.18.x/azure-servicebus-component.html

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-azure-servicebus</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>

https://camel.apache.org/components/3.18.x/jdbc-component.html#_endpoint_query_option_resetAutoCommit

Camel will set the autoCommit on the JDBC connection to be false, commit the change after executed the statement and reset the autoCommit flag of the connection at the end, if the resetAutoCommit is true. If the JDBC connection doesn’t support to reset the autoCommit flag, you can set the resetAutoCommit flag to be false, and Camel will not try to reset the autoCommit flag. When used with XA transactions you most likely need to set it to false so that the transaction manager is in charge of committing this tx.

ezYakaEagle442 commented 2 years ago

@adelinor also: Distributed transactions aren't supported, but transacted sessions are supported. https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-java-how-to-use-jms-api-amqp#unsupported-features-and-restrictions

adelinor commented 2 years ago

@ezYakaEagle442 ,

could you please try to reproduce the issue with Java 11?

The Weblogic Application server runs on Java 8. Nevertheless in order to try this option, I updated slightly the pom file to make the app runnable from Spring boot. Changes were to:

With JDK 11 (version "11.0.16" 2022-07-19 LTS), the same error as reported in this issue occurs but the application is not blocked and all messages end up in the out queue. The database though misses records: 18 out of 5000.

could you please try to reproduce with Camel latest version below ?

I changed further the pom to use the camel version 3.18.1, updated the camel spring boot starter is updated to use the new groupId name org.apache.camel.springboot, and finally added:

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-azure-servicebus</artifactId>
    <version>${camel.version}</version>
</dependency>

With that setup, a batch of 5000 messages get consumed, No error appear in the logs. The database misses 15 records out of the 5000 processed.

fangjian0423 commented 1 year ago

hi @adelinor ,

We have a meeting internally to discuss this issue, and aware that finally the POC successfully.

Is there any other problems we need to help you?

ghost commented 1 year ago

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

adelinor commented 1 year ago

Dear @fangjian0423 ,

Indeed by creating separate transaction managers:

I was able to process successfully 100,00 messages without a loss or errors using a standard (i.e. not premium) Azure ServiceBus resource. Note that:

To conclude, this issue can be closed. As a take-away it would be beneficial to the community to have guidance on how to approach the use case of processing messages while performing a transaction with a database.

Kind regards

fangjian0423 commented 1 year ago

Thank you for your feedback and suggestion.

We will mention the transaction in our docs in the future.