Tom-Van-Asch closed this issue 1 month ago.
@anuchandy @conniey @lmolkova
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Hello @Tom-Van-Asch, thank you for the report. Since your screenshot shows the breakpoint being hit, are you able to reproduce this locally on your developer machine? If so, could you provide more details?
@Tom-Van-Asch, just to clarify, I mean whether the local setup also hits the reported NPE (in addition to the onSessionRemoteClose and closed-connection error messages).
Hi @Tom-Van-Asch. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
The same stack trace is discussed and root-caused in this thread: https://github.com/Azure/azure-sdk-for-java/issues/41584
Thanks for the update, when the new version is released I'll test it again and update this issue with my findings.
We have the same issue and have not found any workaround.
The release PR has been opened and will be shipped soon: https://github.com/Azure/azure-sdk-for-java/pull/42088
Thank you Anu for the update. Looking forward to this update.
Hello @Tom-Van-Asch, @pretti-vusion, @amacbean, @josebarros2025, @hylander0, @padmapriyanalam, @pje477,
The library update has been released; please follow the steps outlined below. Let us know if the experience improves. Note that you will still see the session disconnect/reconnect logs (which is expected) but the new library should address the NullPointerException.
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-messaging-servicebus</artifactId>
    <version>7.17.5</version>
</dependency>
When building any client (ServiceBusProcessorClient, ServiceBusReceiverClient, ServiceBusSenderClient, etc.), set the configuration "com.azure.core.amqp.cache" to true, as shown below. Make sure this configuration is applied everywhere the application creates a new ServiceBusClientBuilder.
new ServiceBusClientBuilder()
    .connectionString(CONNECTION_STRING)
    .configuration(new ConfigurationBuilder()
        .putProperty("com.azure.core.amqp.cache", "true")
        .build())
    .processor() // or .sender(), .receiver(), ... depending on the client being built
Choosing this configuration is important to resolve the problem: java.lang.NullPointerException: Cannot invoke "java.util.List.add(Object)" because "this._sessions" is null
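The thread does not show how the SDK implements this cache, so the following is only a hypothetical, JDK-only model of the idea, not azure-core-amqp code. The cache hands out the current connection while it is live and builds a fresh one once the old one has been disposed (for example, after the service closed it for inactivity), instead of reusing the disposed connection. The names `CachedConnection` and `ConnectionCache` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an AMQP connection; NOT the azure-core-amqp class.
final class CachedConnection {
    private final int id;
    private volatile List<String> sessions = new ArrayList<>();

    CachedConnection(int id) {
        this.id = id;
    }

    int id() {
        return id;
    }

    boolean isDisposed() {
        return sessions == null;
    }

    // Models the service closing an idle connection and the client freeing its resources.
    void dispose() {
        sessions = null;
    }

    void createSession(String name) {
        sessions.add(name); // would throw NullPointerException if called after dispose()
    }
}

// Hypothetical cache: reuse the live connection, replace it once it has been disposed.
final class ConnectionCache {
    private final Supplier<CachedConnection> factory;
    private CachedConnection current;

    ConnectionCache(Supplier<CachedConnection> factory) {
        this.factory = factory;
    }

    synchronized CachedConnection get() {
        if (current == null || current.isDisposed()) {
            current = factory.get(); // never hand out a disposed connection
        }
        return current;
    }

    public static void main(String[] args) {
        int[] created = {0};
        ConnectionCache cache = new ConnectionCache(() -> new CachedConnection(++created[0]));
        CachedConnection first = cache.get();
        first.createSession("send-1");
        first.dispose();                       // idle timeout closes the connection
        CachedConnection second = cache.get(); // next send gets a fresh connection
        second.createSession("send-2");        // no NullPointerException
        System.out.println("first=" + first.id() + ", second=" + second.id());
    }
}
```

Creating the replacement lazily, on the next `get()`, means an idle application only pays the reconnect cost when it sends again.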
Make sure the transitive dependencies (azure-core-amqp, azure-core) are resolved to expected versions.
mvn dependency:tree
[INFO] ...
[INFO] +- com.azure:azure-messaging-servicebus:jar:7.17.5:compile
[INFO] | +- com.azure:azure-core:jar:1.53.0:compile
[INFO] | | +- ..
[INFO] | | \- ...
[INFO] | \- com.azure:azure-core-amqp:jar:2.9.10:compile
[INFO] | +- com.microsoft.azure:qpid-proton-j-extensions:jar:1.2.5:compile
[INFO] | \- org.apache.qpid:proton-j:jar:0.34.1:compile
Note: In later versions, the need to opt in to "com.azure.core.amqp.cache" will be removed.
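Checking the tree by eye is easy to get wrong in a large build. Below is a small sketch, assuming the plain `mvn dependency:tree` text format shown above (`groupId:artifactId:jar:version:scope`, no classifier), that extracts the resolved versions and flags anything below a minimum. `DependencyTreeCheck` is a hypothetical helper; the sample lines come from a user's tree later in this thread, and the minimums (azure-core 1.52.0, azure-core-amqp 2.9.9) are the versions another user pinned, used here only as an example. Version comparison is simplified to numeric parts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DependencyTreeCheck {
    // Matches "groupId:artifactId:jar:version:scope" as printed by `mvn dependency:tree`.
    private static final Pattern COORDINATES =
            Pattern.compile("([\\w.\\-]+):([\\w.\\-]+):jar:([\\w.\\-]+):(\\w+)");

    // Maps "groupId:artifactId" to the resolved version for every jar line in the tree.
    static Map<String, String> resolvedVersions(List<String> treeLines) {
        Map<String, String> versions = new LinkedHashMap<>();
        for (String line : treeLines) {
            Matcher m = COORDINATES.matcher(line);
            if (m.find()) {
                versions.put(m.group(1) + ":" + m.group(2), m.group(3));
            }
        }
        return versions;
    }

    // Numeric comparison of dotted versions; non-numeric qualifiers count as 0.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("[.\\-]");
        String[] pb = b.split("[.\\-]");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? parsePart(pa[i]) : 0;
            int y = i < pb.length ? parsePart(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    private static int parsePart(String part) {
        try {
            return Integer.parseInt(part);
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Lists every artifact whose resolved version is missing or below the given minimum.
    static List<String> belowMinimum(Map<String, String> resolved, Map<String, String> minimums) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : minimums.entrySet()) {
            String actual = resolved.get(entry.getKey());
            if (actual == null || compareVersions(actual, entry.getValue()) < 0) {
                problems.add(entry.getKey() + " resolved to " + actual + ", expected >= " + entry.getValue());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> tree = List.of(
                "[INFO] +- com.azure:azure-messaging-servicebus:jar:7.17.4:compile",
                "[INFO] |  +- com.azure:azure-core:jar:1.49.1:compile",
                "[INFO] |  +- com.azure:azure-core-amqp:jar:2.9.6:compile");
        Map<String, String> minimums = Map.of(
                "com.azure:azure-core", "1.52.0",
                "com.azure:azure-core-amqp", "2.9.9");
        belowMinimum(resolvedVersions(tree), minimums).forEach(System.out::println);
    }
}
```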
Hello @pje477, I'm following up on this comment https://github.com/Azure/azure-sdk-for-java/issues/41584#issuecomment-2378284034 you left in the other GitHub issue.
While onSessionRemoteClose (normal) and NullPointerException (abnormal) existed in the versions you listed, the "reactor-executor leak" is something new. Could you please try the 7.17.4 steps listed in the above comment and check whether they resolve the NPE and associated issues?
A few questions about the environment where you observed the leak (it's fine to respond later, after trying 7.17.4):
I'm having issues like these
java.lang.NoSuchMethodError: 'void com.azure.core.amqp.implementation.ReactorConnection.<init>(java.lang.String, com.azure.core.amqp.implementation.ConnectionOptions, com.azure.core.amqp.implementation.ReactorProvider, com.azure.core.amqp.implementation.ReactorHandlerProvider, com.azure.core.amqp.implementation.AmqpLinkProvider, com.azure.core.amqp.implementation.TokenManagerProvider, com.azure.core.amqp.implementation.MessageSerializer, org.apache.qpid.proton.amqp.transport.SenderSettleMode, org.apache.qpid.proton.amqp.transport.ReceiverSettleMode, boolean, boolean)'
any tips?
return new ServiceBusClientBuilder()
    .connectionString(messagingConfiguration.getConnectionString())
    .configuration(new ConfigurationBuilder()
        .putProperty("com.azure.core.amqp.cache", "true")
        .build())
    .processor()
    .queueName(queueName)
    .maxConcurrentCalls(messagingConfiguration.getMaxConcurrentCall())
    .receiveMode(ServiceBusReceiveMode.PEEK_LOCK)
    .processMessage(processMessage())
    .processError(processError())
    .disableAutoComplete()
    .buildProcessorClient();
This is a Spring Boot 3.3.3 application.
My dependency tree is different from yours; how should I proceed?
[INFO] +- com.azure.spring:spring-cloud-azure-starter:jar:5.14.0:compile
[INFO] | \- com.azure.spring:spring-cloud-azure-autoconfigure:jar:5.14.0:compile
[INFO] | \- com.azure.spring:spring-cloud-azure-service:jar:5.14.0:compile
[INFO] | \- com.azure.spring:spring-cloud-azure-core:jar:5.14.0:compile
[INFO] | \- com.azure:azure-core-management:jar:1.15.0:compile
[INFO] +- com.azure:azure-messaging-servicebus:jar:7.17.4:compile
[INFO] | +- com.azure:azure-core:jar:1.49.1:compile
[INFO] | | \- com.azure:azure-json:jar:1.1.0:compile
[INFO] | +- com.azure:azure-xml:jar:1.0.0:compile
[INFO] | +- com.azure:azure-core-amqp:jar:2.9.6:compile
[INFO] | | \- com.microsoft.azure:qpid-proton-j-extensions:jar:1.2.5:compile
[INFO] | \- com.azure:azure-core-http-netty:jar:1.15.1:compile
Hi @anuchandy - Thank you for releasing version 7.17.4 of Service Bus SDK - it appears to have fixed our issue!
Regarding the reactor-executor thread leak, what we observed is that in versions prior to 7.17.4, when the error below was logged for certain Service Bus namespaces, the reactor-executor thread associated with that connection was not closed, and the number of reactor-executor threads increased over time.
This was the error that always preceded the thread leak: reactor.core.Exceptions$ErrorCallbackNotImplemented: com.azure.core.amqp.exception.AmqpException: onSessionRemoteClose connectionId[MF_ff675e_1727378638933], entityName[sbt-my-topic-name] condition[Error{condition=amqp:connection:forced, description='The connection was closed by container '6b2b3bba82084087bfd7d760339cdade_G0' because it did not have any active links in the past 300000 milliseconds.
We observed this initially as slowly increasing CPU usage of our application, which is a Spring Boot microservice deployed to Azure Spring Apps. We ran a JFR on the Spring Apps instance and observed the high number of zombie reactor-executor threads. Then during troubleshooting we found that only certain service bus namespaces exhibit this behavior. For us, the only namespace that exhibits this behavior is our production instance, and none of our non-prod service bus namespaces exhibit the behavior, despite the SB namespaces being configured identically (via Terraform) and the same exact code being deployed in prod and non-prod.
In any case, here are the answers you requested:
String connectionString = "Endpoint=sb://" + hostname + ";SharedAccessKeyName=" + username + ";SharedAccessKey=" + password;
client = new ServiceBusClientBuilder()
    .connectionString(connectionString)
    .sender()
    .queueName(queueTopic)
    .buildClient();
client.createMessageBatch();
Then wait for 15 minutes. After 15 minutes (exactly), we get the error messages in the attached file. This issue only happens on our production service bus, but for that SB namespace, it happens every time the connection goes idle for 15m or more.
Again, the thread leak behavior appears to be remediated in version 7.17.4 of Service Bus SDK but I'm documenting this here for others.
Also, our use case is low-latency message transfer, so we open a connection to a database and another connection to Service Bus, and we keep these connections open for long periods of time. Then, when any message is published from the database, it can be transferred to a Service Bus topic as quickly as possible (without the delay of opening the connection to Service Bus). There are periods when the message volume drops and no messages are transferred in a 15-minute window, which is when we were seeing the behavior.
@josebarros2025, thank you for reaching out. I think the issue you're facing is caused by spring-cloud-azure-starter:5.16.0 bringing in older core versions (azure-core-amqp 2.9.6, azure-core 1.49.1) as transitive dependencies, which conflict with the versions azure-messaging-servicebus 7.17.4 needs. The next version of spring-cloud-azure-starter, which bumps the core versions, is yet to be released.
Have you tried explicitly declaring the required versions of azure-core and azure-core-amqp (in addition to azure-messaging-servicebus:7.17.4) in your Spring app POM, above the spring-cloud-azure-starter dependency? Typically, Maven resolves a version conflict by picking the declaration nearest to the project in the dependency tree, so a direct dependency wins over a transitive one; I'm unsure whether Spring's dependency management behaves differently.
If you're unable to override the versions in the Spring app, then unfortunately you'll need to wait for the Azure Spring team to release a spring-cloud-azure-starter that uses azure-messaging-servicebus:7.17.4. Generally, the Azure Spring team schedules releases after all the Azure SDKs and the BOM for that month are available. Given previous release timelines, I would expect their release before mid-October.
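The `NoSuchMethodError` on `ReactorConnection.<init>` reported earlier is the typical symptom of this kind of conflict: azure-messaging-servicebus calls a constructor that the azure-core-amqp jar on the runtime classpath (most likely the older 2.9.6) does not have. To confirm which jar actually supplies a class at runtime, a plain JDK sketch like the following can help. `ClasspathProbe` is a hypothetical helper; the two class names come from the error and the builder used in this thread.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which jar (or directory) a class was loaded from.
final class ClasspathProbe {
    static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "<no code source: " + className + ">"; // e.g. JDK classes from the bootstrap loader
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath: " + className + ">";
        }
    }

    public static void main(String[] args) {
        // Both jar paths should show the versions you expect (e.g. azure-core-amqp 2.9.9 or later).
        System.out.println(locate("com.azure.core.amqp.implementation.ReactorConnection"));
        System.out.println(locate("com.azure.messaging.servicebus.ServiceBusClientBuilder"));
    }
}
```

Running this inside the application (for example, logging it at startup) shows the effective classpath after Maven and Spring dependency management have resolved versions, which is what matters for a `NoSuchMethodError`.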
Hello @pje477, that's wonderful news! Thank you for confirming that 7.17.4 fixes the NPE and the leak, and for answering my questions. Unfortunately, like you with your non-prod namespace, we (the SDK team) were also unable to reproduce any of this with our test namespaces, so your help in verifying version 7.17.4 was very valuable.
Hi @anuchandy Thank you for the latest version. I have updated the dependencies to
and updated code to
final ServiceBusSenderClient senderClient = new ServiceBusClientBuilder()
    .connectionString(CONNECTION_STRING)
    .configuration(new ConfigurationBuilder()
        .putProperty("com.azure.core.amqp.cache", "true")
        .build())
    .sender()
    .queueName(queueName())
    .buildClient();
I deployed the changes to the demo environment and am still seeing the exception in the demo logs:
com.azure.core.amqp.exception.AmqpException: onSessionRemoteClose connectionId[connectionId], entityName[entityName] condition[Error{condition=amqp:connection:forced, description='The connection was closed by container 'containerId' because it did not have any active links in the past 300000 milliseconds. TrackingId:TrackingId, SystemTracker:gateway10, Timestamp:2024-10-02T12:26:06', info=null}], errorContext[NAMESPACE: servicebusName. ERROR CONTEXT: N/A, PATH: path]
Hi, we aren't having this issue anymore. I had to force some dependency versions; otherwise it wouldn't work:
private ServiceBusSenderAsyncClient sender;
protected AbstractMessageProducer(final String queueName, final MessagingConfiguration messagingConfiguration, final JacksonJsonSerializer jacksonJsonSerializer) {
    log.debug("producer queueName: {}", queueName);
    this.jacksonJsonSerializer = jacksonJsonSerializer;
    this.messagingConfiguration = messagingConfiguration;
    this.queueName = queueName;
    sender = new ServiceBusClientBuilder()
        .connectionString(messagingConfiguration.getConnectionString())
        .configuration(new ConfigurationBuilder()
            .putProperty("com.azure.core.amqp.cache", "true")
            .build())
        .sender()
        .queueName(queueName)
        .buildAsyncClient();
}
public void sendToQueue(final ServiceBusMessage message) {
    sendMessage(message);
}
private void sendMessage(final ServiceBusMessage message) {
    // sendMessage returns Mono<Void>, which never emits a value, so log success on completion.
    // Send failures are delivered to the error callback; the try/catch only sees synchronous failures.
    try {
        sender.sendMessage(message)
            .subscribe(
                unused -> { },
                error -> log.warn("Error sending message with ID {} to queue", message.getMessageId(), error),
                () -> log.info("Sent message with ID {} to queue", message.getMessageId()));
    } catch (Exception e) {
        log.error("error sending message to queue: {}", e.getMessage(), e);
    }
}
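In the `sendMessage` method above, the `try`/`catch` around `subscribe` does not see a failed send: with an asynchronous API, the failure is delivered to the error callback rather than thrown to the caller. The sketch below illustrates this with the JDK's `CompletableFuture` as a stand-in for Reactor's `Mono` (an analogy only, not the Service Bus API); `AsyncSendOutcome` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Analogy only: CompletableFuture stands in for Reactor's Mono; this is not the Service Bus API.
final class AsyncSendOutcome {
    // Reports which path observed the outcome of an already-completed asynchronous "send".
    // A send that is still running would report "pending".
    static String observe(CompletableFuture<Void> send) {
        AtomicReference<String> outcome = new AtomicReference<>("pending");
        try {
            // A failed send completes the future exceptionally; nothing is thrown here.
            send.whenComplete((ignored, error) ->
                    outcome.set(error == null ? "completed" : "error callback: " + error.getMessage()));
        } catch (RuntimeException e) {
            outcome.set("caught by try/catch"); // only reached if attaching the callback itself fails
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(observe(CompletableFuture.completedFuture(null)));
        System.out.println(observe(CompletableFuture.failedFuture(new IllegalStateException("link closed"))));
    }
}
```

The failed send ends up in the callback, never in the `catch` block, which is why send errors should be handled in the error consumer passed to `subscribe`.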
<azure.servicebus.version>7.17.4</azure.servicebus.version>
<dependency>
    <groupId>com.azure.spring</groupId>
    <artifactId>spring-cloud-azure-starter</artifactId>
    <version>${azure.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.azure</groupId>
            <artifactId>azure-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.azure</groupId>
            <artifactId>azure-core-amqp</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-core</artifactId>
    <version>1.52.0</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-core-amqp</artifactId>
    <version>2.9.9</version>
    <scope>compile</scope>
</dependency>
@josebarros2025, glad to hear you managed to solve the conflicts. Thanks for sharing the solution; I'm sure others will find it helpful.
@padmapriyanalam, thanks for the response. As I noted in my previous comment, we will still observe disconnect events. The service will disconnect if there is no activity, and the client will reconnect during the next send attempt. The log you’re seeing is such a disconnect event.
What 7.17.4 addresses is the NullPointerException related to the session disconnect event and the resulting thread leaks in certain environments. I hope this clarifies things.
Closing this issue, as 7.17.4 with the fix has been released. Refer to the steps outlined here to use 7.17.4.
Thank you @josebarros2025 @anuchandy. We aren't having this issue anymore
@anuchandy I believe this issue still persists: even after updating the versions, we are still receiving the error.
We are using the implementation: spring-cloud-azure-stream-binder-servicebus
Caused by: java.lang.NullPointerException: Cannot invoke "java.util.List.add(Object)" because "this._sessions" is null
at org.apache.qpid.proton.engine.impl.ConnectionImpl.session(ConnectionImpl.java:91)
at org.apache.qpid.proton.engine.impl.ConnectionImpl.session(ConnectionImpl.java:39)
at com.azure.core.amqp.implementation.ReactorConnection.lambda$createSession$15(ReactorConnection.java:342)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
at com.azure.core.amqp.implementation.ReactorConnection.lambda$createSession$16(ReactorConnection.java:339)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:113)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:180)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)
<dependency>
    <groupId>com.azure.spring</groupId>
    <artifactId>spring-cloud-azure-dependencies</artifactId>
    <version>5.17.1</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-messaging-servicebus</artifactId>
    <version>7.17.4</version> <!-- {x-version-update;com.azure:azure-messaging-servicebus;dependency} -->
</dependency>
Hello @robsonkades, I think the issue is that the configuration "com.azure.core.amqp.cache" is not enabled. Since the application uses version 7.17.4 indirectly through the Spring library and cannot set this configuration directly on the builder, you can set the system property "com.azure.core.amqp.cache" to true.
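One way to follow this suggestion is to set the property programmatically at the very start of `main`, before Spring creates any Service Bus client; passing `-Dcom.azure.core.amqp.cache=true` on the `java` command line should have the same effect, since both set a JVM system property. A minimal sketch, where `AmqpCacheOptIn` is a hypothetical class and the Spring Boot call is left as a commented placeholder:

```java
// Hypothetical helper; the property name is the one suggested in this thread.
final class AmqpCacheOptIn {
    static final String AMQP_CACHE_PROPERTY = "com.azure.core.amqp.cache";

    // Must run before any Service Bus client, including Spring-managed ones, is built.
    static void enable() {
        // Keep a value already supplied on the command line with -Dcom.azure.core.amqp.cache=...
        if (System.getProperty(AMQP_CACHE_PROPERTY) == null) {
            System.setProperty(AMQP_CACHE_PROPERTY, "true");
        }
    }

    public static void main(String[] args) {
        enable();
        // SpringApplication.run(MyApplication.class, args); // placeholder: start Spring only after enable()
        System.out.println(AMQP_CACHE_PROPERTY + "=" + System.getProperty(AMQP_CACHE_PROPERTY));
    }
}
```

The sketch keeps a value that was already supplied with `-D`, so the command line can still override it.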
Describe the bug: We are using the ServiceBusSenderClient to send messages to a Service Bus topic. When no message is sent for 15 minutes, we see that the connection is freed, but afterwards this results in a NullPointerException as a new session is created on the already closed connection.
Exception or Stack Trace
To Reproduce
Code Snippet
Expected behavior: No error logging should occur.
Screenshots: When the doFree method is called on the connection, the _sessions variable is set to null, but afterwards the connection seems to be reused to create a new session, which results in the NullPointerException.
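The failure described above can be reproduced in miniature without any Azure dependency. The sketch below is a hypothetical, simplified model (the class `ModelConnection` is not proton-j's `ConnectionImpl`): freeing the connection nulls its session list, and creating a session on the same stale reference afterwards throws `NullPointerException`, matching the `this._sessions` is null message in the reported stack trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the reported failure; NOT proton-j's ConnectionImpl.
final class ModelConnection {
    private List<String> sessions = new ArrayList<>();

    void session(String name) {
        sessions.add(name); // NullPointerException once free() has nulled the list
    }

    // Mirrors the report: doFree sets the _sessions field to null.
    void free() {
        sessions = null;
    }
}

final class ReuseAfterFreeDemo {
    // Returns true if creating a session on an already-freed connection throws an NPE.
    static boolean reuseAfterFreeThrows() {
        ModelConnection connection = new ModelConnection();
        connection.session("session-1"); // fine while the connection is live
        connection.free();                // idle connection closed and its resources freed
        try {
            connection.session("session-2"); // stale reference reused for a new session
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on reuse after free: " + reuseAfterFreeThrows());
    }
}
```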
Setup (please complete the following information):