Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

[BUG] two listener outages on record, no root cause discovered #42040

Open jjfraney-cg opened 1 week ago

jjfraney-cg commented 1 week ago

Describe the bug A process listening to a topic stops receiving messages. We don't know the root cause or the time at which the outage begins. Our customers discover the problem first because the application begins to misbehave.

Restarting the process clears the problem. Once restarted, messages can be received.

How can I catch the error when it happens so I can restart the service before the customer notices?

How can I collect information that can narrow the problem description?

When Azure Service Bus is upgraded, can we get an announcement?

We are using the Premium tier of Service Bus.

Exception or Stack Trace None

To Reproduce Unknown.

Code Snippet Unknown.

Expected behavior Listener keeps getting messages without restarting the process.

Screenshots NA


github-actions[bot] commented 1 week ago

@moarychan @netyyyy @rujche @saragluna

github-actions[bot] commented 1 week ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

saragluna commented 4 days ago

Thanks for reaching out, could you help provide a minimal project for us to reproduce this issue?

jjfraney-cg commented 2 days ago

I don't know how to reproduce the problem.

The JMS clients we run are long-running services. They run for weeks between scheduled maintenance restarts. We've been in production for 12 months. The problem has been noticed only twice.

We have about 10 topics. We observe that only one of them is affected.

We have a very light load at this time. We experience frequent idle connection closings. The JMS layer is restarting connections successfully.

However, the connection closing exceptions are the only disruption that the JMS layer reports in the logs.

We don't have a sample application to demonstrate the problem.
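
One possible way to catch a silent listener before a customer does is a staleness watchdog: record when each topic last delivered a message and flag topics that have been quiet for too long. The following is only a minimal sketch under stated assumptions, not something taken from the SDK. The class name `ListenerWatchdog`, the topic names, and the five-minute threshold are all hypothetical. Because the load is light, silence alone is ambiguous, so the sketch also assumes a periodic heartbeat message is published to each topic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper: tracks the last time each topic delivered a message
 * and reports topics that have been silent longer than a threshold.
 */
final class ListenerWatchdog {
    private final Duration maxSilence;
    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

    ListenerWatchdog(Duration maxSilence) {
        this.maxSilence = maxSilence;
    }

    /** Call from the message listener each time a message (or heartbeat) arrives. */
    void recordMessage(String topic, Instant when) {
        lastSeen.put(topic, when);
    }

    /** Returns the topics whose last message is older than maxSilence as of 'now'. */
    List<String> staleTopics(Instant now) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastSeen.entrySet()) {
            Duration silence = Duration.between(entry.getValue(), now);
            if (silence.compareTo(maxSilence) > 0) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

A scheduled task could call `staleTopics(Instant.now())` every minute and raise an alert, or trigger a restart, when the list is non-empty. Logging the stale topic name and the time of its last message would also help narrow down when an outage began.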