Azure / azure-service-bus-java

☁️ Java client library for Azure Service Bus
https://azure.microsoft.com/services/service-bus
MIT License

"com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out at 2019-01-28T15:08:36.149+05:30[Asia/Kolkata]., errorContext[NS: xxx.servicebus.windows.net, PATH: xxxxxx-ba59-4957-9c92-xxxxxxxxx, REFERENCE_ID: 975311_33f8560710de40f9942328e2e993b68f_G18, LINK_CREDIT: 592]" #333

Open anurags123 opened 5 years ago

anurags123 commented 5 years ago

Focus on exceptions: We are trying to send a message to a topic in our application. While sending the message, we receive the exception below and the application goes down.

errorContext[NS: **-.servicebus.windows.net, PATH: -ba59-4957-9c92-*****, REFERENCE_ID: 195499_38d4306434c64134a9516e29698eb33f_G28, LINK_CREDIT: 640]
    at com.microsoft.azure.servicebus.primitives.CoreMessageSender.throwSenderTimeout(CoreMessageSender.java:887) ~[azure-servicebus-1.2.11.jar:?]
    at com.microsoft.azure.servicebus.primitives.CoreMessageSender.access$500(CoreMessageSender.java:59) ~[azure-servicebus-1.2.11.jar:?]
    at com.microsoft.azure.servicebus.primitives.CoreMessageSender$3.run(CoreMessageSender.java:260) ~[azure-servicebus-1.2.11.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_161]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_161]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
    ... 1 more

Expected behavior: Even if a timeout occurs, not every message should fail after the application's own retry mechanism has made its 3 attempts.

Observed behavior: A timeout exception is thrown from the Azure SDK.


yvgopal commented 5 years ago

Do you mean the SEND will never succeed after this? How long does your application run before you see these timeouts? Does your application also receive messages? If yes, can you paste the code you are using to receive messages? If you are registering a message handler, you should ideally pass your own ExecutorService. If you are not passing an ExecutorService, make that change and see if it works.
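For readers who do register a message handler, the suggestion above might look roughly like the following. This is a minimal sketch using only the JDK: the pool size, the `sb-handler-` thread name, and the `newHandlerExecutor` helper are illustrative assumptions, and the commented-out line assumes an SDK version whose `registerMessageHandler` overload accepts an `ExecutorService`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HandlerExecutorSketch {

    /** Builds a fixed-size pool with named daemon threads (hypothetical helper). */
    static ExecutorService newHandlerExecutor(int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "sb-handler-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive just for handler threads
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newHandlerExecutor(4);
        try {
            // Assumption: with the SDK on the classpath, the pool would be passed here, e.g.
            // subscriptionClient.registerMessageHandler(handler, executor);

            // Stand-in task showing that work runs on the dedicated pool.
            Callable<String> task = () -> Thread.currentThread().getName();
            System.out.println("handler work ran on " + executor.submit(task).get());
        } finally {
            // The application owns the pool, so it is also responsible for shutting it down.
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

Owning the executor means handler callbacks no longer compete with other work on a shared pool, which is presumably why the change is worth trying when a handler is in use.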

anurags123 commented 5 years ago

Hi,

Please see my responses inline:

  1. Do you mean the SEND will never succeed after this? - True. No recovery happens and we keep on getting these exceptions from SDK unless we restart the production server.

  2. How long does your application run before you see these timeouts? - It's an intermittent issue. Mostly we have to restart the servers once a day in order to recover from this.

  3. Does your application also receive messages? If yes, can you paste the code you are using to receive messages? - No our application doesn't read anything from topic subscriptions. We just do the "SEND" operation to topicClient.

  4. If you are registering a message handler, you should ideally pass your own ExecutorService. If you are not passing an ExecutorService, make that change and see if it works. - No we are not using any handlers. We just create topic connection and then call topicClient.send() method of azure SDK.

Can you please advise or help debug this issue? We see a lot of downtime because of it.

yvgopal commented 5 years ago

OK. Can you enable logging and capture logs? I need logs for the 30 minutes ending at the time these timeouts start, with at least INFO level logging. The SDK uses SLF4J, so you can use any supported logging framework to capture logs.

yvgopal commented 5 years ago

I am still waiting for the logs. Is your issue gone now?

martinpaoloni commented 5 years ago

Hi! I am experiencing the same issue. I get the same error log when I try to send messages and I experience (or simulate) a network outage. Unfortunately, that message is lost and is never sent.

I even tried making a copy of the class RetryExponential, modifying it to enable retries on TimeoutException and used that as the RetryPolicy but that did not work unfortunately.

Please let me know if I can provide any other resources to have this fixed. Alternatively, I would appreciate some pointers/guidance so I can debug this myself and hopefully come up with a fix proposal.

Thanks in advance!

yvgopal commented 5 years ago

@martinpaoloni This is a very old issue. We fixed the problem of not recovering from a network outage some time back. Could you try the latest SDK version, either the latest 1.2.x or 2.0.0? RetryPolicy doesn't retry Timeout exceptions by default, and a timeout is likely not a retriable exception: it means the message may or may not have been sent. A message is guaranteed to be sent only when your SEND call succeeds. If your issue still persists, enable DEBUG level logging and attach the captured logs to this issue. The SDK uses SLF4J, so you can use the SLF4J-Log4J bridge to capture logs with Log4J.
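Since a timed-out send may or may not have reached the broker, an application that retries on timeout risks sending duplicates. One way to make such retries safer is to reuse the same message ID on every attempt, so that duplicate detection can discard repeats if it is enabled on the topic. The sketch below is SDK-independent and only illustrates the idea: `Sender`, `sendWithRetry`, and the backoff values are hypothetical, and the JDK's `java.util.concurrent.TimeoutException` stands in for the SDK's own timeout exception.

```java
import java.util.UUID;
import java.util.concurrent.TimeoutException;

public class TimeoutRetrySketch {

    /** Hypothetical stand-in for the real send call (e.g. a wrapper around topicClient.send). */
    interface Sender {
        void send(String messageId, String body) throws TimeoutException;
    }

    /**
     * Sends one message, retrying on timeout with exponential backoff.
     * The same messageId is reused for every attempt so that the receiving side
     * (or broker-side duplicate detection, if enabled) can drop repeats.
     * Returns the number of attempts used; rethrows the last timeout if all attempts fail.
     */
    static int sendWithRetry(Sender sender, String messageId, String body,
                             int maxAttempts, long initialBackoffMillis)
            throws TimeoutException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                sender.send(messageId, body);
                return attempt;
            } catch (TimeoutException e) {
                if (attempt == maxAttempts) {
                    throw e; // outcome unknown: caller must decide what to do with the message
                }
                Thread.sleep(backoff);
                backoff *= 2; // double the wait before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated sender that times out twice before succeeding.
        Sender flaky = (id, body) -> {
            if (++calls[0] < 3) {
                throw new TimeoutException("simulated timeout");
            }
        };
        String messageId = UUID.randomUUID().toString(); // generated once, reused on retries
        int attempts = sendWithRetry(flaky, messageId, "hello", 3, 10);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

Whether retrying at the application level is appropriate depends on whether the consumers can tolerate or filter duplicates. Without that, a timeout is better surfaced to the caller than retried blindly.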

las3r commented 4 years ago

@yvgopal I'm sorry to resurrect this issue, but we're facing it with 2.0.0. A year ago we had issues on 1.0.0, and we were told to move to 2.0.0.

If I look at the releases I can see that the 2.0.0 branch has been silent ever since (no releases), so what is our upgrade path now? Do we 'downgrade' to the latest 1.2.x which is actually newer and contains more bugfixes?

Are there any issues we can expect when downgrading?

yvgopal commented 4 years ago

I don't know if you are experiencing the same issue. Rather than spending time investigating why 2.0.0 has the issue, I suggest you upgrade to the latest version, 3.1.4. The repo has moved to a new location: https://github.com/Azure/azure-sdk-for-java/tree/master/sdk/servicebus