Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

[BUG] EventhubReader stops working #39544

Open the-mod opened 7 months ago

the-mod commented 7 months ago

Describe the bug

In our scenario, two applications read all messages from an Event Hub, each on its own consumer group. One application (always the same one) stops reading from this Event Hub at irregular intervals. To me it looks as if the partition pumps are dying one after another, because the Outgoing Messages metric (which should be 2x Incoming Messages) steadily drops toward the level of Incoming Messages. See the chart.

[Chart: hsi-downtime-2-1]
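The reasoning behind the 2x expectation can be written down as a small calculation. This is only a rough sketch: the class and method names are made up for illustration, and it assumes that traffic is spread evenly across partitions, that every consumer group is meant to read every message, and that only one reader is falling behind.

```java
final class ThroughputRatio {

    /** Ratio of outgoing to incoming messages; about N when N consumer groups all keep up. */
    static double outgoingToIncoming(double incoming, double outgoing) {
        if (incoming <= 0) {
            throw new IllegalArgumentException("incoming must be positive");
        }
        return outgoing / incoming;
    }

    /**
     * Estimated share of the stream that the single lagging reader still consumes.
     * Assumption: the other (consumerGroups - 1) readers consume everything.
     * The result is clamped to the range [0, 1].
     */
    static double laggingReaderShare(double incoming, double outgoing, int consumerGroups) {
        double share = outgoingToIncoming(incoming, outgoing) - (consumerGroups - 1);
        return Math.max(0.0, Math.min(1.0, share));
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1000 incoming and 1500 outgoing messages per minute.
        System.out.println(laggingReaderShare(1000, 1500, 2)); // prints 0.5
    }
}
```

With two consumer groups, a share of 1.0 corresponds to the healthy 2x level, and a share near 0.0 corresponds to outgoing having dropped to the incoming level, which is the situation the chart shows.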

Strangely, both applications share the same Event Hub reader implementation, which is built on the Event Processor Host.

In the logs I was able to capture some traces:

Did not observe any item or terminal signal within 60000ms in 'filter' (and no fallback has been configured)

Stacktrace:

java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 60000ms in 'filter' (and no fallback has been configured)
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:296)
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:281)
    at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:420)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:64)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:100)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:64)
    at reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:64)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:100)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:64)
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270)
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

After turning on debug logging for com.azure.messaging and com.azure.core.amqp, I found the following, but I am not sure whether it has anything to do with the issue:

[DEBUG] 2024-04-04T14:13:01,679 - reactor-executor-3 - com.azure.core.amqp.implementation.ReactorReceiver - {"az.sdk.message":"There are no credits to add.","connectionId":"MF_d66b18_1712238562943","entityPath":"eventhub-01/ConsumerGroups/cg/Partitions/3","linkName":"3_d8df1f_1712238562943","credits":"0"} 
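One way to check whether this message lines up with the partitions that stop is to tally it per partition across the collected logs. The following is a minimal sketch using only the JDK. The class name is hypothetical, it assumes every log line carries the same JSON payload shape as the line above, and it does not imply that the message itself indicates a fault.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NoCreditLogScan {

    // Captures the partition id from the "entityPath" field of the JSON payload.
    private static final Pattern PARTITION =
        Pattern.compile("\"entityPath\":\"[^\"]*/Partitions/([^\"/]+)\"");

    private static final String MARKER =
        "\"az.sdk.message\":\"There are no credits to add.\"";

    /** Counts "There are no credits to add." entries per partition id. */
    static Map<String, Integer> countNoCreditLines(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            if (!line.contains(MARKER)) {
                continue; // some other log message
            }
            Matcher m = PARTITION.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "[DEBUG] 2024-04-04T14:13:01,679 - reactor-executor-3 - "
            + "com.azure.core.amqp.implementation.ReactorReceiver - "
            + "{\"az.sdk.message\":\"There are no credits to add.\","
            + "\"connectionId\":\"MF_d66b18_1712238562943\","
            + "\"entityPath\":\"eventhub-01/ConsumerGroups/cg/Partitions/3\","
            + "\"linkName\":\"3_d8df1f_1712238562943\",\"credits\":\"0\"}";
        System.out.println(countNoCreditLines(List.of(sample))); // prints {3=1}
    }
}
```

Comparing these counts with the partitions whose throughput drops would show whether the message is concentrated on the affected partitions or appears everywhere.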

I tested it with com.azure:azure-messaging-eventhubs:5.18.1 and com.azure:azure-messaging-eventhubs-checkpointstore-blob:1.19.1, as well as with com.azure:azure-messaging-eventhubs:5.17.1 and com.azure:azure-messaging-eventhubs-checkpointstore-blob:1.18.1.

I can provide more logs and source code if needed.

github-actions[bot] commented 7 months ago

@anuchandy @conniey @lmolkova

github-actions[bot] commented 7 months ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

conniey commented 6 months ago

Hey @the-mod, thanks for reporting this. I'm not sure where the stack trace originates, because none of its frames come from our code. Can you provide some more logs from around the time of this error, in addition to the ReactorReceiver logs?

Cheers, Connie

github-actions[bot] commented 6 months ago

Hi @the-mod. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] commented 6 months ago

Hi @the-mod, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

the-mod commented 6 months ago

@conniey, sorry for the late reply. I will provide log traces via email. Thanks in advance.