Azure / azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

DBR 7.x connector timing out on batch write after upgrade from DBR 6.x connector #553

Open · demichalk opened this issue 3 years ago

demichalk commented 3 years ago

Bug Report:

- Large batch writes to Event Hubs are timing out on the current DBR 7.x connector (com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17).

```python
eh_write_conf = {
    'eventhubs.connectionString': eh_connection_encrypted,
    'eventhubs.operationTimeout': datetime.time(0, 15, 0).strftime("PT%HH%MM%SS")  # 15 minute timeout
}
```
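For reference, the `strftime` pattern above produces an ISO-8601-style duration string. A minimal sketch of the values involved (the `EventHubsUtils.encrypt` call is the documented PySpark pattern for producing `eh_connection_encrypted`, assumed here rather than shown in the original report):

```python
import datetime

# ISO-8601-style duration the connector receives for a 15-minute operation timeout
timeout_str = datetime.time(0, 15, 0).strftime("PT%HH%MM%SS")
print(timeout_str)  # PT00H15M00S

# Assumption: eh_connection_encrypted is built with the connector's helper,
# which is the documented PySpark pattern (not shown in the original report):
# eh_connection_encrypted = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
```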

```python
(
    eh_green_avro_df
    .write
    .format("eventhubs")
    .options(**eh_write_conf)
    .save()
)
```
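The timeout is raised on the executor send path (see the stack trace below). One mitigation sometimes tried for large batch writes, not something confirmed in this thread, is to spread the payload over more write tasks so each producer sends less data within `eventhubs.operationTimeout`. A minimal sketch, with the partition count chosen purely for illustration:

```python
# Hypothetical mitigation sketch: fan the write out over more tasks so each
# task pushes a smaller slice of the batch within eventhubs.operationTimeout.
(
    eh_green_avro_df
    .repartition(32)  # illustrative value; tune to the Event Hub's throughput capacity
    .write
    .format("eventhubs")
    .options(**eh_write_conf)
    .save()
)
```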

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 235, 10.139.64.11, executor 2): com.microsoft.azure.eventhubs.TimeoutException: Entity(taxidemo): Send operation timed out at 2020-11-02T17:06:06.443Z[Etc/UTC]., errorContext[NS: fieldengdeveastus2ehb.servicebus.windows.net, PATH: taxidemo, REFERENCE_ID: 7e37e193523f4b8090b9a88ba8a6c122_G14, LINK_CREDIT: 300]
    at com.microsoft.azure.eventhubs.impl.MessageSender.throwSenderTimeout(MessageSender.java:947)
    at com.microsoft.azure.eventhubs.impl.MessageSender.access$1800(MessageSender.java:62)
    at com.microsoft.azure.eventhubs.impl.MessageSender$SendTimeout.run(MessageSender.java:1069)
    at com.microsoft.azure.eventhubs.impl.Timer$ScheduledTask.onEvent(Timer.java:48)
    at com.microsoft.azure.eventhubs.impl.DispatchHandler.onTimerTask(DispatchHandler.java:12)
    at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:233)
    at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
    at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
    at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
    at com.microsoft.azure.eventhubs.impl.MessagingFactory$RunReactor.run(MessagingFactory.java:766)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

```
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2519)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2466)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2460)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2460)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1152)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1152)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1152)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2721)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2668)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2656)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2331)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2352)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2371)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2396)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:1001)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:395)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:999)
    at org.apache.spark.sql.eventhubs.EventHubsWriter$.write(EventHubsWriter.scala:70)
    at org.apache.spark.sql.eventhubs.EventHubsSourceProvider.createRelation(EventHubsSourceProvider.scala:122)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:91)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$3(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:187)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:153)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:152)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:999)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:116)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:836)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:198)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:999)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:437)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:421)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.eventhubs.TimeoutException: Entity(taxidemo): Send operation timed out at 2020-11-02T17:06:06.443Z[Etc/UTC]., errorContext[NS: fieldengdeveastus2ehb.servicebus.windows.net, PATH: taxidemo, REFERENCE_ID: 7e37e193523f4b8090b9a88ba8a6c122_G14, LINK_CREDIT: 300]
    at com.microsoft.azure.eventhubs.impl.MessageSender.throwSenderTimeout(MessageSender.java:947)
    at com.microsoft.azure.eventhubs.impl.MessageSender.access$1800(MessageSender.java:62)
    at com.microsoft.azure.eventhubs.impl.MessageSender$SendTimeout.run(MessageSender.java:1069)
    at com.microsoft.azure.eventhubs.impl.Timer$ScheduledTask.onEvent(Timer.java:48)
    at com.microsoft.azure.eventhubs.impl.DispatchHandler.onTimerTask(DispatchHandler.java:12)
    at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:233)
    at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
    at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
    at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
    at com.microsoft.azure.eventhubs.impl.MessagingFactory$RunReactor.run(MessagingFactory.java:766)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
```

BrendBraeckmans commented 2 years ago

I'm running into the same issue with the latest version (azure-eventhubs-spark_2.12:2.3.21) of the connector ...

isinghrana commented 2 years ago

I am running into a similar issue as well with azure-eventhubs-spark_2.12:2.3.22 and DBR version 9.1 LTS.
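For anyone trying to reproduce this outside Databricks, a minimal sketch of pinning the connector by its Maven coordinate (version taken from the comment above and shown for illustration; on Databricks the connector is attached as a cluster library instead):

```python
from pyspark.sql import SparkSession

# Pull the connector by Maven coordinate before the session starts
# (coordinate per the comments in this thread; version shown for illustration).
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22")
    .getOrCreate()
)
```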