Azure / azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Apache License 2.0

EventHub Writer fails due to Throttling of EventHub, configuration settings have no impact. #679

Open steffenmarschall opened 1 year ago

steffenmarschall commented 1 year ago

Bug Report:

Actual Behavior

We have a rather large streaming DataFrame (42,000,000 rows) that we want to send to our Azure Event Hub. The Event Hub is scaled to 15 TUs. However, every run that tries to send this data fails due to Event Hub throttling. The exception shown is:

StreamingQueryException: [STREAM_FAILED] Query [id = ..., runId = ...] terminated with exception: Job aborted due to stage failure: Task XX in stage 9.0 failed 4 times, most recent failure: Lost task 61.3 in stage 9.0 (TID 1963) (10.179.0.21 executor 7): com.microsoft.azure.eventhubs.ServerBusyException: The request was terminated because the entity is being throttled. Error code : 50002. Sub error : 101. Please wait 4 seconds and try again. To know more visit https://aka.ms/sbResourceMgrExceptions and https://aka.ms/ServiceBusThrottling
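For context, a back-of-the-envelope calculation (ours, not from the issue) using the published Event Hubs quota of roughly 1 MB/s or 1,000 ingress events per second per throughput unit: even at the full 15-TU cap, 42 million events need on the order of 45 minutes, so an unthrottled bulk write of this size is bound to hit `ServerBusyException`:

```python
# Rough lower bound on send time at the Event Hubs ingress quota.
# Assumes ~1,000 events/s per TU (the per-TU event cap; the 1 MB/s
# byte cap may bind first for large events).
rows = 42_000_000
tus = 15
events_per_sec = tus * 1000

min_seconds = rows / events_per_sec
print(f"at least {min_seconds:.0f} s (~{min_seconds / 60:.0f} min)")
```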

We tried to lower the sending rate with the following options:

However, none of these had any measurable impact on the send rate to the Event Hub.

Additional Info:

We stream from a Delta table; each version usually has ~42,000,000 added rows. We use the AvailableNow trigger and try to checkpoint, but the job usually fails before reaching any checkpoint.
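Since the connector does not appear to offer write-side rate control, one workaround (our own suggestion, not something the connector documents for this case) is to cap the micro-batch size on the Delta *source* side with `maxFilesPerTrigger`, which indirectly caps the write rate; `Trigger.AvailableNow` still drains the whole table, just in smaller batches. A PySpark sketch, where the paths and connection string are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection string -- replace with your own.
conn_str = "Endpoint=sb://...;EntityPath=..."
eh_conf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs
             .EventHubsUtils.encrypt(conn_str)
}

stream = (
    spark.readStream.format("delta")
         # Cap how much of the Delta table each micro-batch reads;
         # tune the value to your file sizes and TU budget.
         .option("maxFilesPerTrigger", 16)
         .load("/mnt/data/source_table")          # placeholder path
)

query = (
    stream.selectExpr("to_json(struct(*)) AS body")
          .writeStream.format("eventhubs")
          .options(**eh_conf)
          .option("checkpointLocation", "/mnt/chk/eh_sink")  # placeholder
          .trigger(availableNow=True)
          .start()
)
```

Smaller batches also mean checkpoints are reached sooner, so a throttled retry does not restart from scratch.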

Expected behavior

Adjusting the settings should lower or raise the throughput when writing to Azure Event Hub.

Please let us know how to configure the EventHubWriter so that we can send large amounts of data without failing due to throttling.
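Until the connector exposes write-side rate limiting, another workaround (our own assumption, not an official API) is to drive the sink through `foreachBatch` and retry throttled sends with the pause the service asks for ("Please wait 4 seconds and try again"). The helper below is plain Python so the idea is clear; `send_fn` stands in for whatever Event Hubs client call you use, and `RuntimeError` stands in for `ServerBusyException`:

```python
import time

def send_with_backoff(send_fn, payload, max_retries=5, base_delay=4.0):
    """Call send_fn(payload); on a throttling error, wait and retry.

    base_delay mirrors the 4-second pause ServerBusyException asks for;
    the delay doubles on each retry (exponential backoff).
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return send_fn(payload)
        except RuntimeError:  # stand-in for ServerBusyException
            if attempt == max_retries:
                raise
            time.sleep(delay)
            delay *= 2
```

Wrapping per-partition sends this way trades speed for completion: throttled batches slow down instead of failing the whole query after four task attempts.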

Configuration

YoshicoppensE61 commented 10 months ago

I have the same problem. I do not think any of the available config settings can limit the output rate in this scenario, which makes Event Hubs as an output a bit worthless.