fluent / fluent-plugin-kafka

Kafka input and output plugin for Fluentd

Errors while sending data to Eventhub #502

Open dipendra-singh opened 9 months ago

dipendra-singh commented 9 months ago

Describe the bug

When I use out_kafka2 to send logs to an Event Hub with 200 partitions, I hit the following issue:

2023-12-01 22:56:45 +0000 [warn]: #0 [kfk2] Send exception occurred: Failed to send messages to livelogs/192, livelogs/140, livelogs/148, livelogs/152, livelogs/22, livelogs/97, livelogs/114, livelogs/49, livelogs/174, livelogs/92, livelogs/26, livelogs/84, livelogs/64, livelogs/199, livelogs/68, livelogs/134, livelogs/30, livelogs/83, livelogs/8, livelogs/10, livelogs/158, livelogs/66, livelogs/71, livelogs/14, livelogs/94, livelogs/93, livelogs/117, livelogs/62, livelogs/89, livelogs/198, livelogs/40, livelogs/155, livelogs/180, livelogs/24, livelogs/179, livelogs/159, livelogs/184, livelogs/38, livelogs/141, livelogs/27, livelogs/46, livelogs/82, livelogs/53, livelogs/154, livelogs/16, livelogs/21, livelogs/34, livelogs/90, livelogs/181, livelogs/196, livelogs/79, livelogs/96, livelogs/23, livelogs/6, livelogs/182, livelogs/104, livelogs/183, livelogs/67, livelogs/124, livelogs/167, livelogs/43, livelogs/75, livelogs/156, livelogs/57, livelogs/102, livelogs/194, livelogs/130, livelogs/106, livelogs/32, livelogs/101, livelogs/170, livelogs/95, livelogs/146, livelogs/47, livelogs/103, livelogs/144, livelogs/176, livelogs/118, livelogs/185, livelogs/138, livelogs/9, livelogs/160, livelogs/41, livelogs/164, livelogs/166, livelogs/86, livelogs/28, livelogs/195, livelogs/127, livelogs/51, livelogs/61, livelogs/189, livelogs/54, livelogs/87, livelogs/139, livelogs/72, livelogs/7, livelogs/77, livelogs/123, livelogs/18, livelogs/50, livelogs/91, livelogs/5, livelogs/59, livelogs/132, livelogs/147, livelogs/110, livelogs/44, livelogs/136, livelogs/108, livelogs/173, livelogs/45, livelogs/151, livelogs/81, livelogs/172, livelogs/29, livelogs/131, livelogs/58, livelogs/197, livelogs/4, livelogs/178, livelogs/65, livelogs/129, livelogs/17, livelogs/98, livelogs/142, livelogs/137, livelogs/121, livelogs/150, livelogs/190, livelogs/116, livelogs/163, livelogs/162, livelogs/33, livelogs/69, livelogs/74, livelogs/187, livelogs/2, livelogs/35, livelogs/15, livelogs/42, livelogs/20, livelogs/171, livelogs/88, livelogs/126, livelogs/186, livelogs/36, livelogs/109, livelogs/193, livelogs/105, livelogs/149, livelogs/48, livelogs/3, livelogs/175, livelogs/63, livelogs/177, livelogs/115, livelogs/161, livelogs/122, livelogs/25, livelogs/11, livelogs/120, livelogs/31, livelogs/99, livelogs/70, livelogs/107, livelogs/73, livelogs/111, livelogs/1, livelogs/128, livelogs/76, livelogs/119, livelogs/165, livelogs/168, livelogs/78, livelogs/13, livelogs/12, livelogs/188, livelogs/191, livelogs/100, livelogs/112, livelogs/143, livelogs/80, livelogs/39, livelogs/153, livelogs/19, livelogs/60, livelogs/169, livelogs/52, livelogs/0, livelogs/56, livelogs/145, livelogs/125, livelogs/135, livelogs/113, livelogs/133, livelogs/157, livelogs/85, livelogs/55
2023-12-01 22:56:45 +0000 [warn]: #0 [kfk2] Exception Backtrace : /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-kafka-0.18.1/lib/fluent/plugin/kafka_producer_ext.rb:243:in `deliver_messages_with_retries'
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-kafka-0.18.1/lib/fluent/plugin/kafka_producer_ext.rb:129:in `deliver_messages'
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-kafka-0.18.1/lib/fluent/plugin/out_kafka2.rb:377:in `write'
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/output.rb:1225:in `try_flush'
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
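For context, the Kafka::DeliveryFailed behind this warning is raised by deliver_messages_with_retries in kafka_producer_ext.rb once max_send_retries is exhausted. The sketch below is illustrative Ruby, not the plugin source; it shows the shape of that loop and why the message lists every still-pending topic/partition:

# Self-contained sketch of the retry behavior (illustrative, not the plugin source).
MAX_RETRIES   = 3     # mirrors max_send_retries in the config below
RETRY_BACKOFF = 0.1   # seconds between attempts

class DeliveryFailed < StandardError; end

# Stub: a real producer sends the batch to each partition leader and awaits acks.
def transmit(_topic_partition)
  false
end

def deliver_with_retries(pending)
  attempt = 0
  loop do
    attempt += 1
    pending = pending.reject { |tp| transmit(tp) }  # keep only the partitions that failed
    break if pending.empty?                         # everything acked: done
    if attempt <= MAX_RETRIES
      sleep RETRY_BACKOFF                           # back off, then retry the leftovers
    else
      # After retries are exhausted, every still-pending topic/partition is listed
      # in a single message -- hence the long "livelogs/N" list in the warning above.
      raise DeliveryFailed, "Failed to send messages to #{pending.join(', ')}"
    end
  end
end

begin
  deliver_with_retries((0..199).map { |p| "livelogs/#{p}" })
rescue DeliveryFailed => e
  puts e.message
end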

To Reproduce

Send roughly 80K logs/sec to an Event Hub with 200 partitions on the dedicated tier (1 CU).

Expected behavior

Retries should succeed.

Your Environment

- Fluentd version: v1.16.1
- TD Agent version:
- fluent-plugin-kafka version: 0.18.1
- ruby-kafka version: v1.51
- Operating system: Ubuntu
- Kernel version:

Your Configuration

I am using the following config:

<store>
  @type "kafka2"
  @id kfk2
  @log_level "debug"
  get_kafka_client_log true
  brokers xyz.servicebus.windows.net:9093
  use_event_time true
  default_topic "livelogs"
  max_send_retries 3
  required_acks -1
  idempotent true
  ssl_ca_certs_from_system true
  ssl_ca_cert
  ack_timeout 10
  username "$ConnectionString"
  password xxxxxx
  <inject>
    time_key "time"
    time_type string
    time_format "%Y-%m-%dT%H:%M:%S"
  </inject>
  <buffer>
    flush_mode interval
    retry_type exponential_backoff
    @type "file"
    chunk_limit_size 1MB
    overflow_action drop_oldest_chunk
    flush_thread_count 16
  </buffer>
  <format>
    @type "json"
  </format>
</store>
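Note that with max_send_retries 3 and required_acks -1, each flush gives every partition only a few delivery attempts before out_kafka2 raises Kafka::DeliveryFailed and hands the chunk back to the buffer. If dropping undeliverable records is acceptable, out_kafka2 also documents a discard_kafka_delivery_failed option. A minimal sketch of the relevant knobs, assuming the same Event Hub endpoint and credentials as above (the retry count of 10 is illustrative, not a recommendation):

<store>
  @type kafka2
  brokers xyz.servicebus.windows.net:9093
  default_topic livelogs
  username "$ConnectionString"
  password xxxxxx
  ssl_ca_certs_from_system true
  max_send_retries 10                   # illustrative: more delivery attempts per flush
  discard_kafka_delivery_failed true    # drop records Kafka keeps refusing instead of failing the whole chunk
  <format>
    @type json
  </format>
  <buffer>
    @type file
    chunk_limit_size 1MB
    flush_thread_count 16
  </buffer>
</store>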

Your Error Log

(Same backtrace as quoted above.)

Additional context

No response

dipendra-singh commented 7 months ago

@ashie any update on this?

jodr5786 commented 5 months ago

Is there any update on this? We've encountered the same issue when trying to output to an Azure Event Hub. Logs flow for roughly 24 hours, then abruptly stop with the error below. Restarting the pods (we're running the operator in Kubernetes/AKS) makes it work again for a short time before the same errors recur.

2024-03-19 15:11:50 +0000 [warn]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] Send exception occurred: Failed to send messages to logging-rancher_logs-np/1, logging-rancher_logs-np/0
2024-03-19 15:11:50 +0000 [warn]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] Exception Backtrace : /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:242:in `deliver_messages_with_retries'
/usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:128:in `deliver_messages'
/usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/out_kafka2.rb:299:in `write'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2024-03-19 15:11:50 +0000 [info]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] initialized kafka producer: fluentd
2024-03-19 15:11:50 +0000 [warn]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] failed to flush the buffer. retry_times=0 next_retry_time=2024-03-19 15:11:51 +0000 chunk="61404ba7f31e469aa7afc435ed28b99b" error_class=Kafka::DeliveryFailed error="Failed to send messages to logging-rancher_logs-np/1, logging-rancher_logs-np/0"
  2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:242:in `deliver_messages_with_retries'
  2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:128:in `deliver_messages'
  2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/out_kafka2.rb:299:in `write'
  2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
  2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

dipendra-singh commented 5 months ago

@jodr5786 switching to kafka_buffered worked for me, but I'm hoping for a better fix here since kafka_buffered is deprecated. When I used kafka_buffered I also saw some fragmented-chunk errors, but they were innocuous; the data flows without issue. A sketch of the config I mean is below.
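For anyone trying the same workaround, here is a minimal sketch of an equivalent kafka_buffered output, assuming the same Event Hub endpoint and credentials as the kafka2 config above. kafka_buffered is a v0.12-style plugin, so it takes output_data_type instead of a <format> section and old-style buffer_* parameters; the buffer_path below is a placeholder:

<store>
  @type kafka_buffered
  brokers xyz.servicebus.windows.net:9093
  default_topic livelogs
  output_data_type json                        # kafka_buffered formats records itself; no <format> section
  max_send_retries 3
  required_acks -1
  ack_timeout 10
  ssl_ca_certs_from_system true
  username "$ConnectionString"
  password xxxxxx
  buffer_type file                             # old-style (v0.12) buffer options
  buffer_path /var/log/fluent/kafka.*.buffer   # placeholder path
  buffer_chunk_limit 1m
  num_threads 16
</store>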