dipendra-singh opened this issue 11 months ago
@ashie any update on this?
Is there any update on this? We've encountered the same issue when trying to output to an Azure Event Hub. Logs flow for roughly 24 hours, then abruptly stop with the error below. Restarting the pods (we're running the operator in Kubernetes/AKS) restores log flow for a short time before the same errors recur.
```
2024-03-19 15:11:50 +0000 [warn]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] Send exception occurred: Failed to send messages to logging-rancher_logs-np/1, logging-rancher_logs-np/0
2024-03-19 15:11:50 +0000 [warn]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] Exception Backtrace : /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:242:in `deliver_messages_with_retries'
/usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:128:in `deliver_messages'
/usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/out_kafka2.rb:299:in `write'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
/usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2024-03-19 15:11:50 +0000 [info]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] initialized kafka producer: fluentd
2024-03-19 15:11:50 +0000 [warn]: #0 [clusterflow:cattle-logging-system:opt-in:clusteroutput:cattle-logging-system:kafka-rancher-logs] failed to flush the buffer. retry_times=0 next_retry_time=2024-03-19 15:11:51 +0000 chunk="61404ba7f31e469aa7afc435ed28b99b" error_class=Kafka::DeliveryFailed error="Failed to send messages to logging-rancher_logs-np/1, logging-rancher_logs-np/0"
2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:242:in `deliver_messages_with_retries'
2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/kafka_producer_ext.rb:128:in `deliver_messages'
2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluent-plugin-kafka-0.17.5/lib/fluent/plugin/out_kafka2.rb:299:in `write'
2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
2024-03-19 15:11:50 +0000 [warn]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
```
@jodr5786 Moving to kafka_buffered worked for me, but I'm hoping for a better fix here, since kafka_buffered is deprecated. With kafka_buffered I also saw some fragmented-chunk errors, but they were innocuous; the data flows without any issue.
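For reference, a minimal sketch of what that workaround looks like, assuming an Event Hubs Kafka endpoint with the plugin's standard SASL PLAIN over TLS options; the broker host, topic, connection string, and buffer path below are placeholders, not values from this setup:

```
<match **>
  @type kafka_buffered                 # deprecated, but works around the issue
  brokers my-namespace.servicebus.windows.net:9093
  default_topic rancher-logs
  output_data_type json

  # Event Hubs expects SASL PLAIN over TLS; the literal username
  # "$ConnectionString" is the Event Hubs convention.
  username $ConnectionString
  password "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
  ssl_ca_certs_from_system true
  sasl_over_ssl true

  # kafka_buffered uses the old (v0.12-style) buffer parameters
  buffer_type file
  buffer_path /var/log/fluentd-buffers/kafka.buffer
  flush_interval 5s
</match>
```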
### Describe the bug

When I try to use out_kafka2 to send logs to an Event Hub that has 200 partitions, I am facing this issue:

### To Reproduce

- Handle a scale of about 80K logs/sec
- Event Hub with 200 partitions, dedicated tier (1 CU)
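For context, an illustrative out_kafka2 config for an Event Hubs Kafka endpoint at roughly this scale — a sketch with placeholder values (broker host, topic, connection string, buffer sizing), not the actual config from this report:

```
<match **>
  @type kafka2
  brokers my-namespace.servicebus.windows.net:9093
  default_topic rancher-logs

  # Same SASL PLAIN over TLS settings as the kafka_buffered sketch above
  username $ConnectionString
  password "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
  ssl_ca_certs_from_system true
  sasl_over_ssl true

  # Retry knobs; max_send_retries defaults to 1, which gives up quickly
  # once a connection to the Event Hubs broker has gone stale.
  max_send_retries 5
  required_acks 1

  <format>
    @type json
  </format>

  <buffer topic>
    @type file
    path /var/log/fluentd-buffers/kafka2.buffer
    flush_thread_count 8     # multiple flush threads to sustain ~80K logs/sec
    chunk_limit_size 4m
    flush_interval 3s
  </buffer>
</match>
```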
### Expected behavior

Retries should succeed.

### Your Environment

### Your Configuration

I am using the following config:

### Your Error Log

### Additional context

No response