fluent / fluent-plugin-kafka

Kafka input and output plugin for Fluentd
Other
303 stars 176 forks source link

in_kafka_group: send heartbeat if necessary #465

Closed GiedriusS closed 1 year ago

GiedriusS commented 2 years ago

We've spotted that processing a batch of Kafka messages in fluentd sometimes takes a long time and because of that Kafka consumer groups go into a constant rebalancing state. Fluentd lacks observability to understand what plugin is taking a long time so adding this as a stop gap solution for the moment. It's not trivial to add duration histograms to Fluentd so this is a stop-gap solution for now.

Kudos to @marijus-ravickas for the idea. More details available at https://www.marionzualo.com/2019/08/10/always-manually-send-heartbeats-when-processing-message-batches-in-ruby-kafka/.

Tests still pass:

37 tests, 42 assertions, 0 failures, 0 errors, 0 pendings, 7 omissions, 0 notifications         
100% passed
raytung commented 2 years ago

Looking at ruby-kafka source code https://github.com/zendesk/ruby-kafka/blob/292f2d5a5b8ef4572485617121bf61a6725414cf/lib/kafka/consumer.rb#L359, it looks like they are already triggering heartbeat after every batch, so I don't think this change will yield any behavioural differences.

From what you've described, I would be interested to look at your session.timeout.ms (https://kafka.apache.org/documentation/#consumerconfigs_session.timeout.ms, which should default to 30s in ruby-kafka, this can be set via the session_timeout parameter), and see if it makes sense to bump that.

github-actions[bot] commented 1 year ago

This PR has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this PR will be closed in 30 days

cosmo0920 commented 1 year ago

So, should we decline this PR and add descriptions for processing high volume log?

raytung commented 1 year ago

@cosmo0920 I think so. If the author finds a need for this again, they can re-open the PR in a future date

cosmo0920 commented 1 year ago

OK. Let's close this PR then.