Closed GiedriusS closed 1 year ago
Looking at ruby-kafka
source code https://github.com/zendesk/ruby-kafka/blob/292f2d5a5b8ef4572485617121bf61a6725414cf/lib/kafka/consumer.rb#L359, it looks like they are already triggering heartbeat after every batch, so I don't think this change will yield any behavioural differences.
From what you've described, I would be interested to look at your session.timeout.ms
(https://kafka.apache.org/documentation/#consumerconfigs_session.timeout.ms, which should default to 30s in ruby-kafka
, this can be set via the session_timeout
parameter), and see if it makes sense to bump that.
This PR has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this PR will be closed in 30 days
So, should we decline this PR and add descriptions for processing high volume log?
@cosmo0920 I think so. If the author finds a need for this again, they can re-open the PR in a future date
OK. Let's close this PR then.
We've spotted that processing a batch of Kafka messages in fluentd sometimes takes a long time and because of that Kafka consumer groups go into a constant rebalancing state. Fluentd lacks observability to understand what plugin is taking a long time so adding this as a stop gap solution for the moment. It's not trivial to add duration histograms to Fluentd so this is a stop-gap solution for now.
Kudos to @marijus-ravickas for the idea. More details available at https://www.marionzualo.com/2019/08/10/always-manually-send-heartbeats-when-processing-message-batches-in-ruby-kafka/.
Tests still pass: