confluentinc / confluent-kafka-go

Confluent's Apache Kafka Golang client
Apache License 2.0
4.5k stars 646 forks

Consumers unable to join group until group leader is restarted #1195

Open schmigware opened 1 month ago

schmigware commented 1 month ago

Description

Library version: github.com/confluentinc/confluent-kafka-go/v2 v2.0.2. Kafka broker version: 2.8.2 (Commit:3146c6ff4a24cc24)

Any input on the following would be greatly appreciated.

I have various consumers in consumer group FOO. The consumers are running on a number of k8s clusters and connecting to a common broker.

The consumers subscribe to various topics BAR0, BAR1, BAR2. Each of these topics has 32 partitions.

The observed behaviour is as follows:

This is the basic issue. Partitions don't appear to be assigned to any consumer.

Can be resolved by:

This does not appear to be an issue with partition assignment within a group. Each consumer is subscribed to an unrelated topic. We are namespacing our Kafka topics to match them 1:1 to k8s clusters and namespaces. Certainly, the "stuck" consumer and the consumer group leader are not subscribing to the same topic.

No errors or warnings are observed in broker / consumer group leader logs.

How to reproduce

It is not clear to us how to reproduce this issue.

milindl commented 1 month ago

Hi @schmigware, thanks for filing this issue. One question: does it happen every time, or only intermittently? Your statement makes it seem that each consumer in this group is subscribed to different topics, with no overlap. Is that correct?

I suspect that the group leader may not have the metadata for topics it isn't subscribed to, and so is unable to assign their partitions properly.
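To make that suspicion concrete, here is a minimal toy model in plain Go. It is not the client's or librdkafka's actual assignor: the `member` type, the `assign` function, and the round-robin strategy are all hypothetical, chosen only to show what the symptom would look like if the leader computed assignments from metadata covering just its own topics. The group layout (one consumer on BAR0, one on BAR1, 32 partitions each) follows the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a hypothetical group member and its topic subscriptions.
type member struct {
	ID     string
	Topics []string
}

// assign models a leader-side assignor that only knows the partition counts
// present in leaderMetadata (topic -> partition count). Partitions of each
// known topic are dealt round-robin to the members subscribed to that topic.
// Topics missing from leaderMetadata are skipped, so their subscribers
// receive nothing.
func assign(members []member, leaderMetadata map[string]int) map[string][]string {
	out := make(map[string][]string)
	subs := make(map[string][]string) // topic -> subscribed member IDs
	for _, m := range members {
		out[m.ID] = nil
		for _, t := range m.Topics {
			subs[t] = append(subs[t], m.ID)
		}
	}

	topics := make([]string, 0, len(subs))
	for t := range subs {
		topics = append(topics, t)
	}
	sort.Strings(topics) // deterministic iteration order

	for _, t := range topics {
		n, ok := leaderMetadata[t]
		if !ok {
			continue // assumption under test: no metadata, nothing to hand out
		}
		ids := subs[t]
		sort.Strings(ids)
		for p := 0; p < n; p++ {
			id := ids[p%len(ids)]
			out[id] = append(out[id], fmt.Sprintf("%s[%d]", t, p))
		}
	}
	return out
}

func main() {
	members := []member{
		{ID: "leader", Topics: []string{"BAR0"}},
		{ID: "stuck", Topics: []string{"BAR1"}},
	}

	// Leader only holds metadata for the topic it subscribes to.
	partial := map[string]int{"BAR0": 32}
	// Leader holds metadata for every topic subscribed to in the group.
	full := map[string]int{"BAR0": 32, "BAR1": 32}

	fmt.Println("partial metadata, stuck gets:", len(assign(members, partial)["stuck"]))
	fmt.Println("full metadata, stuck gets:", len(assign(members, full)["stuck"]))
}
```

In this model the "stuck" member receives zero partitions while the leader's metadata lacks BAR1, and all 32 once the metadata is complete. That would match a consumer that joins the group but is never assigned anything, though whether the real client behaves this way is exactly what remains to be confirmed.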

schmigware commented 1 month ago

Thanks for your reply @milindl. The issue appears to be intermittent. Indeed the various consumers are subscribed to different topics. The leader is not subscribed to the same topic as the "stuck" consumer.

schmigware commented 1 month ago

BTW a colleague appears to have opened a ticket with additional version information: https://github.com/confluentinc/confluent-kafka-go/issues/1197