Ap4uPNZ opened 5 years ago
The logs provided don't overlap in time (except for the 01:18 rebalance), and there they are consistent with each other (they don't demonstrate a problem): the second consumer has all partitions revoked, then half assigned, and the first consumer gets the other half assigned. Perhaps you mean you are not getting any logs over this period from the second consumer? That is perplexing; I don't know how that would happen. If you provide complete logs for all consumers in the group that is experiencing the problem, over the same time period, with `Debug` set to `cgrp,fetch`, we'll be able to comment further.
Also, setting `SessionTimeoutMs` to 6000 without reducing `HeartbeatIntervalMs` is living dangerously, I think: there's not much room for error, and I'm not surprised you're seeing frequent rebalances. I'd leave `SessionTimeoutMs` at the default unless you have a very good reason to change it. If you do reduce it to 6000, reduce `HeartbeatIntervalMs` to 1500 or something like that as well.
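For illustration, the two suggestions above would look something like this in `ConsumerConfig` (a sketch; the broker and group values are placeholders, not from the original report):

```csharp
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092", // placeholder
    GroupId = "my-group",                // placeholder
    Debug = "cgrp,fetch",                // emit consumer-group and fetch debug logs
    // Only if you really need the shorter session timeout:
    SessionTimeoutMs = 6000,
    HeartbeatIntervalMs = 1500           // keep heartbeats well inside the session timeout
};
```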
@mhowlett, yes, those are the complete logs for all consumers; the second consumer does not participate in the rebalancing procedure, only the one problem consumer does. So I think it's not related to the heartbeat and session timeouts.

OK, I'll set `Debug` to `cgrp,fetch` and send you the logs.
Logs for a group of 3 consumers are available via the link. This "anomaly" was reproduced for 2 consumers (hosts `b0fcec2` and `21b9c6`) twice, at about 2019-10-29T10:09:22 and 2019-10-29T10:29:22 (UTC). Consumer `9e9a95ac` had only two rebalancings, on starting and on stopping, and no intermediate rebalancing.
Hi! Is there any news? Can I expect a resolution?
Hi all! I think I've run into a similar problem. Do you have any updates here?
The workaround described in the linked Python issue worked for me. I have several different topic subscriptions on a single consumer instance in my app. Setting at least one topic name as a regexp (e.g. `^topic$`) in the `Subscribe()` method does the trick.
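For instance (a sketch with placeholder topic names):

```csharp
// librdkafka treats topic names starting with '^' as regular expressions.
// Including one regex-style name in the subscription is the workaround
// described above; the remaining topics can stay literal.
consumer.Subscribe(new[] { "^topic$", "other-topic" });
```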
In the answers to the linked issue I saw that the unsubscribe every 20 minutes might be caused by librdkafka. Is it the same for the .NET library? Is the bug caused by librdkafka or by the .NET wrapper itself? Are there any plans to fix it?
Description
I have a consumer group that consists of 2 or 3 consumers. A random consumer receives assignment and revocation of the same partitions every 20 minutes, but the other consumers in the group do not receive any rebalancing, so it is probably not a rebalance initiated by the broker. My application shares data between instances, detects changes in the consumer group via the rebalancing process, and triggers some business actions, so this issue causes a lot of redundant processing even though the group has not actually changed.
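For context, reacting to rebalances in Confluent.Kafka looks roughly like the sketch below; the business-action calls are hypothetical placeholders, not the reporter's actual code:

```csharp
using Confluent.Kafka;

var consumer = new ConsumerBuilder<Ignore, string>(config) // config as shown earlier
    .SetPartitionsAssignedHandler((c, partitions) =>
    {
        // Fires on every assignment, including the spurious 20-minute
        // rebalances described above, triggering redundant processing.
        RedistributeWork(partitions); // hypothetical business action
    })
    .SetPartitionsRevokedHandler((c, partitions) =>
    {
        StopWorkFor(partitions);      // hypothetical business action
    })
    .Build();
```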
There is probably the same issue in Python. I've also seen a related question and checked the timeouts, but this rebalancing happens every 20 minutes, and no default timeout corresponds to that interval.
Confluent.Kafka 1.1.0 (also reproduced on 1.0.0), Apache Kafka 2.1.1. Reproduced both on Windows 10 and Linux (CentOS).
How to reproduce
Just create a simple consumer and consume in an infinite loop, using the client configuration below; other properties have default values.
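A minimal consumer along these lines (a sketch; the broker, group, and topic names are placeholders, and `SessionTimeoutMs = 6000` is taken from the earlier discussion, not from the original attachment):

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

class Program
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            GroupId = "my-group",                // placeholder
            SessionTimeoutMs = 6000              // as discussed earlier in the thread
        };

        using (var consumer = new ConsumerBuilder<Ignore, string>(config).Build())
        {
            consumer.Subscribe("my-topic"); // placeholder topic

            // Consume in an infinite loop.
            while (true)
            {
                var result = consumer.Consume(CancellationToken.None);
                Console.WriteLine($"{result.TopicPartitionOffset}: {result.Message.Value}");
            }
        }
    }
}
```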
Logs for one of the instances, which had rebalancing for 2 hours from the first assignment. The other instances have only genuine rebalancing logs.
One oddity of my cluster: not all topics have a sufficient number of partitions.