IBM / sarama

Sarama is a Go library for Apache Kafka.
MIT License

unknown generation error while consuming using consumer group #1310

Closed varun06 closed 5 years ago

varun06 commented 5 years ago
Versions

Sarama Version: 1.21.0
Kafka Version: 1.1.0
Go Version: 1.12

Configuration
        cfg.Consumer.Group.Session.Timeout = 20 * time.Second
        cfg.Consumer.Group.Heartbeat.Interval = 6 * time.Second
Logs
ERRO[0051] kafka: error while consuming <topic-name>: kafka server: The provided member is not known in the current generation.  
Problem Description

We have a library that provides an abstraction over the sarama consumer group. While using the library, I see lots of the errors shown above. I have already looked at the session timeout values, but that did not help. @dim can you please help me understand this error and point me towards some next steps?

varun06 commented 5 years ago

@bai have you seen this error ever?

bai commented 5 years ago

Possibly related? https://github.com/bsm/sarama-cluster/issues/29

varun06 commented 5 years ago

Thanks Bai, that's helpful.

varun06 commented 5 years ago

Hey @dim, I think I need your help here. I have tried a bunch of things but I am still getting the error below in the sarama logs.

ERRO[0098] kafka: error while consuming ewr.kessel-run.mt-raw.1/8: kafka server: The provided member is not known in the current generation.  offset=-3 partition=-1 topic=unknown type=kafka
ERRO[0098] kafka: error while consuming ewr.kessel-run.mt-raw.1/1: kafka server: The provided member is not known in the current generation.  offset=-3 partition=-1 topic=unknown type=kafka
ERRO[0098] kafka: error while consuming ewr.kessel-run.mt-raw.1/4: kafka server: The provided member is not known in the current generation.  offset=-3 partition=-1 topic=unknown type=kafka

Here is my consumer-side config:

        cfg.Consumer.Group.Session.Timeout = 20 * time.Second
        cfg.Consumer.Group.Heartbeat.Interval = 6 * time.Second
        cfg.Consumer.MaxProcessingTime = 500 * time.Millisecond

Redacted logs from the Kafka side:

[2019-03-19 20:22:40,044] INFO [GroupCoordinator 8]: Preparing to rebalance group test-krm-local with old generation 53 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:22:43,045] INFO [GroupCoordinator 8]: Stabilized group test-krm-local generation 54 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:22:43,224] INFO [GroupCoordinator 8]: Assignment received from leader for group test-krm-local for generation 54 (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:05,266] INFO [GroupCoordinator 8]: Member kessel-run-mirror-06ae81c4-a739-4ec2-8b94-3656a5ea831e in group test-krm-local has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:05,266] INFO [GroupCoordinator 8]: Preparing to rebalance group test-krm-local with old generation 54 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:05,266] INFO [GroupCoordinator 8]: Group test-krm-local with generation 55 is now empty (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:18,314] INFO [GroupCoordinator 8]: Preparing to rebalance group test-krm-local with old generation 55 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:21,316] INFO [GroupCoordinator 8]: Stabilized group test-krm-local generation 56 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:21,562] INFO [GroupCoordinator 8]: Assignment received from leader for group test-krm-local for generation 56 (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:31,600] INFO [GroupCoordinator 8]: Member kessel-run-mirror-6983da00-2fb0-46c4-a3ae-a9852b2741b6 in group test-krm-local has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:31,600] INFO [GroupCoordinator 8]: Preparing to rebalance group test-krm-local with old generation 56 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2019-03-19 20:23:31,600] INFO [GroupCoordinator 8]: Group test-krm-local with generation 57 is now empty (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)

dim commented 5 years ago

@varun06 sorry, I have been slightly distracted recently :) My apologies, but I am not 100% sure how to help. Generation errors only happen when a member tries to commit offsets after the session has been closed server side. I assume you are not stopping your consume loop quickly enough after a rebalance is triggered. The broker is then starting a new session with a new generation and giving up on the previous one. I would try to increase the Session.Timeout and see if that makes a difference.
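
For illustration, here is a minimal stdlib-only sketch of a consume loop that stops as soon as a rebalance is signalled. The `message` type and the `consume` function are hypothetical stand-ins, not Sarama's API. In a real `ConsumeClaim` implementation, `done` would presumably be `session.Context().Done()` and `msgs` would be `claim.Messages()`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// message is a hypothetical stand-in for *sarama.ConsumerMessage.
type message struct {
	Offset int64
}

// consume handles messages until msgs is closed or done is signalled,
// and returns the number of messages handled before stopping.
// Assumption: done is closed when the group session ends (rebalance or shutdown).
func consume(done <-chan struct{}, msgs <-chan message, handle func(message)) int {
	handled := 0
	for {
		select {
		case <-done:
			// Session is over: stop right away and do not mark or commit
			// anything further for the old generation.
			return handled
		case m, ok := <-msgs:
			if !ok {
				return handled
			}
			handle(m)
			handled++
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	msgs := make(chan message)

	// Producer goroutine: emits one message every 10ms until cancelled.
	go func() {
		for i := int64(0); ; i++ {
			select {
			case msgs <- message{Offset: i}:
				time.Sleep(10 * time.Millisecond)
			case <-ctx.Done():
				return
			}
		}
	}()

	// Simulate a rebalance being triggered after 50ms.
	time.AfterFunc(50*time.Millisecond, cancel)

	n := consume(ctx.Done(), msgs, func(m message) {})
	fmt.Printf("stopped after %d messages\n", n)
}
```

The point of the sketch is only that the loop checks the session's done signal on every iteration, so slow or batched processing does not keep running (and committing) after the broker has moved on to a new generation.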

varun06 commented 5 years ago

Thanks @dim. I have been playing with the timeouts and they have helped, so the errors are very sporadic now. I am fairly sure the cause is the way we are committing the offsets: we commit them in batches, and that code has some oddities, as you mentioned.

1995parham commented 5 years ago

@varun06 can you please describe and share your timeout values with us? We have the same problem with Sarama.

varun06 commented 5 years ago

@1995parham

cfg.Consumer.Group.Session.Timeout = 20 * time.Second
cfg.Consumer.Group.Heartbeat.Interval = 6 * time.Second
cfg.Consumer.MaxProcessingTime = 500 * time.Millisecond
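
As a rough sanity check on values like these, the Kafka documentation recommends keeping the heartbeat interval at no more than one third of the session timeout. The sketch below applies that rule of thumb; `checkGroupTimeouts` is a hypothetical helper written for this example, not part of Sarama.

```go
package main

import (
	"fmt"
	"time"
)

// checkGroupTimeouts is a hypothetical helper that applies the Kafka
// guideline that the heartbeat interval should be no higher than one
// third of the session timeout.
func checkGroupTimeouts(session, heartbeat time.Duration) error {
	if session <= 0 || heartbeat <= 0 {
		return fmt.Errorf("timeouts must be positive, got session=%v heartbeat=%v", session, heartbeat)
	}
	if heartbeat > session/3 {
		return fmt.Errorf("heartbeat interval %v exceeds one third of session timeout %v", heartbeat, session)
	}
	return nil
}

func main() {
	// Values from the config above.
	err := checkGroupTimeouts(20*time.Second, 6*time.Second)
	fmt.Println("check result:", err)
}
```

Passing this check does not rule out generation errors; it only confirms the two values are in the recommended ratio.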