I've recently had to look into replaying messages from Kafka topics to recover data, the simple option is to have a consumer start from a particular offset/partition using the seek() method. However when running at scale on a platform such as Kubernetes using seek() becomes a challenge as you are required to know the partition ahead of time when starting up.
I've looked at using assignments() to grab the current partition a consumer replica is listening on and then use this data with seek(), however I do worry that if a replica drops and the re-balancing takes place that this might result in an inconsistent state.
Is there a reliable way to tell a consumer group to start consuming from a specific offset without having to worry about the details of the partition (in some cases specific consumers will consume from multiple partitions when the replica count is lower than the partition count).
I've recently had to look into replaying messages from Kafka topics to recover data, the simple option is to have a consumer start from a particular offset/partition using the
seek()
method. However when running at scale on a platform such as Kubernetes usingseek()
becomes a challenge as you are required to know the partition ahead of time when starting up.I've looked at using
assignments()
to grab the current partition a consumer replica is listening on and then use this data withseek()
, however I do worry that if a replica drops and the re-balancing takes place that this might result in an inconsistent state.Is there a reliable way to tell a consumer group to start consuming from a specific offset without having to worry about the details of the partition (in some cases specific consumers will consume from multiple partitions when the replica count is lower than the partition count).