confluentinc / confluent-kafka-go

Confluent's Apache Kafka Golang client
Apache License 2.0

Duplicate messages consumed during rebalancing #1252

Closed dekke046 closed 1 week ago

dekke046 commented 1 week ago

Description

I am consuming messages one at a time (via poll) from a source topic, and each message needs some processing before I can commit it. If a rebalance takes place while a message is still being processed (after the poll but before the commit), for example because more consumers were added, a few messages end up being processed by two consumers. I am using cooperative rebalancing.

My main question: with manual commits and up to 500 ms between consuming and committing a message, is there any way to prevent this? Initially I thought the rebalance callback could be used to hold up the revocation until the in-progress message is committed, but I can't find a way to do that. This limits my use case: since I can't have duplicate messages, I also can't scale beyond a single consumer.

Am I overlooking some features here? Or is this really the way it is? Any suggestions welcome!

How to reproduce

Take the cooperative consumer example, change it to manual commits, and add a random delay before each commit. Connect it to a topic with 5000 messages and scale up the number of consumers; in my case this produced 6 duplicate messages.

milindl commented 1 week ago

Hopefully this example helps: https://github.com/confluentinc/confluent-kafka-go/blob/master/examples/consumer_rebalance_example/consumer_rebalance_example.go#L195

It commits the partitions during a revoke in the rebalance callback (so it holds up the revoke, as you describe).
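
To make the idea concrete, here is a minimal, library-free sketch of the bookkeeping such a callback might rely on. It is not the code from the linked example. `TopicPartition`, `Commit`, `OffsetTracker`, `MarkProcessed` and `DrainRevoked` are hypothetical names invented for illustration. The assumption is that the poll loop records each message once processing has finished, and that the rebalance callback, on a revoke event, asks the tracker which offsets still need committing before it lets the revocation proceed.

```go
package main

import "fmt"

// TopicPartition is a hypothetical stand-in for the client's own
// topic/partition type, kept local so the sketch has no dependencies.
type TopicPartition struct {
	Topic     string
	Partition int32
}

// Commit pairs a partition with the offset that should be committed for it.
type Commit struct {
	TP     TopicPartition
	Offset int64
}

// OffsetTracker remembers, per partition, the highest offset whose
// message has been fully processed but not yet committed.
type OffsetTracker struct {
	processed map[TopicPartition]int64
}

func NewOffsetTracker() *OffsetTracker {
	return &OffsetTracker{processed: make(map[TopicPartition]int64)}
}

// MarkProcessed records that the message at the given offset is done.
// Older offsets never overwrite newer ones.
func (t *OffsetTracker) MarkProcessed(tp TopicPartition, offset int64) {
	if last, ok := t.processed[tp]; !ok || offset > last {
		t.processed[tp] = offset
	}
}

// DrainRevoked returns, for every revoked partition that has tracked work,
// the offset to commit (last processed + 1, i.e. the next message to read)
// and forgets that partition. Partitions without tracked work are skipped.
func (t *OffsetTracker) DrainRevoked(revoked []TopicPartition) []Commit {
	var out []Commit
	for _, tp := range revoked {
		last, ok := t.processed[tp]
		if !ok {
			continue
		}
		out = append(out, Commit{TP: tp, Offset: last + 1})
		delete(t.processed, tp)
	}
	return out
}

func main() {
	tracker := NewOffsetTracker()
	p0 := TopicPartition{Topic: "source", Partition: 0}
	p1 := TopicPartition{Topic: "source", Partition: 1}

	// The poll loop would call this after each message is fully processed.
	tracker.MarkProcessed(p0, 41)
	tracker.MarkProcessed(p1, 7)

	// A rebalance revokes partition 0 only; the callback would commit
	// these offsets synchronously before returning.
	for _, c := range tracker.DrainRevoked([]TopicPartition{p0}) {
		fmt.Printf("commit %s[%d] at offset %d\n", c.TP.Topic, c.TP.Partition, c.Offset)
	}
	// prints "commit source[0] at offset 42"
}
```

In a real consumer the drained offsets would be handed to a synchronous commit call inside the rebalance callback when a `kafka.RevokedPartitions` event arrives. This only closes the gap if the in-progress message has already been marked as processed by the time the callback runs, so it is a sketch of the bookkeeping rather than a complete fix.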

dekke046 commented 1 week ago

Thanks @milindl! I was also able to fix it in a different way.