Farfetch / kafkaflow

Apache Kafka .NET Framework to create applications simple to use and extend.
https://farfetch.github.io/kafkaflow/
MIT License

[Bug Report]: KafkaFlow doesn't work with `cooperative-sticky` protocol #456

Open AlexeyRaga opened 1 year ago

AlexeyRaga commented 1 year ago

Description

As discussed in this thread, KafkaFlow doesn't behave correctly when the consumer is configured to use the cooperative-sticky rebalancing protocol.

The protocol is described here

Steps to reproduce

Configure the consumer using WithConsumerConfig(...) and set the strategy:

PartitionAssignmentStrategy = PartitionAssignmentStrategy.CooperativeSticky
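For context, a minimal sketch of the Confluent.Kafka `ConsumerConfig` that would be handed to WithConsumerConfig(...). The broker address and group id are placeholders (assumptions, not taken from the report), and the surrounding KafkaFlow builder calls are omitted:

```csharp
using Confluent.Kafka;

// Sketch only: consumer settings to pass to KafkaFlow's WithConsumerConfig(...).
var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "localhost:9092", // placeholder broker address
    GroupId = "sample-group",            // placeholder consumer group
    // The setting that triggers the behaviour described in this report.
    PartitionAssignmentStrategy = PartitionAssignmentStrategy.CooperativeSticky,
};
```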

Expected behavior

The consumer should work as usual, except that rebalancing should not cause "stop-the-world" behaviour.

Actual behavior

When KafkaFlow is configured to use PartitionAssignmentStrategy.CooperativeSticky, the consumer appears to work (it processes messages) but does not commit any offsets.

KafkaFlow version

2.4.1

lpcouto commented 9 months ago

Hi @AlexeyRaga, I'm starting to look into this issue. As we no longer have access to the mentioned thread, can you give some more context on what was said there? Also, have you updated KafkaFlow to version 3? If so, is this still an issue in that version?

AlexeyRaga commented 9 months ago

@lpcouto I no longer have access to the thread either, but I don't believe anything useful was mentioned there. As far as I can remember, there was something like "oh, it should work because the underlying librdkafka does" and then "oh, no, it indeed doesn't, because we use the callbacks that are different in the cooperative mode". But not much more.

I haven't tried using cooperative-sticky mode in version 3, but I don't believe it'll work. It is extremely easy to test, though: just set the mode to CooperativeSticky and observe that no offsets are committed.

lpcouto commented 8 months ago

@AlexeyRaga When the cooperative-sticky rebalancing protocol is set, Kafka's response to a rebalance is different: we get only the incremental partitions, not the consumer's total list of partitions. This causes unexpected behavior in KafkaFlow: it cannot commit offsets to partitions that it supposedly doesn't have. We are working on a solution and will let you know when it is done.
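The difference described above can be illustrated with a small, dependency-free sketch. `AssignmentTracker` and its methods are hypothetical names invented for this illustration; this is not KafkaFlow's code, and partitions are reduced to plain integers. It only shows why treating an incremental assignment as if it were the full list makes the consumer "forget" partitions it still owns:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker (not KafkaFlow code): the set of partitions a consumer believes it owns.
public sealed class AssignmentTracker
{
    private readonly HashSet<int> owned = new HashSet<int>();

    // Eager protocols: every callback carries the complete assignment, so replace the set.
    public void OnAssignedEager(IEnumerable<int> fullAssignment)
    {
        owned.Clear();
        owned.UnionWith(fullAssignment);
    }

    // Cooperative protocols: every callback carries only the delta, so merge it.
    public void OnAssignedIncremental(IEnumerable<int> added) => owned.UnionWith(added);

    public void OnRevokedIncremental(IEnumerable<int> removed) => owned.ExceptWith(removed);

    // Offsets can only be committed for partitions the tracker believes it owns.
    public bool CanCommit(int partition) => owned.Contains(partition);
}

public static class Program
{
    public static void Main()
    {
        // Incremental callbacks handled with "replace" semantics: partition 0 is lost.
        var wrong = new AssignmentTracker();
        wrong.OnAssignedEager(new[] { 0, 1 });
        wrong.OnAssignedEager(new[] { 2 });
        Console.WriteLine(wrong.CanCommit(0)); // False

        // The same callbacks handled with "merge" semantics: partition 0 is kept.
        var right = new AssignmentTracker();
        right.OnAssignedIncremental(new[] { 0, 1 });
        right.OnAssignedIncremental(new[] { 2 });
        Console.WriteLine(right.CanCommit(0)); // True
    }
}
```

Whether KafkaFlow's internal bookkeeping fails in exactly this way is an assumption; the sketch only mirrors the explanation given in the comment.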