bsm / sarama-cluster

Cluster extensions for Sarama, the Go client library for Apache Kafka 0.9 [DEPRECATED]
MIT License
1.01k stars 222 forks source link

Understanding rebalance and parallel consumers payload processing #260

Closed misterion closed 6 years ago

misterion commented 6 years ago

Sorry for lame question here - im new to kafka and sarama — all my previous experiense was with AMQP and SNS. While testing groups in kafka withkafka-consumer-groupsil see expected results — running some consumers over some producers with partition i see that messages passed to all consumers.

Now try to get same results with sarama-cluster with this code:

config := cluster.NewConfig()
config.Consumer.Return.Errors = true
config.Consumer.Offsets.Initial = sarama.OffsetOldest
config.Group.Return.Notifications = true

consumer, err := cluster.NewConsumer(brokers, "test-group", topics, config)
consumer, err := NewConsumer()

for {
    select {
    case msg, ok := <-consumer.Messages():
        fmt.Fprintf(os.Stdout, "%s/%d/%d\t%s\t%s\n", msg.Topic, msg.Partition, msg.Offset, msg.Key, msg.Value)
      }
}

And producer code

config := sarama.NewConfig()
config.Producer.Partitioner = sarama.NewRandomPartitioner
config.Producer.RequiredAcks = sarama.WaitForAll
config.Producer.Return.Successes = true

producer, err := sarama.NewSyncProducer(brokers, config)
msg := &sarama.ProducerMessage{
    Topic:   "some-topic"  ,
    Partition: -1,
    Value:  sarama.StringEncoder(message),
}
producer.SendMessage(msg )

Now launching one procucers and some consumers the only one consumer got messages from producer. If shutdown this consumer one of other would receive messages but only one. Could you please help me with right documentation of samples may be to got that is wrong with this code?

dim commented 6 years ago

@misterion I think this might help to understand what ConsumerGroups are - https://blog.cloudera.com/blog/2018/05/scalability-of-kafka-messaging-using-consumer-groups/

TLDR: this idea is that you can launch multiple parallel consumers to consume a single topic and each consumer only consumes a subset of partitions. The maximum number of concurrent consumers must therefore be <= number of partitions and you can scale a topic horizontally by adding more partitions to it.

BTW, sarama-cluster is semi-deprecated, as I have been working on implementing the cluster feature directly into sarama - https://github.com/Shopify/sarama/pull/1099