criteo / kafka-sharp

A C# Kafka driver
Apache License 2.0

How to fetch N messages at a given (random) offset? #7

Closed FrancoisBeaune closed 7 years ago

FrancoisBeaune commented 7 years ago

Hello,

Until now we were using kafka-rest to consume Kafka messages from our C# project. We're now investigating whether we would benefit from switching to kafka-sharp.

kafka-rest's API to consume messages is straightforward: you ask for (at most) N messages from a given offset via an HTTP GET request, and that request blocks until either N messages have been received or the end of the topic has been reached.
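For reference, the request shape being described looks roughly like this (the path and parameter names follow Confluent's REST Proxy v1 simple-consumer endpoint; the topic, partition, host, and values are illustrative and should be checked against the kafka-rest version in use):

```
GET /topics/my-topic/partitions/0/messages?offset=42&count=100 HTTP/1.1
Host: kafka-rest:8082
Accept: application/vnd.kafka.binary.v1+json
```

The response is a JSON array of at most `count` messages starting at `offset`.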

Naturally, switching to kafka-sharp would be easier if we could replicate that workflow, at least as a first step. Although we wouldn't immediately get all the benefits of a native Kafka driver (such as blocking when there are no more messages to consume in a topic), we would at least benefit on three fronts:

To test whether fetching N messages at random is possible and efficient with kafka-sharp, we wrote the following code:

var queue = new ConcurrentQueue<RawKafkaRecord>();
var completed = new AutoResetEvent(false);

// Last offset we expect to receive.
var endOffset = beginOffset + messageCount - 1;

cluster.MessageReceived += record =>
{
    queue.Enqueue(record);
    if (record.Offset == endOffset)
    {
        completed.Set();
    }
};

completed.Reset();
cluster.Consume(topic, partition, beginOffset);
cluster.StopConsume(topic, partition, endOffset);
// Wait until the last expected message arrives, or give up after 100 ms.
completed.WaitOne(TimeSpan.FromMilliseconds(100));

RawKafkaRecord record;
while (queue.TryDequeue(out record))
{
    // Do something with the record.
}

Hopefully the idea is clear, but let's recap:

I also found out that I had to adjust (somewhat empirically) the following settings to get maximum performance:

Unfortunately the value of Configuration.FetchMessageMaxBytes depends on the number of messages I need to fetch. For 100-200 messages, 100 KB is nearly optimal.
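As a configuration sketch: only `Configuration.FetchMessageMaxBytes` is named above; the `Seeds` property and the `ClusterClient` constructor shape are assumptions about kafka-sharp's API, not confirmed settings.

```csharp
var configuration = new Configuration
{
    Seeds = "broker1:9092",            // assumed property name for the broker list
    FetchMessageMaxBytes = 100 * 1024  // ~100 KB, tuned empirically for 100-200 messages
};
var cluster = new ClusterClient(configuration, logger);
```

A larger batch size would presumably need a proportionally larger `FetchMessageMaxBytes`, since the broker caps each fetch response at that byte budget.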

My questions are:

  1. Is this a proper way to use kafka-sharp's API?
  2. Is this as efficient as it can be, given kafka-sharp's API and internals?

Thanks for the great library, and for your help.

sdanzan commented 7 years ago

Yes, it is currently the most efficient way to mimic the behaviour you're accustomed to.

FrancoisBeaune commented 7 years ago

Thanks for the confirmation.