confluentinc / confluent-kafka-dotnet

Confluent's Apache Kafka .NET client
https://github.com/confluentinc/confluent-kafka-dotnet/wiki
Apache License 2.0

[Question] High CPU utilization in Consumer App #1261

Closed MaximKojineMojio closed 3 months ago

MaximKojineMojio commented 4 years ago

We implemented our consumer apps as a set of Azure Service Fabric services. Each app consumes messages from only one topic. A typical Kafka consumer app has an infinite loop at its heart. Here is the pseudo-code for it:

public static void Main()
{
    // [Initialize consumer and producer]
    consumer.Subscribe("my-topic");
    while (true)
    {
        var result = consumer.Consume();
        // [Transform result]
        producer.Produce(transformedResult); // Publish to another topic
    }
}

The application above consumes up to 50% of the CPU.

Issue: When we deploy such a service as part of an Azure Service Fabric application, several consumer services may run on the same node, and CPU usage then reaches ~100%.

We rewrote the infinite loop as follows:

public static async Task Main()
{
    // [Initialize consumer and producer]
    consumer.Subscribe("my-topic");
    while (true)
    {
        var result = consumer.Consume();
        // [Transform result]
        producer.Produce(transformedResult); // Publish to another topic

        // Give other threads a chance to run
        await Task.Delay(1);  // 1 ms; we should probably use Task.Yield() or Task.Delay(0)
    }
}

Now the app consumes less CPU, but throughput (messages per second) suffers.

Questions: Is there a deployment recommendation for Kafka consumer applications? One physical node - one consumer? Or is there a way to make a consumer application asynchronous without sacrificing performance?

mhowlett commented 4 years ago

It looks like you're trying to push the clients as hard as possible, so high CPU is expected. IIRC a single client typically won't be able to saturate a 10 Gbps network (CPU will be the bottleneck), so you could run more than one client on such a machine if some cores aren't at full capacity. With a 1 Gbps network, the network will probably be the bottleneck (though CPU will still be high). Details matter: is the data compressed (that is costly)? How many partitions are you consuming from, and how are they distributed across brokers? (There is one thread per broker, so that will distribute load across threads to some degree.)

MaximKojineMojio commented 4 years ago

Many thanks for the prompt reply. We use gzip, which is probably the most CPU-expensive compression method. Each instance of our application running in our cluster processes 1-2 partitions (we have only 32 partitions and 3 brokers). The total load could reach 5000 msg/sec, but the current load is a few hundred messages per second.

Our client consumes messages from one topic, transforms them, and publishes them to another topic. There are no lookups, just message transformation.

If the infinite loop is CPU-expensive, what are our options (we would prefer to use .NET Core)? All Kafka examples (and Kafka: The Definitive Guide) use a while(true) loop to process messages. Our other client (Azure EventHub) uses the same kind of loop, but it doesn't experience any CPU issues.

while (NOT isCancelled)
{
    var batch = await GetBatchOfMessagesFromEventHubAsync();
    if (batch is Empty)
        await Task.Delay(time);
    else
        await ProcessBatchAsync(batch);
}

Could you please give us a hint: what is the recommended way to process incoming messages from one Kafka topic and publish them to another topic? Kafka Streams is not an option right now.

Note: It seems that the Consume call is blocking the producer. The consumer config value EnablePartitionEof is FALSE.

mhowlett commented 4 years ago

apologies for the delay, i missed the email notification.

you can specify a timeout argument to the Consume method (but the method will return null in the event of no message and you'll need to handle that) - does this help reduce CPU?

what platform are you on?
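
For reference, the timeout suggestion above could look roughly like the sketch below. It assumes the Confluent.Kafka NuGet package. The broker address, group id, output topic name, and the `Transform` helper are hypothetical placeholders that do not come from this thread. Whether the timeout actually lowers CPU usage is the open question asked above, so treat this as something to measure, not as a confirmed fix.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

public static class RelayLoop
{
    // Hypothetical transformation; stands in for the "[Transform result]" step.
    public static string Transform(string value) => value.ToUpperInvariant();

    public static void Main()
    {
        // All config values below are placeholders.
        var consumerConfig = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092",
            GroupId = "relay-group",
            AutoOffsetReset = AutoOffsetReset.Earliest
        };
        var producerConfig = new ProducerConfig { BootstrapServers = "localhost:9092" };

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        using var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig).Build();
        using var producer = new ProducerBuilder<Null, string>(producerConfig).Build();
        consumer.Subscribe("my-topic");

        try
        {
            while (!cts.IsCancellationRequested)
            {
                // Blocks for up to one second; returns null if no message arrived.
                var result = consumer.Consume(TimeSpan.FromSeconds(1));
                if (result == null) continue;

                // Produce() only enqueues the message; delivery happens in the background.
                producer.Produce("my-output-topic",
                    new Message<Null, string> { Value = Transform(result.Message.Value) });
            }
        }
        finally
        {
            producer.Flush(TimeSpan.FromSeconds(10)); // wait for in-flight messages
            consumer.Close();                         // leave the consumer group cleanly
        }
    }
}
```

The null check matters because, with EnablePartitionEof set to false, a timed-out `Consume` call returns null instead of a result. The loop also exits on Ctrl+C, so the producer can flush before the process ends.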

saherahwal commented 3 years ago

Hi @mhowlett, may I ask whether the Confluent library creates one thread per partition as well? If we are consuming from a topic with 10 partitions and have only one consumer, is it correct to assume we have 10 threads? Or is it 3 threads (we have 3 brokers)? Thank you very much in advance.

mhowlett commented 3 years ago

one thread per broker (3), plus an additional thread, plus a .net managed thread in the case of the producer.
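
As a worked example of that count, here is a small sketch. The `ThreadEstimate` class is hypothetical and only adds up the threads named in the reply above. The partition count does not enter into it. The .NET runtime and the client library may create further threads, so the measured process total will likely be higher.

```csharp
using System;
using System.Diagnostics;

public static class ThreadEstimate
{
    // Counts only the threads named in the reply: one per broker, one
    // additional thread, and one .NET managed thread for a producer.
    public static int Expected(int brokers, bool isProducer) =>
        brokers + 1 + (isProducer ? 1 : 0);

    public static void Main()
    {
        Console.WriteLine(Expected(3, isProducer: false)); // 4
        Console.WriteLine(Expected(3, isProducer: true));  // 5

        // Actual OS thread count of the current process, for comparison.
        Console.WriteLine(Process.GetCurrentProcess().Threads.Count);
    }
}
```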

saherahwal commented 3 years ago

Thank you!

milindl commented 3 months ago

Closing this issue as Matt provided a suggestion and it's been a while since any update. Please reopen if the issue persists in the latest version.