LGouellec / kafka-streams-dotnet

.NET Stream Processing Library for Apache Kafka 🚀
https://lgouellec.github.io/kafka-streams-dotnet/
MIT License

Continuously Rising CPU usage by Kafka streams #50

Closed ayush113 closed 1 year ago

ayush113 commented 3 years ago

This issue is related to another one raised earlier on this package: https://github.com/LGouellec/kafka-streams-dotnet/issues/43

We are using the kafka-streams-dotnet library in a .NET Core microservices-based architecture. We have noticed peculiar behaviour: our pods running streamer applications always end up consuming 100% of the CPU allocated to them, sometimes even when they are idle and no processing is happening.

Our question is whether this is the intended behaviour of the streams implementation, since the stream threads are constantly polling the source topics. If not, could you help us with the configuration we should use to optimise CPU usage?

Streamiz.Kafka.Net version: 1.1.3

Operating system: Linux (Kubernetes node)

The streams topology we are using is:

var builder = new StreamBuilder();
// Read string key/value records from the source topic, using a custom timestamp extractor
builder.Stream<string, string, StringSerDes, StringSerDes>(_streamOptions.Value.SourceTopic, _timestampExtractor)
    // Custom business logic applied to each value
    .MapValues((key, value) => _messageProcessor.ProcessMessage(value))
    // Keep only the records that pass the custom check
    .Filter((key, value) => CheckThis(key, value))
    // Write the results to the sink topic
    .To(_streamOptions.Value.SinkTopic);
ayush113 commented 3 years ago

@LGouellec, we would be extremely grateful for any help or insight here.

Thank you

LGouellec commented 3 years ago

Hi @ayush113,

Can you send your stream config? Does your stream app start with just one thread? How many messages does it process per second?

In your microservice, do you run only the Streamiz app, or do you also use other middleware, libraries, etc.? Can you share your pod logs and a memory dump from when you have this problem?

Regards,

ayush113 commented 3 years ago

Hi @LGouellec ,

Can you send your stream config?

var streamConfig = new StreamConfig<StringSerDes, StringSerDes>
{
    ApplicationId = _streamOptions.Value.ApplicationId,
    BootstrapServers = _streamOptions.Value.Broker,
    // Start from the earliest offset when the consumer group has no committed offset
    AutoOffsetReset = AutoOffsetReset.Earliest,
    // SASL/PLAIN authentication over TLS
    SaslMechanism = SaslMechanism.Plain,
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslUsername = _streamOptions.Value.SaslUsername,
    SaslPassword = _streamOptions.Value.SaslPassword,
    // Number of stream threads running the topology
    NumStreamThreads = 10,
    RequestTimeoutMs = 60000,
    SessionTimeoutMs = 180000,
    MetadataMaxAgeMs = 180000,
    SocketKeepaliveEnable = true,
    // Maximum time (ms) each poll blocks waiting for records
    PollMs = 100
};

We are using the AT_LEAST_ONCE processing guarantee.

Does your stream app start with just one thread? We have observed this behaviour when starting with one thread as well as with 10 threads.

How many messages does it process per second? CPU consumption is always near 100%, even with 0 messages processed. Strangely, when I introduce some load (100 messages/second), the application is still able to process and sink the messages.

Do you run only the Streamiz app in your microservice, or other middleware, libraries, etc.? This particular microservice uses only Streamiz. There is some custom logic in the MapValues processor; we have profiled that method and have not noticed any unusual CPU consumption there.

Pod logs when the problem occurs: since this happens both when the pod is processing messages and when it is idle, we see DEBUG logs reporting "Committing all active tasks in X ms".

At other times (when messages are flowing in), we see logs from the processing steps.

Thank you

ayush113 commented 3 years ago

Additionally, are there any minimum CPU requirements that you think applications built on this package should have?

fglaeser commented 3 years ago

@ayush113, we are also having CPU problems in our OpenShift cluster with .NET Core 3.1 apps (not related to Streamiz). We switched the GC to workstation mode with good results; maybe this could help you.
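Before and after switching GC modes, it helps to confirm which mode a pod is actually running with. Workstation GC is normally selected with `<ServerGarbageCollector>false</ServerGarbageCollector>` in the project file, or with `"System.GC.Server": false` in runtimeconfig.json. Server GC keeps one heap and one dedicated GC thread per logical CPU, which is why it can show up in per-pod CPU figures. The following is a minimal startup-logging sketch. It assumes C# 9+ top-level statements, uses only `System.Runtime.GCSettings`, and the helper name `GcMode` is hypothetical.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: reports which GC flavour the runtime picked for this process.
static string GcMode() => GCSettings.IsServerGC ? "Server" : "Workstation";

// Log once at startup so the pod logs show the effective settings.
Console.WriteLine($"GC mode: {GcMode()}");
Console.WriteLine($"GC latency mode: {GCSettings.LatencyMode}");
Console.WriteLine($"Logical CPUs visible to the runtime: {Environment.ProcessorCount}");
```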

ayush113 commented 3 years ago

@fglaeser, thanks for the response. In our case, though, we aren't observing this for the other microservices in our OpenShift cluster that are also based on .NET Core 3.1; at the moment we only see it with Streamiz.

Thank you

LGouellec commented 3 years ago
DEBUG logs reporting Committing all active tasks in X ms.

This is clearly normal. By default, Streamiz commits your offsets periodically. You can change the interval using:

config.CommitIntervalMs

The default is 30 seconds.

If no messages are processed, this log still appears, but no offsets are committed.

At the moment, I have no idea what is causing this.

Just a quick question: if you decrease the number of threads in your app, do you still have this problem, whether or not messages are flowing?
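For reference, here is a sketch of setting the commit interval mentioned above in the same object-initializer style as the config shared earlier in the thread. It is only the relevant fragment, not a full application. The application id and broker address are hypothetical placeholders, and 60000 ms is an arbitrary example value. Per the explanation above, this interval only changes how often the periodic commit, and its DEBUG log line, runs.

```csharp
using Streamiz.Kafka.Net;
using Streamiz.Kafka.Net.SerDes;

var config = new StreamConfig<StringSerDes, StringSerDes>
{
    ApplicationId = "my-streams-app",    // hypothetical value
    BootstrapServers = "localhost:9092", // hypothetical value
    // How often (ms) offsets are committed; example value of 60 s instead of
    // the 30 s default mentioned in this thread. When nothing was processed,
    // the commit step still runs and logs, but has no offsets to commit.
    CommitIntervalMs = 60000
};
```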

ayush113 commented 3 years ago

I have observed that even with 1 thread the CPU usage is very high: it easily exceeds 1 full allotted core. When I increase the number of threads, the cores used increase, but not by much (from 1.01 cores for 1 thread to 1.11 cores for 10 threads).

Because of this, I was wondering whether we need to allot some minimum resources, for example whether the package needs at least 2 cores or something like that?

Thank you
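Figures like "1.01 cores" are CPU time consumed divided by wall-clock time elapsed. Windows Task Manager instead reports a share of all logical CPUs. For example, 20% on a hypothetical 8-logical-CPU laptop would be 1.6 cores. Below is a minimal sketch for measuring this from inside the process over a fixed window. It assumes C# 9+ top-level statements and uses only the standard library. The helper names and the 2-second window are illustrative choices, not anything from Streamiz.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: cores in use = CPU time consumed / wall-clock time elapsed.
// 1.0 means one core fully busy for the whole window.
static double CoresUsed(TimeSpan cpuTime, TimeSpan wallTime) =>
    cpuTime.TotalSeconds / wallTime.TotalSeconds;

// Hypothetical helper: Task Manager-style figure, i.e. share of all logical CPUs.
static double PercentOfMachine(double cores, int logicalCpus) =>
    100.0 * cores / logicalCpus;

var process = Process.GetCurrentProcess();
TimeSpan cpuBefore = process.TotalProcessorTime;
var wall = Stopwatch.StartNew();

// Measurement window (2 s is arbitrary). In a real service the stream
// would keep running on its own threads while this thread waits.
Thread.Sleep(TimeSpan.FromSeconds(2));

process.Refresh();
double cores = CoresUsed(process.TotalProcessorTime - cpuBefore, wall.Elapsed);
Console.WriteLine($"cores used: {cores:F2} " +
                  $"({PercentOfMachine(cores, Environment.ProcessorCount):F1}% of {Environment.ProcessorCount} logical CPUs)");
```

Measuring in cores makes results from a Kubernetes pod (with a CPU limit) and a multi-core laptop directly comparable.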

LGouellec commented 3 years ago

The Streamiz package uses the Confluent Kafka .NET client. This client is asynchronous, so a background thread polls records from your Kafka cluster. Before increasing your number of cores: if you run your application on your laptop or other hardware, do you also have this problem? When I run your topology on my laptop, CPU usage for this process is 0 when there are no records.

ayush113 commented 3 years ago

Here is what I observe even with 0 records processed: [screenshot]

Here no messages are being processed, but usage is still at 20% on a Windows laptop.

I noticed there was a fix for high CPU usage in version 1.1.2, so we upgraded to 1.1.3, but we are still seeing this behaviour: high usage even when idle (no messages processed).

LGouellec commented 3 years ago

Oh, very strange. Is your application open source? Yes, normally 1.1.3 includes the fix for high CPU usage.

Could you create a process dump on the Windows laptop while CPU usage is high? I understand if not; it's just for diagnostics.

ayush113 commented 3 years ago

Our application is closed source. I will try to create a dump if possible and share it with you here.

Thank you

LGouellec commented 3 years ago

Thank you

ayush113 commented 3 years ago

Hi @LGouellec, I have mailed the dump file to the email address on your GitHub profile. I appreciate your help with this.

Thank you

LGouellec commented 3 years ago

OK, perfect!

Once I have a diagnosis, I will update this issue.

Regards,

LGouellec commented 3 years ago

Could you send me the same dump, but built in Debug configuration? Your build is in Release mode, so I can't inspect the dump in depth.

Just a quick question to be sure: when messages are added to the input topic, does your stream application process them? I ask because your dump shows many librdkafka threads (the Confluent Kafka .NET client uses librdkafka): [screenshot]

So maybe your problem comes from there.

Could you set this configuration in your stream app?

config.Debug = "broker,topic,metadata,fetch";

and then send me the logs, along with the dump built in Debug mode?

ayush113 commented 3 years ago

Hi @LGouellec

I will try to create a more detailed dump in an idle scenario. I'm currently facing some problems with the debug settings, but I will share the dump file soon.

Thank you

LGouellec commented 1 year ago

Closing this issue due to a lack of response.