confluentinc / confluent-kafka-dotnet

Confluent's Apache Kafka .NET client
https://github.com/confluentinc/confluent-kafka-dotnet/wiki
Apache License 2.0

Consumer slow after initial read and how to determine if consumer had read latest messages #1522

Closed tech7857 closed 3 years ago

tech7857 commented 3 years ago

Hi

Our consumer gets quite slow some time after the initial read, seemingly after a day or so. When we restart it, it works fine for a while, but then it slows down again. We are doing a manual commit after reading each message. From reading blogs, it seems that manually committing after every message can cause this. Is this true? What is the best way to commit? If we use StoreOffset, do we have to handle duplicate messages in case the consumer fails in between?

We also have a use case where we need to determine whether the consumer has read all messages in a topic with multiple partitions, so that another process can be triggered. What is the way to determine this? Does the partition EOF property help here?

Any help will be appreciated.

Thanks

mhowlett commented 3 years ago

> If we use StoreOffset, do we have to handle duplicate messages in case the consumer fails in between?

Yes, but you need to with synchronous committing as well. If you commit synchronously after each message, though, you will at most double-process a single message in the case of an error. For exactly-once processing you need to use transactions, and any non-Kafka side effects need to be idempotent.
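To make the duplicate-handling trade-off above concrete, here is a small library-free Python model. It is not confluent-kafka-dotnet code, and the function name and the crash model are assumptions made for illustration. It counts how many messages are processed a second time after a restart, assuming the consumer commits once every `commit_every` processed messages and crashes right after processing a message but before the commit that would have followed it.

```python
def reprocessed_after_crash(crash_after: int, commit_every: int) -> int:
    """Count messages that are processed again after a restart.

    Hypothetical model (not a Kafka API):
    - messages are processed one at a time, in offset order, from offset 0;
    - after every `commit_every` processed messages a commit is issued;
    - the consumer crashes right after processing message number
      `crash_after` (1-based), before any commit that would follow it;
    - on restart, consumption resumes from the last committed position.
    """
    if crash_after < 1 or commit_every < 1:
        raise ValueError("both arguments must be >= 1")
    # Commits that completed before the crash cover only messages strictly
    # before the one being processed when the crash happened.
    last_committed = ((crash_after - 1) // commit_every) * commit_every
    return crash_after - last_committed


if __name__ == "__main__":
    for commit_every in (1, 100):
        dupes = reprocessed_after_crash(crash_after=250, commit_every=commit_every)
        print(f"commit_every={commit_every}: {dupes} message(s) reprocessed")
```

Under this model, committing after every message bounds reprocessing at one message, while committing less often bounds it at `commit_every` messages. Either way the processing side has to tolerate duplicates.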

> determine whether the consumer has read all messages in a topic with multiple partitions

Probably partition EOF, though yes, you need to do the bookkeeping yourself. Also, I can't recall the behavior in the edge case of a partition with no messages, so you should test that case. You could also do a calculation with Committed and Position after not having consumed a message for some time period.
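The bookkeeping mentioned above could look roughly like the following library-free Python sketch. The class and method names are hypothetical. It assumes the consumer is configured to report partition EOF events (in the .NET client this is the `EnablePartitionEof` setting) and that the consume loop calls the matching method for each event. Whether an empty partition produces an EOF event is not settled in this thread and should be tested separately.

```python
from typing import Hashable, Iterable, Set


class CaughtUpTracker:
    """Tracks which assigned partitions have most recently reported EOF.

    Hypothetical helper for illustration; it is not part of any Kafka client.
    """

    def __init__(self, assigned: Iterable[Hashable]) -> None:
        self._assigned: Set[Hashable] = set(assigned)
        self._at_eof: Set[Hashable] = set()

    def on_partition_eof(self, partition: Hashable) -> None:
        # Call this when the consumer reports end-of-partition.
        if partition in self._assigned:
            self._at_eof.add(partition)

    def on_message(self, partition: Hashable) -> None:
        # New data arrived after an earlier EOF, so the partition is no
        # longer considered caught up.
        self._at_eof.discard(partition)

    def all_caught_up(self) -> bool:
        # True only if there is at least one assigned partition and every
        # assigned partition has reported EOF since its last message.
        return bool(self._assigned) and self._at_eof == self._assigned


if __name__ == "__main__":
    tracker = CaughtUpTracker(assigned=[0, 1, 2])
    for p in (0, 1, 2):
        tracker.on_partition_eof(p)
    print(tracker.all_caught_up())
```

A real consumer would also need to rebuild the tracker whenever its partition assignment changes, since a rebalance can add or remove partitions.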

tech7857 commented 3 years ago

Thanks @mhowlett for the reply, appreciate it. Could you please also comment on the question below? We are not sure why consumption gets slow. Why would manual commits cause the consumer to slow down after a day or so?

> Our consumer gets quite slow some time after the initial read, seemingly after a day or so. When we restart it, it works fine for a while, but then it slows down again. We are doing a manual commit after reading each message. From reading blogs, it seems that manually committing after every message can cause this. Is this true?

mhowlett commented 3 years ago

I have no idea; it shouldn't. Feel free to paste debug logs from when it's going slow.