confluentinc / confluent-kafka-dotnet

Confluent's Apache Kafka .NET client
https://github.com/confluentinc/confluent-kafka-dotnet/wiki
Apache License 2.0

Intermittent Local_MsgTimedOut errors #755

Open mindwerxkmg opened 5 years ago

mindwerxkmg commented 5 years ago

Description

When running for prolonged periods of time (days/weeks), our Kafka producer begins throwing a Local_MsgTimedOut error. Sometimes the issue goes away on its own. Other times we have to reboot the machine running the Kafka client to get it publishing again, and occasionally even that does not work.

How to reproduce

We cannot reproduce it on demand. The client connection issue occurs in our staging and production environments.

mhowlett commented 5 years ago

This means librdkafka wasn't able to deliver the message within the limits set by the configured message.timeout.ms and message.send.max.retries values. This is unlikely to be caused by a problem with the client; it's probably broker or network related. I'd start by looking at the broker metrics for any problems with the cluster (assuming you have this set up; if not, start with that), or at the client logs (I think debug='broker' should be enough, but I might be forgetting something).
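
For reference, the three settings named above could be written as librdkafka configuration properties roughly as follows. This is only a sketch: the values are illustrative assumptions, not recommendations or anything reported in this thread. In the .NET client the same properties can be set on `ProducerConfig` (`Debug`, `MessageTimeoutMs`, `MessageSendMaxRetries`).

```properties
# Illustrative values only (assumptions, not taken from this issue).

# Emit broker-level debug logs from librdkafka.
debug=broker

# Upper bound on how long a produced message may wait for successful
# delivery, retries included, before it fails with Local_MsgTimedOut.
message.timeout.ms=300000

# How many times a failed send is retried.
message.send.max.retries=5
```

With `debug=broker` enabled, the client log output around the time of a timeout should show broker connection state changes, which helps tell a broker or network problem apart from a client one.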

anchitj commented 5 months ago

@mindwerxkmg Is this still an issue? Can you provide debug logs?

ksdvishnukumar commented 4 months ago

@mhowlett and @mindwerxkmg may I know what the fix for this is?