confluentinc / confluent-kafka-python

Confluent's Kafka Python Client
http://docs.confluent.io/current/clients/confluent-kafka-python

Kafka producer does not report _ISR_INSUFF as the error code when the ISR count is below the minimum; it reports _MSG_TIMED_OUT instead #1711

Open prashantochatterjee opened 7 months ago

prashantochatterjee commented 7 months ago

Description

The issue occurs when a node is brought down in a 3-node cluster in which one of the topic partitions has an ISR count of 1 against a min-ISR (min.insync.replicas) setting of 2. The Kafka producer client is configured to report errors through the error_cb callback. The error code reported is _MSG_TIMED_OUT rather than the more informative _ISR_INSUFF, which led me to believe that messages were being sent to the node that was down.


pranavrth commented 4 months ago

The _ISR_INSUFF error is retried internally by the producer until message.timeout.ms is reached, after which the message fails with _MSG_TIMED_OUT. That is why you see _MSG_TIMED_OUT and not _ISR_INSUFF. This behavior is by design.
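To make that sequence concrete, here is a toy model in plain Python. It is not librdkafka's implementation: `simulate_produce`, `DeliveryReport`, and the simulated clock are hypothetical, and `message_timeout_ms` / `retry_backoff_ms` only mirror the names of the message.timeout.ms and retry.backoff.ms settings. When the broker keeps answering with the retriable _ISR_INSUFF, the only error that reaches the application in this model is _MSG_TIMED_OUT.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model: errors the simulated producer retries silently.
RETRIABLE_ERRORS = {"_ISR_INSUFF"}


@dataclass
class DeliveryReport:
    error: str         # code passed to the application ("" means delivered)
    attempts: int      # number of produce attempts made
    hidden_error: str  # last retriable error; the application never sees it


def simulate_produce(
    broker_response: Callable[[int], str],
    message_timeout_ms: int = 300_000,
    retry_backoff_ms: int = 100,
) -> DeliveryReport:
    """Retry retriable errors until the simulated clock reaches
    message_timeout_ms, then fail the message with _MSG_TIMED_OUT."""
    elapsed_ms = 0
    attempts = 0
    hidden_error = ""
    while True:
        attempts += 1
        result = broker_response(attempts)  # "" = broker accepted the message
        if result == "":
            return DeliveryReport("", attempts, hidden_error)
        if result not in RETRIABLE_ERRORS:
            # Non-retriable errors are reported to the application directly.
            return DeliveryReport(result, attempts, "")
        hidden_error = result
        elapsed_ms += retry_backoff_ms  # advance the simulated clock by the backoff
        if elapsed_ms >= message_timeout_ms:
            # The retriable cause is dropped; only the timeout is reported.
            return DeliveryReport("_MSG_TIMED_OUT", attempts, hidden_error)


if __name__ == "__main__":
    # The partition stays under-replicated for the whole timeout window.
    report = simulate_produce(lambda attempt: "_ISR_INSUFF", message_timeout_ms=1000)
    print(report.error, report.attempts)  # _MSG_TIMED_OUT 10
```

In this model, lowering message_timeout_ms only makes the timeout arrive sooner; the underlying _ISR_INSUFF is still discarded before the report reaches the application.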

prashantochatterjee commented 4 months ago

@pranavrth Thanks for the explanation. Is there any way for the _ISR_INSUFF code to take precedence over the timeout? As it stands, the real cause escapes attention.

pranavrth commented 4 months ago

This is definitely an improvement we will take up later.
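One possible shape for such an improvement, sketched purely as a hypothetical (this is not an existing confluent-kafka-python or librdkafka API): keep _MSG_TIMED_OUT as the reported code so existing error handling keeps working, but attach the last retriable error as a cause that applications can inspect. `DeliveryError`, `cause`, and `root_cause_code` are invented names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeliveryError:
    """Hypothetical delivery error that remembers why retries kept failing."""
    code: str                    # code reported today, e.g. "_MSG_TIMED_OUT"
    cause: Optional[str] = None  # last retriable error seen before giving up

    def __str__(self) -> str:
        if self.cause:
            return f"{self.code} (last error before timeout: {self.cause})"
        return self.code


def root_cause_code(err: DeliveryError) -> str:
    """Let the underlying cause take precedence over the timeout, if known."""
    return err.cause or err.code


if __name__ == "__main__":
    err = DeliveryError("_MSG_TIMED_OUT", cause="_ISR_INSUFF")
    print(err)                   # _MSG_TIMED_OUT (last error before timeout: _ISR_INSUFF)
    print(root_cause_code(err))  # _ISR_INSUFF
```

Keeping the timeout as the primary code preserves compatibility for applications that already branch on _MSG_TIMED_OUT, while a helper like root_cause_code gives the precedence requested in this thread.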