confluentinc / confluent-kafka-python

Confluent's Kafka Python Client
http://docs.confluent.io/current/clients/confluent-kafka-python

Producer gets stuck in init_producer_id loop #1335

Closed rystsov closed 2 years ago

rystsov commented 2 years ago

Description

Observed a pathological behavior that is absent from other clients (Java's kafka-clients and franz-go).

With transactions disabled, any client (including confluent_kafka) that is about to send init_producer_id does not use the transaction coordinator; it picks a broker at random and sends the request there. If the broker responds with a retriable error, kafka-clients and franz-go restart the procedure from scratch: they pick a random broker again and retry the request against it, whereas confluent_kafka reuses the original broker. If the problem with that broker is permanent, kafka-clients and franz-go have a chance to choose a healthy node, while confluent_kafka gets stuck.
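To make the difference concrete, here is a minimal simulation of the two retry strategies. It is only a sketch, not librdkafka's or kafka-clients' actual code: `send_init_producer_id`, `acquire_pid`, the `choose` hook, the fake PID value, and the broker names other than docker-rp-4 are all hypothetical stand-ins.

```python
import random


class RetriableError(Exception):
    """Stand-in for a retriable broker error such as 'Not coordinator'."""


def send_init_producer_id(broker, isolated):
    # Hypothetical request: isolated brokers always fail with a retriable
    # error, healthy brokers return a fake producer ID.
    if broker in isolated:
        raise RetriableError(broker)
    return 1000


def acquire_pid(brokers, isolated, reselect, choose=random.choice, max_attempts=20):
    """Try to get a PID; return (pid, last_broker, attempts_used).

    reselect=True  -> pick a broker again after every failure
                      (the behavior described for kafka-clients and franz-go).
    reselect=False -> keep retrying the first pick
                      (the behavior described for confluent_kafka).
    """
    broker = choose(brokers)
    for attempt in range(1, max_attempts + 1):
        try:
            return send_init_producer_id(broker, isolated), broker, attempt
        except RetriableError:
            if reselect:
                broker = choose(brokers)
    return None, broker, max_attempts


if __name__ == "__main__":
    brokers = ["docker-rp-1", "docker-rp-2", "docker-rp-3", "docker-rp-4"]
    isolated = {"docker-rp-4"}
    print("reselect:", acquire_pid(brokers, isolated, reselect=True))
    print("sticky:  ", acquire_pid(brokers, isolated, reselect=False))
```

In this model, a sticky client whose first pick is the isolated broker never gets a PID, however many attempts it is given, while a reselecting client recovers as soon as a pick lands on a healthy broker.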

How to reproduce

I used Redpanda, but based on the source code the same problem affects Kafka too (I'm not sure whether Kafka returns a retriable error for init_producer_id, but it does return unknown_server_error, which confluent_kafka treats the same way and retries). The repro step:

Client logs (docker-rp-4 was isolated):

%4|1651850857.547|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850860.049|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850862.550|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850863.050|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850873.155|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850883.259|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850886.100|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
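One way to confirm from such logs that every retry goes to the same broker is to count the distinct brokers named in the GETPID lines. The sketch below assumes the line layout seen in the excerpt above (`%level|timestamp|facility|client| [thrd:name]: message`); the regular expressions and the `brokers_tried` helper are illustrative, not part of any library.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt above:
# %<level>|<timestamp>|<facility>|<client>| [thrd:<thread>]: <message>
LOG_RE = re.compile(
    r"^%(?P<level>\d)\|(?P<ts>\d+\.\d+)\|(?P<fac>\w+)\|(?P<client>[^|]+)\|"
    r" \[thrd:(?P<thread>[^\]]+)\]: (?P<msg>.*)$"
)
BROKER_RE = re.compile(r"from broker (\S+): ")


def brokers_tried(lines):
    """Count GETPID failures per broker named in the log messages."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or m.group("fac") != "GETPID":
            continue  # skip unrelated or unparseable lines
        b = BROKER_RE.search(m.group("msg"))
        if b:
            counts[b.group(1)] += 1
    return counts
```

A single key in the resulting counter means the client never moved to another broker between retries.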

mhowlett commented 2 years ago

@edenhill - seems like something we could improve

edenhill commented 2 years ago

Good find! Created upstream librdkafka issue: https://github.com/edenhill/librdkafka/issues/3848

mhowlett commented 2 years ago

Tracking this in the above-mentioned librdkafka issue.