aio-libs / aiokafka

asyncio client for kafka
http://aiokafka.readthedocs.io/
Apache License 2.0

Can't Recover From NotLeaderForPartitionError #1018

Open oferda4 opened 1 week ago

oferda4 commented 1 week ago

Describe the bug

Hi all, we use Aiven's Kafka servers with aiokafka on the client side.

When we get a NotLeaderForPartitionError (documented here), the error keeps being raised, even though it is defined with invalid_metadata = True and the client does re-request the metadata (as it should).

We also tried stopping the current producer and creating a new one every time this exception is raised. The new producer seems to work properly, but the error keeps being printed: Got error produce response on topic-partition TopicPartition(topic='XXXXX', partition=X), retrying. Error: <class 'aiokafka.errors.NotLeaderForPartitionError'>. Network usage also shows that the metadata keeps being requested. The error is printed until the process is shut down (which can take days).
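For reference, the recreate-on-error workaround described above looks roughly like the sketch below. It is a simplified illustration, not our exact code: `RecreatingProducer`, `make_producer`, `retriable_error` and `max_recreates` are hypothetical names introduced here. It only assumes that the wrapped object exposes `start()`, `stop()` and `send_and_wait()` coroutines, as `AIOKafkaProducer` does.

```python
import asyncio


class RecreatingProducer:
    """Wraps a producer and replaces it with a fresh one when a given error is raised.

    make_producer: hypothetical zero-argument factory returning an unstarted producer.
    retriable_error: the exception class that triggers a recreate.
    """

    def __init__(self, make_producer, retriable_error, max_recreates=3):
        self._make_producer = make_producer
        self._retriable_error = retriable_error
        self._max_recreates = max_recreates
        self._producer = None

    async def start(self):
        self._producer = self._make_producer()
        await self._producer.start()

    async def stop(self):
        if self._producer is not None:
            await self._producer.stop()
            self._producer = None

    async def send_and_wait(self, topic, value):
        for attempt in range(self._max_recreates + 1):
            try:
                return await self._producer.send_and_wait(topic, value)
            except self._retriable_error:
                if attempt == self._max_recreates:
                    raise  # give up and let the caller see the error
                # Drop the old producer and retry on a brand-new one.
                await self.stop()
                await self.start()
```

With aiokafka, `make_producer` would be something like `lambda: aiokafka.AIOKafkaProducer(bootstrap_servers="my_server:9092", acks=1)` and `retriable_error` would be `aiokafka.errors.NotLeaderForPartitionError`.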

According to the server's logs, the server behaves properly: it changes the leader only once in a while (which is when these errors start) and does not keep changing it.

Expected behaviour

Environment (please complete the following information):

Reproducible example

The producer is created as follows:

import aiokafka
import humanfriendly

# Options are passed as keyword arguments.
producer = aiokafka.AIOKafkaProducer(
    bootstrap_servers="my_server:9092",
    request_timeout_ms=60000,
    linger_ms=0,
    compression_type=None,
    max_batch_size=16000,
    max_request_size=humanfriendly.parse_size("20MiB"),
    acks=1,
)

Actually reproducing this is difficult: you need a setup with Aiven, then you need to make it change the leader, and even that does not always trigger the issue.