aio-libs / aiokafka

asyncio client for kafka
http://aiokafka.readthedocs.io/
Apache License 2.0

[QUESTION] idempotent producer and OutOfOrderSequenceNumber problem #1017

Open funkindy opened 1 week ago

funkindy commented 1 week ago

I have two instances of aiokafka producers with enable_idempotence=True. Each instance produces messages using only the send_and_wait method. Once in a while they start to throw OutOfOrderSequenceNumber, and message production stops until we restart the pods.
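A minimal sketch of this kind of setup, not the actual service code: the bootstrap address and payload are placeholders, and the topic name is taken from the broker log further down.

```python
import asyncio


async def produce_one(bootstrap_servers: str, topic: str, payload: bytes) -> None:
    # Imported here so the sketch can be loaded without aiokafka installed.
    from aiokafka import AIOKafkaProducer

    producer = AIOKafkaProducer(
        bootstrap_servers=bootstrap_servers,  # placeholder address
        enable_idempotence=True,              # the setting in question
    )
    await producer.start()
    try:
        # send_and_wait returns only after the broker acknowledges the record.
        await producer.send_and_wait(topic, payload)
    finally:
        await producer.stop()


if __name__ == "__main__":
    asyncio.run(produce_one("localhost:9092", "merchants", b"example"))
```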

Before this happens, I see the following in the server logs. It looks like the cluster becomes unavailable for a short period of time:

Controller 1's connection to broker was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to host failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:70)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:298)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:251)
    at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:127)

Connection to node could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

After that, something reconnects and this error appears:

[ReplicaManager broker=2] Error processing append operation on partition merchants-0 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 29003 at offset 1641969 in partition merchants-0: 158 (incoming seq. number), 155 (current end sequence number)

How is it possible that the client sends sequence number 158, which is greater than the state on the broker (155)? No messages were delivered.
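For context, the broker-side check that produces this error can be sketched roughly as follows. This is a simplified model under stated assumptions, not broker or aiokafka code: `PartitionState`, `append`, and `OutOfOrderSequence` are hypothetical names, each batch is assumed to hold one record, and every already-seen sequence number is treated as a duplicate (a real broker only remembers a few recent batches per producer).

```python
from dataclasses import dataclass


class OutOfOrderSequence(Exception):
    """Hypothetical stand-in for the broker's OutOfOrderSequenceException."""


@dataclass
class PartitionState:
    # Last sequence number the broker accepted from one producer on one partition.
    last_seq: int = -1

    def append(self, seq: int) -> str:
        expected = self.last_seq + 1
        if seq == expected:
            # The next record in order: accept it and advance the state.
            self.last_seq = seq
            return "appended"
        if seq <= self.last_seq:
            # Already seen (for example a retry): acknowledged, not written again.
            return "duplicate"
        # A gap: at least one earlier record never reached this log.
        raise OutOfOrderSequence(
            f"{seq} (incoming seq. number), "
            f"{self.last_seq} (current end sequence number)"
        )


def demo() -> None:
    state = PartitionState(last_seq=155)
    try:
        state.append(158)  # the case from the log: 156 and 157 are missing
    except OutOfOrderSequence as exc:
        print("rejected:", exc)


if __name__ == "__main__":
    demo()
```

In this model, an incoming 158 against a broker state of 155 means the producer had already assigned 156 and 157 to records that the partition log does not contain.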

Could someone explain in detail how idempotent producers work in aiokafka and how they sync their state with the broker?

Thank you.