Description
We're using a KafkaConsumer object (not the legacy consumer) that consumes from a single topic. The behaviour described below occurs regardless of whether the KafkaConsumer is consuming all partitions, a subset of them, or a single partition.
We're measuring an average delay of 950 ms between the moment a new record is produced to the Kafka topic and the moment our consumer actually receives it.
This is an example of the events we are recording:
The first highlighted trace shows that we received the END_OF_PARTITION error code from Kafka for partition 5. The HighOffset is 3193270.
Later on, at the second highlighted trace, we receive the first record added to partition 5 after that point. We compute the epoch difference between the RdkafkaMessage timestamp and "now". Since our KafkaConsumer was idle and ready to consume any new message (the other partitions had also reached END_OF_PARTITION), one would expect the difference to be very small (maybe 200 ms). Instead, it reaches 859 ms here, and sometimes it gets as high as 1500 ms.
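For reference, the delay computation described above can be sketched as follows. This is a minimal illustration, not our actual code: `delay_ms` and `now_epoch_ms` are hypothetical helper names, and it assumes the record timestamp is in milliseconds since the Unix epoch (as the `timestamp` field returned by `RdKafka::Message::timestamp()` is) and that `std::chrono::system_clock` counts from the Unix epoch (guaranteed since C++20, true in practice on earlier mainstream implementations).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: delay between a record's timestamp and "now",
// both expressed as milliseconds since the Unix epoch.
constexpr std::int64_t delay_ms(std::int64_t record_timestamp_ms,
                                std::int64_t now_ms) {
    return now_ms - record_timestamp_ms;
}

// Hypothetical helper: current wall-clock time in milliseconds
// since the Unix epoch.
inline std::int64_t now_epoch_ms() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               system_clock::now().time_since_epoch())
        .count();
}
```

Note that a figure computed this way depends on the timestamp type: with CreateTime it also includes any time the record spent in the producer before being sent, and in both cases it is sensitive to clock skew between the machine that stamped the record and the consumer host.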
My team and I have spent over 2 weeks trying to understand why we're getting such a considerable delay, and we still can't identify the cause.
Checklist