About this change - What it does

Change the coordinator to use `aiokafka`.

Why this way

When Kafka brokers are replaced with new brokers on new servers, the `kafka-python` group coordinator can get stuck in a perpetual DNS failure loop. The thread that gets stuck is the heartbeat thread of the Kafka client. Recovering from this condition requires restarting Karapace. See also https://github.com/wbarnha/kafka-python-ng/issues/36.

Work to remove `kafka-python` as a dependency is also in progress; this change is one more step in that direction and also provides async-capable functionality. `rdkafka` has a group coordinator, but it is not exposed by the `confluent-kafka` Python binding, so it cannot be used as the Karapace primary selection coordinator at this time. Exposing group coordination from `rdkafka` is a future item to investigate.

The primary coordinator is adapted for Karapace from the `aiokafka` group coordinator. The required changes include removing subscription handling and partition assignors, adding Karapace-specific metadata to the group, and selecting the primary instance. Note that a Karapace instance can join the group as a follower and still be selected as the primary instance.

The implementation removes the primary coordinator thread; the primary coordinator now runs in the application event loop.
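
The idea of selecting a primary from per-member group metadata, evaluated inside the event loop rather than in a dedicated thread, can be sketched roughly as below. This is only an illustration, not the code in this PR: the `Member` type, the metadata fields (`host`, `port`, `primary_eligible`), the "lowest host and port wins" rule, and the `coordinator_loop` helper are all assumptions made for the example, and no real `aiokafka` API is used.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Member:
    """Hypothetical stand-in for one group member as seen after a group join."""
    member_id: str
    metadata: bytes  # JSON-encoded, application-specific metadata


def encode_metadata(host: str, port: int, primary_eligible: bool) -> bytes:
    # Field names are assumptions for this sketch, not Karapace's wire format.
    payload = {"host": host, "port": port, "primary_eligible": primary_eligible}
    return json.dumps(payload).encode("utf-8")


def select_primary(members: list[Member]) -> Optional[str]:
    """Return the member_id of the primary, or None if nobody is eligible.

    Assumed rule: among eligible members, the lowest (host, port) wins, so
    every instance that sees the same member list picks the same primary.
    """
    candidates = []
    for member in members:
        info = json.loads(member.metadata)
        if info["primary_eligible"]:
            candidates.append(((info["host"], info["port"]), member.member_id))
    if not candidates:
        return None
    return min(candidates)[1]


async def coordinator_loop(
    get_members: Callable[[], list[Member]],
    on_primary: Callable[[Optional[str]], None],
    rounds: int,
    interval: float = 0.01,
) -> None:
    # Runs as a task in the application event loop; no separate thread needed.
    for _ in range(rounds):
        on_primary(select_primary(get_members()))
        await asyncio.sleep(interval)


async def main() -> list[Optional[str]]:
    members = [
        Member("m-1", encode_metadata("karapace-b", 8081, True)),
        Member("m-2", encode_metadata("karapace-a", 8081, True)),
        Member("m-3", encode_metadata("karapace-0", 8081, False)),
    ]
    seen: list[Optional[str]] = []
    task = asyncio.create_task(coordinator_loop(lambda: members, seen.append, rounds=2))
    await task
    return seen


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the election result does not depend on which member happens to be the group leader, which matches the note above that an instance joining as a follower can still end up as the primary.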