Aiven-Open / karapace

Karapace - Your Apache Kafka® essentials in one tool
https://karapace.io
Apache License 2.0

feature: master coordinator with aiokafka #880

Closed: jjaakola-aiven closed this 3 months ago

jjaakola-aiven commented 3 months ago

About this change - What it does

Change the Karapace primary (master) coordinator to use aiokafka instead of kafka-python.

Why this way

When Kafka brokers are replaced with new brokers on new servers, the kafka-python group coordinator can get stuck in a perpetual DNS failure loop. Specifically, it is the heartbeat thread of the Kafka client that gets stuck. Recovering from this condition requires restarting Karapace. See also https://github.com/wbarnha/kafka-python-ng/issues/36.

Work to remove kafka-python as a dependency is also in progress. This change is one more step in that direction and provides async-capable functionality. rdkafka has a group coordinator, but it is not exposed by the confluent-kafka Python binding and therefore cannot be used for the Karapace primary selection coordinator at this time. Exposing the group coordination from rdkafka is a future item to investigate.

The primary coordinator is adapted from the aiokafka group coordinator for Karapace. The required changes include removing subscription handling and partition assignors, adding Karapace-specific metadata to the group, and selecting the primary instance. Note that a Karapace instance can join the group as a follower (that is, not as the Kafka group leader) and still be selected as the primary instance.
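To illustrate how primary selection can be separated from the Kafka group leader role, here is a minimal sketch in plain Python with no Kafka connection. It assumes hypothetical member metadata (`host`, `port`, `scheme`, `primary_eligible` encoded as JSON) and a hypothetical rule of "lowest URL among eligible members wins"; neither the field names nor the rule are taken from the actual Karapace code. The group leader computes one assignment for all members, and every member, leader or follower, then checks whether it was chosen.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MemberMetadata:
    # Hypothetical Karapace-specific metadata sent when joining the group.
    host: str
    port: int
    scheme: str
    primary_eligible: bool

    @property
    def url(self) -> str:
        return f"{self.scheme}://{self.host}:{self.port}"

    def encode(self) -> bytes:
        return json.dumps(
            {"host": self.host, "port": self.port, "scheme": self.scheme,
             "primary_eligible": self.primary_eligible}
        ).encode("utf8")

    @classmethod
    def decode(cls, raw: bytes) -> "MemberMetadata":
        data = json.loads(raw.decode("utf8"))
        return cls(data["host"], data["port"], data["scheme"], data["primary_eligible"])


def assign_primary(members: Dict[str, bytes]) -> Dict[str, bytes]:
    """Run by the group leader only: pick a primary and build one assignment for everyone."""
    decoded = {mid: MemberMetadata.decode(raw) for mid, raw in members.items()}
    eligible = {mid: meta for mid, meta in decoded.items() if meta.primary_eligible}
    primary_id: Optional[str] = None
    primary_url: Optional[str] = None
    if eligible:
        # Hypothetical deterministic rule: the eligible member with the lowest URL wins.
        primary_id = min(eligible, key=lambda mid: eligible[mid].url)
        primary_url = eligible[primary_id].url
    assignment = json.dumps({"primary": primary_id, "primary_url": primary_url}).encode("utf8")
    return {mid: assignment for mid in members}


def is_primary(my_member_id: str, assignment: bytes) -> bool:
    """Run by every member (leader or follower) after receiving its assignment."""
    return json.loads(assignment.decode("utf8"))["primary"] == my_member_id
```

For example, if `member-1` is the Kafka group leader but `member-2` has the lower URL, the leader assigns the primary role to `member-2`, a follower. Keeping selection as a pure function of member metadata makes the outcome the same no matter which member happens to lead the group.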

The implementation removes the dedicated primary coordinator thread; the primary coordinator now runs in the application event loop.
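The shift from a dedicated thread to the application event loop can be pictured with a minimal asyncio sketch. `PrimaryCoordinator`, its heartbeat counter and the interval are hypothetical stand-ins, not Karapace or aiokafka APIs. The heartbeat loop is scheduled as a task on the already-running loop with `create_task`, so no extra thread is started, and shutdown is a plain task cancellation.

```python
import asyncio
import contextlib
from typing import Optional


class PrimaryCoordinator:
    """Hypothetical coordinator that runs as a task in the application's event loop."""

    def __init__(self, heartbeat_interval: float = 0.01) -> None:
        self._interval = heartbeat_interval
        self._task: Optional[asyncio.Task] = None
        self.heartbeats = 0

    async def _send_heartbeat(self) -> None:
        # Placeholder for a real group heartbeat request; here it only counts calls.
        self.heartbeats += 1

    async def _run(self) -> None:
        while True:
            await self._send_heartbeat()
            await asyncio.sleep(self._interval)

    def start(self) -> None:
        # Schedule on the running loop; no separate coordinator thread is created.
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def close(self) -> None:
        if self._task is not None:
            self._task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self._task
            self._task = None


async def main() -> int:
    coordinator = PrimaryCoordinator()
    coordinator.start()
    await asyncio.sleep(0.05)  # the application keeps doing other async work meanwhile
    await coordinator.close()
    return coordinator.heartbeats
```

Running `asyncio.run(main())` returns the number of heartbeats sent before the coordinator was stopped. Because the coordinator shares the loop with the rest of the application, its lifecycle is managed with ordinary task start and cancel calls rather than thread synchronization.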