Problem Description :
When running multiple replicas of Schema Registry for HA, one pod is elected as leader while the others are workers. When the leader pod is deleted, one of the remaining pods is elected as the new leader. But when the replacement pod comes up, another election runs and the new pod becomes leader again, which causes a few seconds of downtime and makes schema creation fail while the group is rebalancing. It looks like rebalancing happens twice.
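One way to watch the double rebalance live is to follow the leader-election log lines from any surviving replica while the leader pod is deleted (pod name and namespace below match this setup; adjust to yours):

kubectl logs -f cp-schema-registry-0 -n kafka | grep --line-buffered "leader election result"

Two "Finished rebalance with leader election result" lines then appear within a short window: the first pointing at an existing pod, the second at the freshly started one.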
Expected behaviour :
When running in HA (i.e., with multiple replicas of Schema Registry), deleting the leader pod should trigger exactly one leader election; subsequent requests should be served without switching the leader again until that leader is deleted.
Steps to Reproduce :
Run multiple replicas of Schema Registry using a Deployment/StatefulSet.
Delete the leader pod and immediately create multiple schemas (a rough script for this is sketched below).
Failures are observed for some of the schema creation requests.
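A minimal reproduction sketch, assuming the current leader is cp-schema-registry-2 (as in the pod listing below) and that the replicas are reachable through a cp-schema-registry service on port 8081; the subject names and the trivial string schema are just placeholders:

# delete the current leader, then immediately register a batch of test schemas
kubectl delete pod cp-schema-registry-2 -n kafka
for i in $(seq 1 20); do
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data '{"schema": "{\"type\": \"string\"}"}' \
    http://cp-schema-registry.kafka:8081/subjects/repro-subject-$i/versions
  echo
done

Some of these register calls fail with the UnknownLeaderException / RestRequestForwardingException errors shown in the logs below while the group is rebalancing.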
Logs
kubectl get po -n kafka -o wide
NAME READY STATUS IP
cp-schema-registry-0 1/1 Running 10.0.7.153
cp-schema-registry-1 1/1 Running 10.0.8.175
cp-schema-registry-2 1/1 Running 10.0.32.182 (Current Leader)
kubectl logs cp-schema-registry-0 -n kafka | grep leader
[2023-01-12 06:41:55,177] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-6f6c7d7c-e739-400b-9904-620bc282350c', leaderIdentity=version=1,host=10.0.32.182,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
Deleted the leader pod; the replacement pod comes up with IP 10.0.36.86
kubectl get po -n kafka -o wide
NAME READY STATUS RESTARTS AGE IP
cp-schema-registry-0 1/1 Running 0 21h 10.0.7.153
cp-schema-registry-1 1/1 Running 0 6m39s 10.0.8.175
cp-schema-registry-2 1/1 Running 0 36s 10.0.36.86
Logs for leader election
[2023-01-12 06:40:55,135] INFO Rebalance started (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
[2023-01-12 06:41:55,177] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-6f6c7d7c-e739-400b-9904-620bc282350c', leaderIdentity=version=1,host=10.0.32.182,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
.....
..
.
Caused by: io.confluent.kafka.schemaregistry.exceptions.UnknownLeaderException: Register schema request failed since leader is unknown
....
..
###### From the lines below you can see an existing pod is elected as leader (10.0.7.153 - cp-schema-registry-0)
io.confluent.kafka.schemaregistry.rest.exceptions.RestRequestForwardingException: Error while forwarding register schema request to the leader
[2023-01-12 06:44:25,188] INFO Rebalance started (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
[2023-01-12 06:44:25,216] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-a5c21494-c85b-4296-9471-7e7c523c3178', leaderIdentity=version=1,host=10.0.7.153,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
....
..
.
##### Once the new pod is up and running, rebalancing happens again and (10.0.36.86 - cp-schema-registry-2) is elected as leader
[2023-01-12 06:44:46,213] INFO Rebalance started (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
[2023-01-12 06:44:46,218] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-bf1e2a4d-a3f0-4817-a856-87cd1aaab60d', leaderIdentity=version=1,host=10.0.36.86,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)
...
..
.
Sometimes a leader is not even elected until the new pod is up and running.
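A quick way to check which pod (if any) currently reports an elected leader is to pull the latest election result from each replica's logs (same grep pattern as above):

for p in cp-schema-registry-0 cp-schema-registry-1 cp-schema-registry-2; do
  echo "== $p =="
  kubectl logs "$p" -n kafka | grep "leader election result" | tail -1
done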
Additional Information :
Image Used : 6.2.5
Kubernetes cluster : AWS EKS
Kubernetes version :