banzaicloud / koperator

Oh no! Yet another Apache Kafka operator for Kubernetes
Apache License 2.0

Q: Leader election rebalancing? #239

Closed borisputerka-zz closed 4 years ago

borisputerka-zz commented 4 years ago

Hi, I have a question. Yesterday I had a few under-replicated partitions. I restarted the broker that couldn't replicate and the problem was fixed. Later I realized that partition leaders are not evenly distributed across the cluster: roughly 130, 180, and 10 leader partitions per broker. Here is a screenshot of what happened before the under-replication occurred. (screenshot)

Also, how does rebalancing work in this case? With the previous setup, producers got a list of brokers, e.g. kafka-0,kafka-1,kafka-2, and all 3 brokers had similar usage. Now that producers and consumers connect to the Kafka headless service, one of the brokers uses significantly more page cache than the other two, about twice as much. Second question, just to be sure (call it my OCD): is it OK to use the headless service as the connection point for producers and consumers?
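For anyone wanting to put numbers on a leader skew like the 130/180/10 split described above, here is a minimal sketch that parses the text printed by `kafka-topics.sh --describe` and counts leaders per broker. It also lists partitions whose ISR is smaller than the replica set, and computes, per broker, the share of partitions where that broker is the preferred leader (first entry in `Replicas`) but is not currently leading. As far as I know this mirrors the ratio the Kafka controller compares against `leader.imbalance.per.broker.percentage` when `auto.leader.rebalance.enable` is on, but treat that as an assumption and check your Kafka version. The function name `summarize` and the sample topic `events` are made up for illustration.

```python
import re
from collections import Counter

# Matches one partition line of `kafka-topics.sh --describe` output.
# The topic summary line (PartitionCount: ...) does not match.
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def summarize(describe_output):
    """Summarize leader distribution and replication health (hypothetical helper)."""
    leaders = Counter()    # partitions currently led by each broker
    preferred = Counter()  # partitions for which each broker is the preferred leader
    displaced = Counter()  # preferred leader is this broker, but another broker leads
    under_replicated = []  # (topic, partition) pairs where ISR < replica set

    for line in describe_output.splitlines():
        m = PARTITION_RE.search(line)
        if m is None:
            continue
        replicas = m["replicas"].split(",")
        isr = [b for b in m["isr"].split(",") if b]
        leaders[m["leader"]] += 1
        preferred[replicas[0]] += 1
        if m["leader"] != replicas[0]:
            displaced[replicas[0]] += 1
        if len(isr) < len(replicas):
            under_replicated.append((m["topic"], int(m["partition"])))

    imbalance = {b: displaced[b] / preferred[b] for b in preferred}
    return {
        "leaders": dict(leaders),
        "under_replicated": under_replicated,
        "imbalance": imbalance,
    }


# Illustrative input only; broker ids are kept as strings.
SAMPLE = """\
Topic: events PartitionCount: 3 ReplicationFactor: 3 Configs:
    Topic: events Partition: 0 Leader: 1 Replicas: 0,1,2 Isr: 1,2
    Topic: events Partition: 1 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
    Topic: events Partition: 2 Leader: 2 Replicas: 2,0,1 Isr: 2,1
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A high ratio for one broker usually means it was down or out of sync and leadership moved elsewhere. Note that clients only use the bootstrap address to fetch cluster metadata and then talk to partition leaders directly, so the leader distribution, rather than the connection point, is what drives per-broker load.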

borisputerka-zz commented 4 years ago

After some investigation I found out that it was caused by a huge peak in one of the topics (see image below), after which Cruise Control suddenly started doing its job. Yet I ended up with another 10 under-replicated partitions after another peak in the topic. (graphs)

Another question is whether it will return to normal after a while, or whether I need to restart the broker that is not replicating again.

tinyzimmer commented 4 years ago

> Out of my OCD just to be sure is it ok to use headless-service as connection point for producer and consumer right?

Yes

I'm not entirely sure about the other questions though, I might leave that for someone else to weigh in on.

borisputerka-zz commented 4 years ago

CC also got stuck during this after I restarted that broker.

10126300 [posalExecutor-0] INFO ruisecontrol.executor.Executor - Starting 218 inter-broker partition movements.
10138023 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 20/218 (9.17%) inter-broker partition movements completed. 5/2230 (0.22%) MB have been moved.
10149015 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 40/218 (18.35%) inter-broker partition movements completed. 5/2230 (0.22%) MB have been moved.
10159634 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 60/218 (27.52%) inter-broker partition movements completed. 5/2230 (0.22%) MB have been moved.
10170108 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 80/218 (36.70%) inter-broker partition movements completed. 5/2230 (0.22%) MB have been moved.
10180261 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 90/218 (41.28%) inter-broker partition movements completed. 23/2230 (1.03%) MB have been moved.
10190403 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 101/218 (46.33%) inter-broker partition movements completed. 24/2230 (1.08%) MB have been moved.
10200614 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 111/218 (50.92%) inter-broker partition movements completed. 36/2230 (1.61%) MB have been moved.
10210714 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 121/218 (55.50%) inter-broker partition movements completed. 41/2230 (1.84%) MB have been moved.
10220902 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 131/218 (60.09%) inter-broker partition movements completed. 44/2230 (1.97%) MB have been moved.
10231126 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 141/218 (64.68%) inter-broker partition movements completed. 61/2230 (2.74%) MB have been moved.
10241220 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 149/218 (68.35%) inter-broker partition movements completed. 61/2230 (2.74%) MB have been moved.
10271325 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 150/218 (68.81%) inter-broker partition movements completed. 1085/2230 (48.65%) MB have been moved.
10272580 [omalyDetector-2] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10274240 [omalyDetector-2] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10282584 [omalyDetector-1] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10292579 [omalyDetector-2] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10302579 [omalyDetector-3] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10312579 [omalyDetector-4] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10322583 [omalyDetector-1] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10332580 [omalyDetector-3] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10342580 [omalyDetector-2] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10344798 [lingScheduler-0] INFO trol.monitor.task.SamplingTask - Skip sampling because the load monitor is in PAUSED state due to Paused-By-Cruise-Control-Before-Starting-Execution (Date: 2019-12-26_23:37:12 UTC)..
10352582 [omalyDetector-4] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10362586 [omalyDetector-1] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10372580 [omalyDetector-3] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10382583 [omalyDetector-2] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10392580 [omalyDetector-1] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10402579 [omalyDetector-4] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
10412579 [omalyDetector-3] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.
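To judge whether the executor is really stuck or just slow, it can help to pull the progress figures out of a log like the one above. The following is a rough sketch, not a Cruise Control tool. It assumes the leading number on each entry is an elapsed-milliseconds timestamp (the entries above are about 10,000 apart, which fits, but that is a guess) and that the progress message format is exactly what appears in this thread. The function names are hypothetical.

```python
import re

# Any log entry: "<timestamp> [<thread>] <LEVEL> ..."
ENTRY_RE = re.compile(r"\b(\d+) \[[^\]]+\]\s+(?:TRACE|DEBUG|INFO|WARN|ERROR)\s")

# Executor progress entries, in the format seen in this thread.
PROGRESS_RE = re.compile(
    r"\b(\d+) \[[^\]]+\]\s+INFO\s+\S+ - "
    r"(\d+)/(\d+) \([\d.]+%\) inter-broker partition movements completed\. "
    r"(\d+)/(\d+) \([\d.]+%\) MB have been moved\."
)


def parse_progress(log_text):
    """Return one dict per executor progress entry, in log order."""
    return [
        {
            "ts_ms": int(ts),
            "moves_done": int(done),
            "moves_total": int(total),
            "mb_moved": int(mb),
            "mb_total": int(mb_total),
        }
        for ts, done, total, mb, mb_total in PROGRESS_RE.findall(log_text)
    ]


def ms_since_last_progress(log_text):
    """Gap between the last progress entry and the newest entry of any kind."""
    progress = parse_progress(log_text)
    stamps = [int(ts) for ts in ENTRY_RE.findall(log_text)]
    if not progress or not stamps:
        return None
    return max(stamps) - progress[-1]["ts_ms"]


# Three entries copied from the log above.
SAMPLE = (
    "10241220 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 149/218 (68.35%) "
    "inter-broker partition movements completed. 61/2230 (2.74%) MB have been moved.\n"
    "10271325 [posalExecutor-0] INFO ruisecontrol.executor.Executor - 150/218 (68.81%) "
    "inter-broker partition movements completed. 1085/2230 (48.65%) MB have been moved.\n"
    "10412579 [omalyDetector-3] INFO .detector.AnomalyDetectorUtils - Skipping anomaly "
    "detection because the executor is in INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS state.\n"
)

if __name__ == "__main__":
    print(parse_progress(SAMPLE)[-1])
    print(ms_since_last_progress(SAMPLE))
```

Both patterns scan the whole text rather than splitting on newlines, so they also work when the log has been pasted as a single wrapped line. A long gap with no new progress entry while other threads keep logging suggests the remaining movements are waiting on a replica that cannot catch up.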