This project simulates network partitions within a Kafka cluster and observes how the cluster behaves. The goal is to evaluate the durability guarantees a Kafka cluster provides when the network is unreliable. The scenarios emulate a Kafka stretch cluster spanning two datacenters.
Three Kafka brokers (kafka-1, kafka-2 and kafka-3) and one ZooKeeper node.
The current leader is on kafka-1. kafka-1 then blocks all incoming messages from kafka-2, kafka-3 and ZooKeeper.
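The one-way isolation described above could be expressed as firewall rules. The following is a minimal sketch, not the project's actual script. It assumes the brokers run as Docker containers whose hostnames match the names above, that `iptables` is installed in the kafka-1 container, and that the ZooKeeper container is named `zookeeper`. The helper `isolation_commands` is hypothetical and only builds the commands; it does not execute them.

```python
from typing import List


def isolation_commands(target: str, blocked_peers: List[str]) -> List[List[str]]:
    """Build commands that make `target` drop inbound packets from each peer.

    Assumption: containers are reachable by name and run iptables as root.
    """
    commands = []
    for peer in blocked_peers:
        commands.append([
            "docker", "exec", "--privileged", target,
            # Append a rule to the INPUT chain: drop packets whose source is `peer`.
            "iptables", "-A", "INPUT", "-s", peer, "-j", "DROP",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical container names taken from the scenario description.
    for cmd in isolation_commands("kafka-1", ["kafka-2", "kafka-3", "zookeeper"]):
        print(" ".join(cmd))
```

Only the INPUT chain is touched, so kafka-1 can still send packets to the other nodes but never sees their replies, which matches the "blocks all incoming messages" wording of the scenario.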
Four Kafka brokers (kafka-1, kafka-2, kafka-3 and kafka-4) and one ZooKeeper node.
The current leader is on kafka-1. A network partition is then simulated:
Four Kafka brokers (kafka-1, kafka-2, kafka-3 and kafka-4) and three ZooKeeper nodes.
For some reason, you decide to rebuild the ZooKeeper quorum (e.g. you lost a rack or a DC).
There is no guarantee, after rebuilding a quorum, that the nodes have all the required information.
Four Kafka brokers (kafka-1, kafka-2, kafka-3 and kafka-4) and three ZooKeeper nodes.
Simulate a complete network outage between each and every component.
When the network comes back, the quorum is re-formed and the cluster is healthy.
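ZooKeeper relies on majority quorums, which is why a total outage stalls the ensemble and why it recovers once connectivity returns. The check can be sketched as follows; `has_quorum` is a hypothetical helper for illustration, not part of ZooKeeper or of this project.

```python
def has_quorum(ensemble_size: int, reachable: int) -> bool:
    """Return True when `reachable` nodes form a strict majority of the ensemble.

    `reachable` counts the node itself plus every peer it can talk to.
    """
    return reachable > ensemble_size // 2


if __name__ == "__main__":
    # During the outage each of the 3 ZooKeeper nodes only sees itself.
    print(has_quorum(3, 1))
    # Once the network is back, all 3 nodes see each other again.
    print(has_quorum(3, 3))
```

With three nodes, any single isolated node fails the check, so no side can accept writes during the outage, and the ensemble can only make progress again after at least two nodes reconnect.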
Network setup:
We simulate a DC network split.
When the network comes back, the quorum is re-formed and the cluster is healthy.
Network setup:
We simulate the following connectivity loss:
All other connections are still up. All partitions whose leader is kafka-3 are unavailable. If we stop kafka-3, they remain unavailable, because unclean leader election is not enabled and kafka-3 is the only broker in the ISR.
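The reasoning about the ISR can be illustrated with a simplified model of leader election. This is a sketch under stated assumptions, not Kafka's controller code: the function `elect_leader`, the replica ordering and the set of live brokers are hypothetical, and the `unclean` flag stands in for the broker/topic setting `unclean.leader.election.enable`.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[str], isr: Set[str], live: Set[str],
                 unclean: bool) -> Optional[str]:
    """Pick a leader for one partition, or None if it must stay offline."""
    # Clean election: only a live broker that is in the ISR may lead.
    for broker in replicas:
        if broker in isr and broker in live:
            return broker
    # Unclean election: fall back to any live replica, risking data loss.
    if unclean:
        for broker in replicas:
            if broker in live:
                return broker
    return None


if __name__ == "__main__":
    replicas = ["kafka-3", "kafka-1", "kafka-2"]  # assumed assignment
    isr = {"kafka-3"}                             # kafka-3 is alone in the ISR
    live = {"kafka-1", "kafka-2"}                 # kafka-3 has been stopped
    print(elect_leader(replicas, isr, live, unclean=False))
    print(elect_leader(replicas, isr, live, unclean=True))
```

In this model the partition stays offline while unclean election is disabled, because no live broker is in the ISR. Enabling it would let an out-of-sync replica take over, at the cost of losing any records that only kafka-3 had.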