sarlindo opened 2 years ago
Hey there,
I am a little late to the party here, so sorry for grave digging. Still, have you found a way to get this up and running consistently? IMHO this is not possible due to the nature of quorums requiring an odd number of members. We have a similar setup currently.
In your hierarchical example, I noticed that kafka brokers 3 and 4 also connect to zookeepers 1, 2 and 3. To simulate the 2 DC setup with 3 zookeepers and 2 brokers per site, shouldn't kafka 3 and 4 connect to their local zookeepers 4, 5 and 6? Also, is "rack awareness" required in the kafka config?
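In other words, for the DC2 brokers I would have expected something like this in the compose file (just a sketch to illustrate the question; service and host names are assumed from your example):

    kafka-3:
      environment:
        KAFKA_ZOOKEEPER_CONNECT: "zookeeper-4:2181,zookeeper-5:2181,zookeeper-6:2181"
    kafka-4:
      environment:
        KAFKA_ZOOKEEPER_CONNECT: "zookeeper-4:2181,zookeeper-5:2181,zookeeper-6:2181"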
I am personally trying to set up a 2 DC kafka cluster with a 6 server setup.
My stretch cluster setup:
a) 3 servers at DC1 with 3 zookeepers and 3 brokers
b) Another 3 servers at DC2 with 3 zookeepers and 3 brokers
c) Zookeeper config using all 6 zookeepers across sites with a hierarchical quorum. Each site's brokers connect ONLY to the 3 zookeepers at their local site. Example: the 3 brokers at DC1 connect only to zookeepers 1, 2, 3 at DC1, and the 3 brokers at DC2 connect only to zookeepers 4, 5, 6 at DC2 (see the zoo.cfg sketch after this list).
d) Our replication factor is 6 and min ISR is 4 for topics, to ensure data is replicated to at least both sites. We don't want to lose any data, that is our requirement. Client acks is set to "all".
e) Each site is configured with rack awareness, and unclean leader election is set to FALSE.
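For c), this is roughly how I understand the hierarchical quorum part of our zoo.cfg; it is only a sketch, hostnames and ports are assumptions following the docker-compose naming, and the group/weight syntax is taken from the ZooKeeper hierarchical quorum docs as I read them:

    # zoo.cfg (sketch): 6 voting members split into one group per DC
    server.1=zookeeper-1:2888:3888
    server.2=zookeeper-2:2888:3888
    server.3=zookeeper-3:2888:3888
    server.4=zookeeper-4:2888:3888
    server.5=zookeeper-5:2888:3888
    server.6=zookeeper-6:2888:3888
    # hierarchical quorum: group.X lists server ids, weight.X gives each server one vote
    group.1=1:2:3
    group.2=4:5:6
    weight.1=1
    weight.2=1
    weight.3=1
    weight.4=1
    weight.5=1
    weight.6=1

If I read the hierarchical quorum rules right, a quorum then needs a majority of groups (here: both of them), each with a majority of its weights present, which is exactly why a full DC outage also forces the ZooKeeper reconfiguration in our automation below.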
We are trying to test a FULL DC1 outage, and when a full DC1 outage occurs we want to ensure that data can still be consumed at DC2 without any data loss. We have automation that does the following upon a full DC1 outage:
a) alter all topics to a min ISR of 1
b) alter the zookeeper config to only the 3 zookeepers at DC2 and remove any group definitions.
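For step a), this is roughly the shape of what our automation runs. It is only a sketch: the bootstrap server is a placeholder, and depending on the Kafka version you may need the older --zookeeper flag instead of --bootstrap-server for kafka-configs. Step b) is just restarting the DC2 zookeepers with a 3-server zoo.cfg that has the group/weight lines removed.

    # sketch: drop min ISR on every topic so the 3 surviving DC2 replicas can form an ISR
    for topic in $(kafka-topics.sh --bootstrap-server kafka-4:9092 --list); do
      kafka-configs.sh --bootstrap-server kafka-4:9092 \
        --entity-type topics --entity-name "$topic" \
        --alter --add-config min.insync.replicas=1
    done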
Now when we run tests, we don't get 100% consistent results. Most of the time this works great and there is zero data loss. But from time to time we find the leader is set to "none" and the ISR looks messed up as well; when this happens we can't consume any data.
Have you run into these sorts of issues? Are the config and steps I outlined above correct? Have you experienced these sorts of things with a 2 DC setup? For reference, this is the part of your hierarchical example I was referring to:
    kafka-3:
      build:
        context: ../
        dockerfile: Dockerfile.kafka
      hostname: kafka-3
      depends_on:
        - zookeeper-3
      environment:
        KAFKA_ZOOKEEPER_CONNECT: "zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181"
        KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-3:9092"
        KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: "1"
        KAFKA_BROKER_ID: "3"

    kafka-4:
      build:
        context: ../
        dockerfile: Dockerfile.kafka
      hostname: kafka-4
      depends_on: