ganler / ResearchReading

General system research material (not limited to paper) reading notes.
GNU General Public License v3.0

Kafka: A Modern Distributed System #1

Closed ganler closed 4 years ago

ganler commented 4 years ago

I used Kafka as an MQ during my ByteDance internship, but I only picked up rudimentary ideas about concepts like "topic" and "broker".

Official Introduction by Tim: https://kafka.apache.org/intro

QCon by Tim: https://www.youtube.com/watch?v=Ea3aoACnbEk

ganler commented 4 years ago

Big Idea

Before Kafka: Think as "Things"

Information is usually stored in a database only. (We store things, with their states, in the database.)

Kafka: Think as "Events" instead

Kafka encourages people to think about EVENTs first, and THINGs second.

When an event happens, it is written to the LOG.
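To make "events first, things second" concrete, here is a minimal sketch in plain Python (no Kafka involved). The `Event` type, the account names, and the `balance` helper are all hypothetical and only illustrate the idea: the log stores what happened, and the state of a "thing" is derived by replaying its events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str     # which "thing" the event is about (hypothetical account id)
    amount: int  # signed change: positive = deposit, negative = withdrawal


# Append-only log: events are added at the end and never updated in place.
log: list[Event] = []


def record(event: Event) -> None:
    log.append(event)


def balance(account: str) -> int:
    # The state of the "thing" is not stored; it is rebuilt from its events.
    return sum(e.amount for e in log if e.key == account)


record(Event("alice", 100))
record(Event("bob", 50))
record(Event("alice", -30))
print(balance("alice"))  # 70
```

A database-first design would store only the final balance; here the balance can always be recomputed, and the history is kept as well.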

Kafka organizes events into TOPICs. A topic is an ordered collection of events stored in a DURABLE way.

Here "durable" means the events are stored in a distributed fashion, with replicas.

"Back when DBs ruled the world", the trend was to build one large program around one big database. Now the trend is to write lots of small programs, each small enough to reason about. These small programs talk to each other by exchanging messages through topics (with operations like produce and consume).

ganler commented 4 years ago

Kafka Basics

Not a queue, but a log (stored persistently).
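The queue/log difference can be sketched as follows. This is a toy in-memory model, not Kafka's API: `TopicLog`, `produce`, `consume`, and the reader names are made up for illustration. The point is that reading does not remove records; each reader just tracks its own offset.

```python
class TopicLog:
    """Toy append-only log; records stay in place after being read."""

    def __init__(self):
        self._records = []

    def produce(self, key, value):
        self._records.append((key, value))
        return len(self._records) - 1  # offset of the record just written

    def consume(self, offset, max_records=10):
        # Reading is a slice by offset; nothing is popped or deleted.
        return self._records[offset:offset + max_records]


topic = TopicLog()
for i in range(3):
    topic.produce("sensor-1", f"reading-{i}")

# Each reader keeps its own position in the log.
offsets = {"billing": 0, "analytics": 0}

batch = topic.consume(offsets["billing"])
offsets["billing"] += len(batch)

# "analytics" has not read anything yet, so it still sees all three records.
print(len(topic.consume(offsets["analytics"])))  # 3
```

With a classic queue, the records taken by "billing" would be gone; with a log, a second reader (or a replay from offset 0) still sees them.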

Events are modeled as K/V pairs, which live in the topic for a configurable retention period (a system config parameter, say 7 days).

For scalability and fault tolerance, the log is partitioned across many servers, and the "servers" here are "brokers". (Say a topic is made of 4 partitions, each stored as files on a broker.)

Problem 1: which partition should an event go to?
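As a rough sketch of time-based retention, assuming each record carries a timestamp in seconds: `RETENTION_SECONDS` and `expire` are hypothetical names, and this toy version filters record by record, whereas a real broker deletes whole log segment files once they are old enough.

```python
RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days, matching the example above


def expire(records, now):
    """Keep only records young enough to be retained.

    records: list of (timestamp_seconds, key, value) tuples.
    """
    return [r for r in records if now - r[0] <= RETENTION_SECONDS]


DAY = 24 * 60 * 60
records = [
    (0 * DAY, "k", "old"),  # 10 days old at `now`: dropped
    (3 * DAY, "k", "mid"),  # exactly 7 days old: kept
    (9 * DAY, "k", "new"),  # 1 day old: kept
]
kept = expire(records, now=10 * DAY)
print([value for _, _, value in kept])  # ['mid', 'new']
```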

The answer is to hash the event key. (Note that this is plain modulo hashing, not consistent hashing.)

Given the hashed value x of the key, x % N (N = 4 here) is the destination partition.
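The x % N rule can be sketched in a few lines. This is an illustration under assumptions, not Kafka's actual code: `choose_partition` is a hypothetical helper, and `zlib.crc32` is only a stand-in for a stable hash (Kafka's default partitioner hashes the key bytes with murmur2).

```python
import zlib

NUM_PARTITIONS = 4  # N in the text


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # crc32 is a stand-in stable hash, used here only because it is in the
    # standard library. Python's built-in hash() is randomized per process
    # for bytes/str, so it would not give a stable placement.
    x = zlib.crc32(key)
    return x % num_partitions


for key in (b"user-1", b"user-2", b"user-1"):
    print(key, "->", choose_partition(key))
```

Because the hash is deterministic, events with the same key always land in the same partition, which is what preserves per-key ordering. The flip side is that changing N remaps most keys.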

Producers and consumers are both clients: producers write events to topics, and consumers read events from topics.

Replication

For fault tolerance.

Say we replicate each partition with a replication factor of 3 (3 replicas).

Leader: One of the replicas is the leader. (If you read, you read from the leader first.)

Followers: The non-leader replicas.

If the leader dies, a new leader is elected from the followers.
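The replication and failover idea can be sketched with a toy model. Everything here is an assumption made for illustration: the broker names, the round-robin placement in `assign_replicas`, and the "first live replica wins" rule in `elect_leaders`. Real Kafka only elects from replicas that are in sync with the old leader, which this sketch ignores.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin)."""
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def elect_leaders(assignment, live_brokers):
    """Pick the first live replica of each partition as its leader."""
    leaders = {}
    for partition, replicas in assignment.items():
        alive = [b for b in replicas if b in live_brokers]
        leaders[partition] = alive[0] if alive else None  # None: partition offline
    return leaders


brokers = ["b0", "b1", "b2", "b3"]
assignment = assign_replicas(4, brokers, replication_factor=3)
print(assignment[0])  # ['b0', 'b1', 'b2']

leaders = elect_leaders(assignment, set(brokers))
print(leaders[0])  # b0

# Broker b0 dies: partition 0 fails over to its next live replica.
after_failure = elect_leaders(assignment, set(brokers) - {"b0"})
print(after_failure[0])  # b1
```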

Consumer Groups

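A group of consumer clients that share the work of reading a topic: each partition is consumed by exactly one member of the group.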
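Within a consumer group, the topic's partitions are divided among the members so that each partition is read by exactly one member. A minimal sketch of such a division, using a simple round-robin rule (`assign_partitions` and the consumer names are hypothetical; Kafka ships several assignment strategies and this is not a copy of any of them):

```python
def assign_partitions(partitions, members):
    """Deal partitions out to group members in round-robin order."""
    ordered_members = sorted(members)
    assignment = {m: [] for m in ordered_members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered_members[i % len(ordered_members)]].append(partition)
    return assignment


print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}

# A third consumer joins: the same 4 partitions are re-divided among 3 members.
print(assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1], 'c3': [2]}
```

One consequence of this model is that a group with more members than partitions leaves some members idle, since a partition is never shared inside a group.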

The controller

One broker in the Kafka cluster is elected as the controller.