How does MapReduce help us to scale big computations?
With MapReduce, we move the processing to where the data lives instead of moving the data to the processing unit: each node runs the computation on its local chunk in parallel, which improves scalability and reliability.
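A minimal word-count sketch of the map/reduce idea in plain Python (no framework involved); the `map_phase`/`reduce_phase` names and the sample chunks are illustrative only:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(chunk):
    """Runs on the node that already holds this chunk of text: emit (word, 1) pairs."""
    for line in chunk:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(word, counts):
    """Combine all intermediate values for one key into the final count."""
    return word, sum(counts)

# Each "node" processes its local chunk; only the small intermediate pairs move around.
chunks = [["the quick brown fox"], ["the lazy dog", "the end"]]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]

# Shuffle step: group intermediate pairs by key before reducing.
intermediate.sort(key=itemgetter(0))
results = dict(reduce_phase(key, (count for _, count in group))
               for key, group in groupby(intermediate, key=itemgetter(0)))
print(results)  # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```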
Give two reasons why data processing pipelines can be fragile
- Uneven work distribution - some nodes become overloaded or take too long to complete their part of the job.
- "Thundering herd" problems - too many workers starting (or retrying) at the same time can overwhelm the server; see the backoff sketch after this list.
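A minimal sketch of one common mitigation for the thundering-herd case: workers back off exponentially with random jitter before retrying, so restarts don't hit the server in lockstep. The `connect` callable and the retry limits are hypothetical:

```python
import random
import time

def connect_with_jitter(connect, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Call `connect()` until it succeeds, sleeping a random (jittered) delay between attempts."""
    for attempt in range(max_retries):
        try:
            return connect()
        except ConnectionError:
            # Full jitter: sleep a random amount up to the exponential cap,
            # so many workers retrying at once spread out instead of synchronizing.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            time.sleep(delay)
    raise ConnectionError("server still unavailable after retries")
```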
How can results of tasks be communicated back to users in a queue-based system?
Create a response queue that is consumed by the user: workers publish results onto it, and the client reads them back.
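A small sketch of the request/response-queue pattern, using Python's in-process `queue` module as a stand-in for a real message broker. The correlation id used to match replies to requests is an assumption about how such a system is typically wired, not something stated above:

```python
import queue
import uuid

request_queue = queue.Queue()    # consumed by workers
response_queue = queue.Queue()   # consumed by the user/client

def worker():
    job = request_queue.get()
    result = job["payload"].upper()  # placeholder for the real computation
    response_queue.put({"correlation_id": job["correlation_id"], "result": result})

# Client side: tag the request, submit it, then wait on the response queue.
correlation_id = str(uuid.uuid4())
request_queue.put({"correlation_id": correlation_id, "payload": "hello"})
worker()  # in a real system this runs in a separate worker process
reply = response_queue.get()
assert reply["correlation_id"] == correlation_id
print(reply["result"])  # HELLO
```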
What are the components of the Kafka architecture?
- ZooKeeper - acts as a config repository and ensures there is a leader (controller) elected at all times; if the leader goes down, ZooKeeper elects one of the brokers as the new controller.
- Brokers - handle all of the I/O operations, appending messages to log files.
- Producers - client applications that produce messages.
- Consumers - client applications that consume messages.
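A minimal producer/consumer sketch using the kafka-python client, assuming a broker reachable at `localhost:9092` and a topic named `events` (both illustrative):

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: a client application that appends messages to a topic on the brokers.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", key=b"user-42", value=b'{"action": "signup"}')
producer.flush()

# Consumer: a client application that reads messages from the topic; offsets are
# tracked per consumer group so progress survives restarts.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="example-group",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.partition, message.offset, message.value)
```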
How are topics different from partitions?
- Topics are event streams; a topic can contain multiple partitions.
- Partitions are the smallest chunk of data in Kafka: append-only logs of messages/events.
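A short sketch of the topic/partition relationship, again with kafka-python: create a topic with three partitions, then send keyed messages so every event for the same key is appended to the same partition's log. The topic name, key, and broker address are assumptions for illustration:

```python
from kafka import KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

# One topic ("orders") split into three partitions.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="orders", num_partitions=3, replication_factor=1)])

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(5):
    # Messages with the same key are hashed to the same partition,
    # so they land in one append-only log and keep their order.
    producer.send("orders", key=b"customer-1", value=f"order-{i}".encode())
producer.flush()
```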