Distributed Systems: State

Sections

Covers the role of caches, cache loading strategies, hazards, invalidation, immutable data, and scaling through replication and sharding.
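
As a rough illustration of two of the cache loading strategies named here, lazy loading and write-through, the following minimal sketch contrasts them. The `Database`, `LazyCache`, and `WriteThroughCache` classes are hypothetical stand-ins invented for this example, not part of any real cache service.

```python
class Database:
    """Hypothetical stand-in for a slow backing store; counts reads."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class LazyCache:
    """Lazy loading: the cache is filled only when a read misses."""

    def __init__(self, db):
        self.db = db
        self.entries = {}

    def read(self, key):
        if key in self.entries:        # cache hit
            return self.entries[key]
        value = self.db.get(key)       # cache miss: go to the database
        if value is not None:
            self.entries[key] = value  # remember it for next time
        return value

    def write(self, key, value):
        self.db.put(key, value)
        self.entries.pop(key, None)    # invalidate any stale cached copy


class WriteThroughCache(LazyCache):
    """Write-through: every write updates the database and the cache."""

    def write(self, key, value):
        self.db.put(key, value)
        self.entries[key] = value      # cache is warm before the first read
```

With the lazy variant, the first read after a write pays for a database round trip. With the write-through variant, that read is served from the cache, at the cost of caching values that may never be read.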

Discusses the reasons for using distributed databases, the CAP Theorem, leader/follower replication, synchronous and asynchronous replication, and the importance of transaction IDs and logical timestamps.

Explains the challenges of using system time for ordering in distributed systems and the role of logical timestamps.
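
One common form of logical timestamp is the Lamport clock. The sketch below is a minimal illustration under that assumption; the `LamportClock` class and its method names are chosen for this example and may not match the scheme the section itself presents.

```python
class LamportClock:
    """Minimal Lamport logical clock: a counter, not wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event or before sending a message."""
        self.time += 1
        return self.time

    def receive(self, sender_time):
        """Merge the timestamp carried by an incoming message.

        The local clock moves past both its own value and the sender's,
        so the receive event is always ordered after the send event.
        """
        self.time = max(self.time, sender_time) + 1
        return self.time
```

If node A calls `tick()` and sends the result to node B, then B's `receive()` returns a larger value regardless of how far either machine's system clock has drifted. This is the property that wall-clock timestamps cannot guarantee.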

Covers database sharding considerations, orchestration tools like Vitess, and application changes needed when moving to a sharded datastore.

Provides a link to a hands-on project related to the concepts discussed in the section.

Lists academic papers and articles on real-world distributed data stores and their architectures, along with an analysis of a notable distributed system failure.

The difference between stateful and stateless services
The role and benefits of caching in distributed systems
Strategies for loading data into caches (lazy loading, write-through)
Reasons for using a separate cache service rather than in-application caching
Potential hazards of using cache services and how to mitigate them
Cache invalidation strategies and challenges
The concept of immutable data and its benefits for caching
Scaling caches through replication and sharding
Consistent hashing and its importance in sharding data
Reasons for using distributed databases (reliability, capacity)
The CAP Theorem and its implications for distributed data stores
Leader/follower (primary/secondary) database replication
Synchronous, asynchronous, and semisynchronous replication
The role of transaction IDs and logical timestamps in maintaining data consistency
Challenges with using system time for ordering in distributed systems
Database sharding considerations and orchestration tools like Vitess
Application changes needed when moving to a sharded datastore
Real-world examples of distributed data stores (Amazon DynamoDB, Google File System, Bigtable)
Analyzing distributed system failures, such as the 2017 Amazon S3 outage
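
Of the topics above, consistent hashing is the easiest to show in a few lines. The following is a minimal sketch, not a production implementation: the `HashRing` class, the `vnodes` parameter, and the choice of SHA-256 as the hash function are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes. Node names must be strings."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key):
        """Return the node owning the first point at or after the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]  # wrap around the ring
```

When a node is added, the only keys that change owner are those that now land on the new node; every other key stays where it was. With a plain `hash(key) % node_count` scheme, by contrast, most keys would move whenever the node count changes.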