eduaardofranco / github-blog


How Discord stores trillions of messages #2

Open eduaardofranco opened 8 months ago

eduaardofranco commented 8 months ago

The Discord messaging service was originally launched in May 2015. Just a year and a half later, it had already reached an impressive milestone of 25 million users. Now, completing its 8th year, it has become a central hub for communications with 150 million active users monthly, distributed across 19 million chat rooms, engaging in about 4 billion minutes of conversations daily and exchanging trillions of messages.

Impressive numbers demand impressive backend solutions. In 2017, Discord adopted the Cassandra database after starting out on MongoDB. The migration aimed to adopt a solution that was scalable, fault-tolerant, and relatively low-maintenance, given the platform's rapid growth. But as message volume grew a thousandfold over the years, Cassandra itself needed replacing, prompting a new migration.

Issues with Cassandra surfaced as the number of nodes grew from 12 in 2017 to a staggering 177 in early 2022, handling trillions of messages. Latency became unpredictable, and maintenance costs soared. Another issue arose: reads were more expensive than writes. Writes were appended to a commit log and written to an in-memory structure called a memtable, later flushed to disk. Reads, on the other hand, had to query the memtable and potentially multiple SSTables (on-disk files), a more time-consuming process. Many simultaneous reads during user interactions could overload a hot partition, causing a cascading effect.
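The read/write asymmetry described above can be illustrated with a toy LSM-tree sketch (all names here are hypothetical; real Cassandra internals are far more involved): a write touches only the in-memory memtable, while a read may have to probe the memtable and then every on-disk SSTable in turn.

```rust
use std::collections::BTreeMap;

// Toy LSM store: one in-memory memtable plus a list of immutable
// "SSTables" (newest first). Purely illustrative.
struct ToyLsm {
    memtable: BTreeMap<String, String>,
    sstables: Vec<BTreeMap<String, String>>,
}

impl ToyLsm {
    fn new() -> Self {
        Self { memtable: BTreeMap::new(), sstables: Vec::new() }
    }

    // Write path: a single in-memory insert (the commit-log append
    // is omitted here). Cheap, regardless of how much data exists.
    fn put(&mut self, key: &str, value: &str) {
        self.memtable.insert(key.to_string(), value.to_string());
    }

    // Simulate flushing the memtable to disk as a new SSTable.
    fn flush(&mut self) {
        let flushed = std::mem::take(&mut self.memtable);
        self.sstables.insert(0, flushed); // newest first
    }

    // Read path: check the memtable, then every SSTable in order.
    // Returns the value and how many structures were probed.
    fn get(&self, key: &str) -> (Option<&String>, usize) {
        let mut probes = 1;
        if let Some(v) = self.memtable.get(key) {
            return (Some(v), probes);
        }
        for table in &self.sstables {
            probes += 1;
            if let Some(v) = table.get(key) {
                return (Some(v), probes);
            }
        }
        (None, probes)
    }
}

fn main() {
    let mut db = ToyLsm::new();
    db.put("msg:1", "hello");
    db.flush();
    db.put("msg:2", "world");
    db.flush();
    db.put("msg:3", "!");

    // A recent key is found in the memtable with one probe;
    // an older key requires scanning the flushed SSTables too.
    let (_, recent_probes) = db.get("msg:3");
    let (_, old_probes) = db.get("msg:1");
    println!("recent={recent_probes} old={old_probes}"); // recent=1 old=3
}
```

The more SSTables a key's partition has accumulated, the more probes a cold read costs, which is exactly why delayed compaction (discussed below) makes reads slower.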

A cascade would follow: a hot partition often degraded latency across the entire database cluster. When a specific channel and bucket received heavy traffic, the latency of the node serving them rose as it struggled to keep up. Since that node could not keep pace with demand, every other query sent to it slowed down as well. Thus, all queries to nodes serving the hot partition experienced increased latency, amplifying the impact on end users.

Cluster maintenance also posed challenges for administrators: compactions (when Cassandra merges SSTables on disk to improve read performance) fell behind, making reads slower and triggering further cascading issues whenever a node attempted to compact.

The problem multiplied as the message cluster wasn't the only Cassandra database within Discord; several other clusters faced similar issues.

A New Architecture

Administrators considered ScyllaDB, a Cassandra-compatible database written in C++, as a viable alternative. They had already explored it during the MongoDB-to-Cassandra migration, but were now far more willing to adopt it, drawn by its potential for better performance, faster repairs, stronger workload isolation, and other advantages.

A notable advantage of ScyllaDB was the absence of a garbage collector, as it was written in C++ instead of Java. The Discord team had faced various issues with Cassandra's garbage collector, from latency-impacting pauses to prolonged consecutive pauses. In some cases, an operator had to manually restart a specific node and monitor it until recovery.

After experimenting with ScyllaDB and observing clear improvements, the decision was made to migrate all Discord databases to it. By 2020, every database except one had been migrated. The holdout was cassandra-messages, an immense cluster holding trillions of messages across nearly 200 nodes.

Before tackling the challenge of migrating this database, the admin team decided to familiarize themselves with ScyllaDB in a production environment by working with other databases to understand their limitations and benefits. Additionally, they focused on enhancing ScyllaDB performance for other Discord use cases.

During testing, it was found that the performance of reverse queries did not meet the platform's needs. The ScyllaDB team prioritized improvements and implemented efficient reverse queries, eliminating the last obstacle to the database migration.

Rust Enters the Scene

One issue in the Cassandra implementation was hot partitions, where high traffic to a single partition produced unbounded concurrency and a progressive increase in query latency. This made it necessary to develop intermediate data services that sit between the API monolith and the database clusters. The Rust programming language came into play for writing these services: developers wanted its speed (comparable to C/C++) combined with its focus on memory safety.

Another key advantage of Rust is its native support for concurrency, making it easier to write safe concurrent code. The Rust ecosystem, particularly the Tokio library, serves as a solid foundation for developing asynchronous I/O systems. Additionally, Rust has driver support for both Cassandra and ScyllaDB databases.
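As a minimal illustration of that concurrency safety (using only the standard library rather than Tokio, and with a made-up workload, not Discord's actual code), worker threads can communicate over a channel; the compiler forces shared data to be moved, cloned, or synchronized, so data races are rejected at compile time:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Spawn four workers; each sends its partial result back over
    // the channel. Ownership rules guarantee no worker can mutate
    // another's data without explicit synchronization.
    let mut handles = Vec::new();
    for worker_id in 0..4u64 {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Hypothetical workload: sum a slice of message IDs.
            let partial: u64 = (worker_id * 100..worker_id * 100 + 100).sum();
            tx.send(partial).expect("receiver alive");
        }));
    }
    drop(tx); // close the channel once only the workers hold senders

    // The receiver iterates until every sender has been dropped.
    let total: u64 = rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    println!("total = {total}"); // total = 79800 (sum of 0..400)
}
```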

Data Flow


To further optimize performance, the Discord team built consistent-hashing-based routing into the data services. Each request was associated with a routing key, such as the channel ID for messages. This way, all requests for the same channel were directed to the same instance of the data service, reducing the load on the database.
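A minimal sketch of that routing idea (all instance names are hypothetical; the real data services are far more elaborate) hashes each routing key onto a ring of virtual nodes and walks clockwise to the owning instance, so the same channel ID deterministically lands on the same data-service instance:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

// Consistent-hash ring: each data-service instance owns several
// virtual points on the ring to spread load more evenly.
struct Ring {
    points: BTreeMap<u64, String>,
}

impl Ring {
    fn new(instances: &[&str], vnodes: u32) -> Self {
        let mut points = BTreeMap::new();
        for inst in instances {
            for v in 0..vnodes {
                points.insert(hash_of(&format!("{inst}#{v}")), inst.to_string());
            }
        }
        Ring { points }
    }

    // Route a request by its routing key (e.g. a channel ID):
    // take the first ring point at or after the key's hash,
    // wrapping around to the start of the ring if needed.
    fn route(&self, channel_id: u64) -> &str {
        let h = hash_of(&channel_id);
        self.points
            .range(h..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, inst)| inst.as_str())
            .unwrap()
    }
}

fn main() {
    let ring = Ring::new(&["data-svc-0", "data-svc-1", "data-svc-2"], 64);
    // Every request for a given channel hits the same instance,
    // letting that instance absorb and coalesce hot-channel load.
    let a = ring.route(1234567890);
    let b = ring.route(1234567890);
    assert_eq!(a, b);
    println!("channel 1234567890 -> {a}");
}
```

A side benefit of a ring (versus plain `hash % n` routing) is that adding or removing an instance only remaps the keys adjacent to its ring points instead of reshuffling every channel.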


While these improvements contributed to reducing hot partition issues and latency in the Cassandra cluster, the problems were not eradicated; they just became less frequent. The ultimate goal was to move to the ideal ScyllaDB cluster and complete the data migration.

A Significant Challenge

Migration requirements were complex: trillions of messages needed to be migrated seamlessly and swiftly. The entire process had to be carried out in highly coordinated stages to avoid disruptions.

The first stage involved provisioning a new ScyllaDB cluster using a super-disk storage topology. This configuration leveraged the speed of local SSDs and the durability of persistent disk by using RAID to mirror the data. With the new cluster operational, data migration could commence.

The initial migration plan was designed to bring benefits quickly. The idea was to start using the new ScyllaDB cluster for the most recent data, using a cutoff point, and then migrate historical data later. While this introduced more complexity, it was believed to be worthwhile.

New data began to be written to both Cassandra and ScyllaDB simultaneously while the ScyllaDB Spark migrator was provisioned. This stage required numerous adjustments, and once the migrator was configured, its estimated completion time was three months.

However, the team sought ways to expedite the process. They remembered developing a fast and high-performance database library, which could be extended. Thus, they chose to rewrite the data migrator itself in Rust.

In a single afternoon, the data services library was extended to support large-scale data migrations. The extension read token ranges from the source database, checkpointed them locally using SQLite, and then sent the data to ScyllaDB. With the enhanced migrator running, a new estimate came in: just nine days! Since the data could be migrated that quickly, the complicated batch-based approach was abandoned and the full migration was performed in one go.
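The checkpoint-and-resume shape of that migrator can be sketched as follows (a toy model: the SQLite store is replaced by an in-memory set, the Cassandra read and ScyllaDB write are replaced by a counter, and all type names are invented for illustration):

```rust
use std::collections::HashSet;

// A token range to migrate, e.g. a slice of the source token ring.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TokenRange {
    start: i64,
    end: i64,
}

// Stand-in for the local SQLite checkpoint store: remembers which
// ranges have already been migrated so a restart can resume.
struct Checkpoint {
    done: HashSet<TokenRange>,
}

impl Checkpoint {
    fn new() -> Self { Self { done: HashSet::new() } }
    fn is_done(&self, r: &TokenRange) -> bool { self.done.contains(r) }
    fn mark_done(&mut self, r: TokenRange) { self.done.insert(r); }
}

// One migration pass: skip checkpointed ranges, "send" the rest to
// the destination (here just counted), and checkpoint each range.
fn migrate(ranges: &[TokenRange], ckpt: &mut Checkpoint) -> usize {
    let mut sent = 0;
    for &r in ranges {
        if ckpt.is_done(&r) {
            continue; // already migrated before a stop or crash
        }
        // Real code would read this range from the source database
        // and write the rows to the destination here.
        sent += 1;
        ckpt.mark_done(r);
    }
    sent
}

fn main() {
    let ranges: Vec<TokenRange> = (0..8)
        .map(|i| TokenRange { start: i * 100, end: (i + 1) * 100 })
        .collect();

    let mut ckpt = Checkpoint::new();
    // Simulate a run that checkpoints three ranges, then stops.
    let first = migrate(&ranges[..3], &mut ckpt);
    // On resume, only the remaining ranges are processed.
    let resumed = migrate(&ranges, &mut ckpt);
    println!("first={first} resumed={resumed}"); // first=3 resumed=5
}
```

Persisting the checkpoint locally is what makes the migration restartable: a crash partway through costs only the range in flight, not the whole run.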

To validate the data, an automated test was conducted by sending a small percentage of reads to both databases and comparing the results. Everything seemed to be working correctly. While the cluster handled production traffic well, Cassandra continued to suffer from frequent latency issues.

One year later…

What previously required 177 nodes on Cassandra can now run on 72 nodes on ScyllaDB. This is possible because each ScyllaDB node has 9 TB of disk space, surpassing Cassandra's average of 4 TB per node.

Latencies have also improved significantly. For example, fetching historical messages had a p99 of 40-125 ms on Cassandra, while ScyllaDB delivers a steady 15 ms p99. Message-insertion performance, which ranged from 5-70 ms p99 on Cassandra, is now a consistent 5 ms p99 on ScyllaDB. Thanks to these performance improvements, the team has discovered new use cases for its products now that it can trust its messaging database.

The acid test for the new architecture came at the end of 2022, when users around the world tuned in to watch the World Cup. The event drove spikes in message traffic with every goal scored, and the definitive proof of ScyllaDB's resilience and speed on Discord came during the historic final. While fans worldwide stressed over a match full of extraordinary moments, Discord and its messaging database stayed calm, handling message volumes never before seen on the platform without incident.

The migration was an unequivocal success and survived the ordeal: with its data services now built on Rust and ScyllaDB, Discord was able to handle this traffic and provide a reliable platform for its users to communicate.

eduaardofranco commented 8 months ago

that is a lot of data