cmu-db / noisepage

Self-Driving Database Management System from Carnegie Mellon University
https://noise.page
MIT License

Replication Message Serialization Speedup #1570

Open jkosh44 opened 3 years ago

jkosh44 commented 3 years ago

Summary

Currently, we serialize all messages related to replication using JSON. The implementation can be found here: https://github.com/cmu-db/noisepage/blob/97eb7ecc83785ed57cc02a14e3d63b553b252e2e/src/include/replication/replication_messages.h#L36-L108 https://github.com/cmu-db/noisepage/blob/97eb7ecc83785ed57cc02a14e3d63b553b252e2e/src/replication/replication_messages.cpp#L19-L56

Turning replication on significantly slows down the database, and one of the primary causes is the JSON serialization of messages. Below are performance results from running TPC-C on dev10 with 8 threads under various database configurations:

| Replication | Durability | Modifications | Request Throughput (requests per second) |
| --- | --- | --- | --- |
| DISABLED | Sync | None | 208.891796443148 |
| Async | Sync | None | 96.2971597537919 |
| Async | Sync | Remove JSON serialization of messages (all messages replaced with empty string) | 181.1826548771372 |
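
To put the request-throughput numbers above in relative terms, here is a small C++ sketch that expresses each replicated configuration as a fraction of the replication-disabled baseline. The constant and function names are made up for illustration; only the three measurements come from the table.

```cpp
#include <cstdio>

// Request throughput (requests per second), copied from the table above.
constexpr double kReplicationDisabled = 208.891796443148;
constexpr double kAsyncWithJson = 96.2971597537919;
constexpr double kAsyncWithoutJson = 181.1826548771372;

// Fraction of the baseline throughput that a configuration retains.
constexpr double FractionOfBaseline(double measured, double baseline) {
  return measured / baseline;
}

// Prints each replicated configuration relative to the baseline.
void PrintRequestThroughputSummary() {
  std::printf("Async replication, JSON messages:  %.1f%% of baseline\n",
              100.0 * FractionOfBaseline(kAsyncWithJson, kReplicationDisabled));
  std::printf("Async replication, empty messages: %.1f%% of baseline\n",
              100.0 * FractionOfBaseline(kAsyncWithoutJson, kReplicationDisabled));
}
```

Calling `PrintRequestThroughputSummary()` from `main` shows that replication with JSON messages retains a bit under half of the baseline request throughput, while stubbing out serialization recovers most of the gap.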

Below are some metrics on log throughput for the primary node with various database configurations:

| Replication | Durability | Modifications | Log Throughput (records per millisecond) |
| --- | --- | --- | --- |
| DISABLED | Sync | None | 98.80787541 |
| DISABLED | Async | None | 102.9403467 |
| Sync | Sync | None | 1.130251208 |
| Async | Sync | None | 1.273598347 |
| Async | Async | None | 1.235403819 |
| Async | Async | Remove JSON serialization of messages (all messages replaced with empty string) | 88.39578983 |

For reference, below are some metrics on log throughput for the replica node:

| Durability | Log Throughput (records per millisecond) |
| --- | --- |
| Sync | 1.009166671 |
| Async | 1.007238045 |

Solution

One solution is to switch from JSON to a different message format. I plan on investigating a handful of alternatives and their impact on log throughput and request throughput.

Nlohmann

We use the Nlohmann JSON package to implement JSON in NoisePage. The package also has several binary formats built in. It's probably worth trying all of them since each can be implemented by changing a couple of lines. Some require you to first convert your data to JSON before converting to the binary format, and it's unclear to me whether this carries a significant performance penalty compared to converting directly to the message format.

Alternatives

Below are a handful of message formats I have found from some brief research. I plan on narrowing this down to roughly 4 after some more research.

Dependency Bloat

One of the considerations when implementing a new message format will be dependency bloat. I don't plan on coming up with my own implementation for any of these formats so we'll have to bring in third-party libraries. It will be important to make sure we don't bring in more than necessary to avoid dependency bloat.

jkosh44 commented 3 years ago

If anyone has a particular format that I left out that they think would be good, please let me know.

jkosh44 commented 3 years ago

BSON

branch: https://github.com/jkosh44/noisepage/tree/bson

UPDATE: The original JSON implementation was converting the record contents itself to and from CBOR. The original numbers used for BSON kept that conversion in. The updated numbers remove that conversion.

Log Throughput Primary (original numbers, with the CBOR conversion of record contents kept in)

| Replication | Durability | Modifications | Log Throughput (records per millisecond) |
| --- | --- | --- | --- |
| Sync | Sync | None | 0.839491378 |
| Async | Sync | None | 0.9220486988 |
| Async | Async | None | 0.9210122535 |

Log Throughput Replica

| Durability | Log Throughput (records per millisecond) |
| --- | --- |
| Sync | 0.7778571493 |
| Async | 0.7783025631 |

Log Throughput Primary (updated numbers, with the CBOR conversion of record contents removed)

| Replication | Durability | Modifications | Log Throughput (records per millisecond) |
| --- | --- | --- | --- |
| Sync | Sync | None | 3.8700566481216665 |
| Async | Sync | None | 30.736948475102366 |
| Async | Async | None | 29.028352465595663 |

Log Throughput Replica

| Durability | Log Throughput (records per millisecond) |
| --- | --- |
| Sync | 3.8883037697513125 |
| Async | 3.874525911 |

jkosh44 commented 3 years ago

MessagePack

branch: https://github.com/jkosh44/noisepage/tree/messagepack

Log Throughput Primary

| Replication | Durability | Modifications | Log Throughput (records per millisecond) |
| --- | --- | --- | --- |
| Sync | Sync | None | 3.8606188627282036 |
| Async | Sync | None | 29.38279984543937 |
| Async | Async | None | 28.471362156671457 |

Log Throughput Replica

| Durability | Log Throughput (records per millisecond) |
| --- | --- |
| Sync | 3.887471136 |
| Async | 3.91150189 |

jkosh44 commented 3 years ago

UBJSON

branch: https://github.com/jkosh44/noisepage/tree/ubjson

Log Throughput Primary

| Replication | Durability | Modifications | Log Throughput (records per millisecond) |
| --- | --- | --- | --- |
| Sync | Sync | None | 3.864162520678762 |
| Async | Sync | None | 30.148788480999627 |
| Async | Async | None | 28.387447086724443 |

Log Throughput Replica

| Durability | Log Throughput (records per millisecond) |
| --- | --- |
| Sync | 3.887471136 |
| Async | 3.91150189 |

jkosh44 commented 3 years ago

CBOR

branch: https://github.com/jkosh44/noisepage/tree/cbor

Log Throughput Primary

| Replication | Durability | Modifications | Log Throughput (records per millisecond) |
| --- | --- | --- | --- |
| Sync | Sync | None | 3.861907674897514 |
| Async | Sync | None | 30.714806087560053 |
| Async | Async | None | 28.913739773546347 |

Log Throughput Replica

| Durability | Log Throughput (records per millisecond) |
| --- | --- |
| Sync | 3.918585055891593 |
| Async | 3.9006568057238713 |
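
For a quick side-by-side view, the C++ sketch below computes the primary-node log throughput of each binary format relative to the JSON baseline for the Async replication, Sync durability configuration. All numbers are copied from the tables in this thread; the type and function names are illustrative. One caveat, based on the BSON update above: the JSON baseline also converted record contents to and from CBOR, so these ratios probably reflect removing that conversion as well as the change of format. The BSON entry uses the higher of the two BSON result sets, on the assumption that those are the updated numbers.

```cpp
#include <cstdio>
#include <vector>

// JSON baseline: primary log throughput (records per millisecond) for
// Replication=Async, Durability=Sync, from the summary comment.
constexpr double kJsonAsyncSync = 1.273598347;

struct FormatResult {
  const char *name;
  double throughput;  // records per millisecond, same configuration
};

// Async/Sync primary results reported for each format in this thread.
std::vector<FormatResult> AsyncSyncPrimaryResults() {
  return {{"BSON", 30.736948475102366},
          {"MessagePack", 29.38279984543937},
          {"UBJSON", 30.148788480999627},
          {"CBOR", 30.714806087560053}};
}

// Ratio of a format's throughput to the JSON baseline.
double SpeedupOverJson(double throughput) { return throughput / kJsonAsyncSync; }

// Prints one line per format with its ratio to the JSON baseline.
void PrintSpeedups() {
  for (const FormatResult &r : AsyncSyncPrimaryResults()) {
    std::printf("%-12s %.1fx JSON\n", r.name, SpeedupOverJson(r.throughput));
  }
}
```

By this measure the four formats land within a few percent of each other, so the choice between them may come down to factors other than throughput, such as the dependency concerns raised in the summary.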