cmu-db / noisepage

Self-Driving Database Management System from Carnegie Mellon University
https://noise.page
MIT License

Investigate moving replication serialization to Messenger thread #1582

Open jkosh44 opened 3 years ago

jkosh44 commented 3 years ago

Feature Request

Summary

Currently, all replication messages are serialized in the LogSerializerTask thread. This can potentially slow down the LogSerializerTask. Additionally, under the sync durability and async replication configuration, transactions have to wait unnecessarily for replication messages to be serialized before committing.

The replication messages are serialized in the following places in the primary node:

Log Record Batches: https://github.com/cmu-db/noisepage/blob/30bd6355d69868b019db6a5307e2e8de3704d506/src/replication/primary_replication_manager.cpp#L80-L86

Notify OAT: https://github.com/cmu-db/noisepage/blob/30bd6355d69868b019db6a5307e2e8de3704d506/src/replication/primary_replication_manager.cpp#L107-L111

Solution

It might be beneficial to move the message serialization to the Messenger thread itself. That way the LogSerializerTask won't be slowed down by message serialization.
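A minimal sketch of the hand-off this would need, assuming the producer thread only enqueues a closure and the consuming thread runs it later. `DeferredSendQueue`, `Enqueue`, and `DrainAndSerialize` are hypothetical names for illustration, not part of the existing Messenger API:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hand-off queue; not NoisePage code.
class DeferredSendQueue {
 public:
  using SerializeFn = std::function<std::string()>;

  // Producer side (e.g. the log serializer thread): only stores the closure,
  // so no serialization cost is paid on this thread.
  void Enqueue(SerializeFn fn) {
    assert(fn && "serializer must be callable");
    std::lock_guard<std::mutex> guard(mutex_);
    pending_.push(std::move(fn));
  }

  // Consumer side (e.g. the messenger thread): takes everything queued so far
  // and runs the deferred serialization outside the lock.
  std::vector<std::string> DrainAndSerialize() {
    std::queue<SerializeFn> local;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      std::swap(local, pending_);
    }
    std::vector<std::string> payloads;
    while (!local.empty()) {
      payloads.push_back(local.front()());
      local.pop();
    }
    return payloads;
  }

 private:
  std::mutex mutex_;
  std::queue<SerializeFn> pending_;
};
```

One caveat with this shape: the closure has to own (or safely outlive) whatever data it serializes, since it runs after the producer has moved on.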

Currently the Messenger thread maintains a map of pending messages: https://github.com/cmu-db/noisepage/blob/b95bf1ef9b611c60ec0cc7bc21cde6c021bda864/src/include/messenger/messenger.h#L356 https://github.com/cmu-db/noisepage/blob/b95bf1ef9b611c60ec0cc7bc21cde6c021bda864/src/include/messenger/messenger.h#L305-L311

The pending message list contains many ZmqMessages, which hold the messages as strings. https://github.com/cmu-db/noisepage/blob/b95bf1ef9b611c60ec0cc7bc21cde6c021bda864/src/include/messenger/messenger.h#L37-L98

Since we have multiple different types of messages, I see two possible solutions:

  1. Create a pending message list for each message type.
  2. Use some form of inheritance or templating so that the pending message list stores pointers to serializable objects.

Personally I think 2 might be cleaner.
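Roughly what option 2 could look like with inheritance. This is only a sketch: `SerializableMessage`, `RecordsBatchMessage`, `NotifyOATMessage`, and `PendingMessages` are made-up names, the string formats are placeholders, and reading OAT as "oldest active transaction" is my assumption. The real pending-message bookkeeping in messenger.h is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: anything the messenger can turn into bytes later.
class SerializableMessage {
 public:
  virtual ~SerializableMessage() = default;
  virtual std::string Serialize() const = 0;
};

// Hypothetical stand-in for a log record batch message.
class RecordsBatchMessage : public SerializableMessage {
 public:
  RecordsBatchMessage(uint64_t batch_id, std::string records)
      : batch_id_(batch_id), records_(std::move(records)) {}
  std::string Serialize() const override {
    return "batch:" + std::to_string(batch_id_) + ":" + records_;
  }

 private:
  uint64_t batch_id_;
  std::string records_;
};

// Hypothetical stand-in for a Notify OAT message.
class NotifyOATMessage : public SerializableMessage {
 public:
  explicit NotifyOATMessage(uint64_t oat) : oat_(oat) {}
  std::string Serialize() const override { return "oat:" + std::to_string(oat_); }

 private:
  uint64_t oat_;
};

// Pending messages keyed by message id, stored unserialized.
// Not thread-safe on its own; assumes only the messenger thread touches it.
class PendingMessages {
 public:
  void Add(uint64_t message_id, std::unique_ptr<SerializableMessage> msg) {
    assert(msg != nullptr);
    pending_[message_id] = std::move(msg);
  }

  // Serializes every pending message in ascending message-id order.
  std::vector<std::string> SerializeAll() const {
    std::vector<std::string> out;
    for (const auto &entry : pending_) out.push_back(entry.second->Serialize());
    return out;
  }

 private:
  std::map<uint64_t, std::unique_ptr<SerializableMessage>> pending_;
};
```

Compared to option 1, adding a new message type here only means adding a new subclass; the pending container does not change. The cost is a virtual call and a heap allocation per message.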

lmwnshn commented 3 years ago

I agree that 2 might be cleaner.

lmwnshn commented 3 years ago

ah, this finally reminded me about our "async durability callbacks get swapped out".