EccentricLoggers / peloton

Street Strength Database Management System for Real-Time Analytics
http://pelotondb.org
Apache License 2.0

Issue to discuss further: group commit #12

Closed: eric-haibin-lin closed this issue 7 years ago

eric-haibin-lin commented 8 years ago

With Joy:

With Andy:

eric-haibin-lin commented 8 years ago
  1. Do we need an epoch manager in Peloton? In Silo, transactions derive their txn IDs from the current epoch and are group-committed per epoch. A transaction counts as committed only once both the txn logs and the pepoch (persistent epoch) file have been fsync()ed, which costs an additional fsync() call to persist pepoch. In Peloton, commit IDs are generated by the single txn manager (a central point of contention in the system). During logging, there is no need to generate a "pepoch" file. A group of txns is reported as committed to the clients as soon as their logs are fsync()ed successfully, and fsync() is invoked periodically (e.g., every 40 ms).
  2. The benefit of replaying logs in reverse order: In SiloR, logs are replayed in reverse order to avoid unnecessary updates to the same tuple. Since SiloR uses OCC, as long as the most recent version of a tuple is recovered, SiloR can safely ignore all older log records for that tuple. Peloton, however, is based on MVCC, and its logs contain all physical versions of the same logical tuple. If Peloton replayed the log in reverse order, the system would still have to track the header information (version) of every tuple to infer whether multiple log records refer to the same logical tuple. We could avoid writing tuple values for past versions, provided the tile group header is correctly updated. Replaying logs in reverse order therefore doesn't seem very helpful. Maybe we can implement it in the future and measure how much performance improvement we actually get.
  3. Multiple txns handled by a single worker thread: Is it common for a DBMS to let a single worker thread handle more than one transaction while it waits for logs to be flushed? Suppose group commit is used, and a worker thread commits txn 0 whose logs have not been fsync()ed yet. Could the worker thread remember its state for txn 0, pick up txn 1, and work on it for a while? When the fsync() finishes, the worker thread is notified, returns the commit message to client 0, and continues working on txn 1. Should the logging mechanism in Peloton be prepared for this situation?
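For item 1, here is a minimal sketch of periodic group commit without a separate pepoch file. It assumes a hypothetical `GroupCommitLogger` class (not Peloton's actual API) that hands out log sequence numbers (LSNs) under a lock. Worker threads append a commit record and block. A background flusher writes whatever has accumulated, calls `fsync()` once per interval (40 ms by default, matching the example above), and then releases every transaction covered by that flush. Durability is tracked only by an in-memory "durable LSN", so no extra file has to be persisted.

```python
import os
import tempfile
import threading


class GroupCommitLogger:
    """Hypothetical group-commit log manager (illustration only).

    Workers append commit records and block; a flusher thread writes the
    buffered batch, issues one fsync() per interval, and then wakes every
    transaction whose record was part of that batch.
    """

    def __init__(self, path, interval_s=0.040):
        self._file = open(path, "ab")
        self._interval_s = interval_s
        self._cond = threading.Condition()
        self._buffer = []        # records appended but not yet written
        self._next_lsn = 0       # LSN for the next appended record
        self._durable_lsn = -1   # highest LSN known to be on disk
        self._stopped = False
        self.fsync_calls = 0
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def commit(self, record: bytes) -> int:
        """Append a commit record and block until it has been fsync()ed."""
        with self._cond:
            lsn = self._next_lsn
            self._next_lsn += 1
            self._buffer.append(record)
            while self._durable_lsn < lsn:
                self._cond.wait()
            return lsn

    def _flush_loop(self):
        while True:
            with self._cond:
                # Sleep for one group-commit interval (or until stop()).
                self._cond.wait(timeout=self._interval_s)
                batch, self._buffer = self._buffer, []
                last_lsn = self._next_lsn - 1
                stopped = self._stopped
            if batch:
                # One write and one fsync() make the whole group durable.
                self._file.write(b"".join(r + b"\n" for r in batch))
                self._file.flush()
                os.fsync(self._file.fileno())
                with self._cond:
                    self.fsync_calls += 1
                    self._durable_lsn = last_lsn
                    self._cond.notify_all()  # release the waiting committers
            if stopped:
                return

    def stop(self):
        """Flush what is left and shut down; call after all commits return."""
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self._flusher.join()
        self._file.close()
```

With, say, 20 worker threads committing at once, each `commit()` returns only after its record is on disk, but the records share a small number of `fsync()` calls instead of one each. The interval trades commit latency for fewer syncs.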
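To make item 2 concrete, the sketch below compares forward replay, SiloR-style reverse replay, and an MVCC-style replay over the same log. The record format `(commit_id, key, value)` is a hypothetical simplification, not Peloton's or SiloR's actual log format. Reverse replay skips older records for a key it has already restored, which is only safe when the newest value is all that must survive. The MVCC variant has to keep every version in the tuple's version chain, so it examines every record whatever the replay order.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (commit_id, key, value).
LogRecord = Tuple[int, str, str]


def replay_forward(log: List[LogRecord]) -> Tuple[Dict[str, str], int]:
    """Apply every record oldest-first; later writes overwrite earlier ones."""
    table: Dict[str, str] = {}
    writes = 0
    for _, key, value in sorted(log):
        table[key] = value
        writes += 1
    return table, writes


def replay_reverse(log: List[LogRecord]) -> Tuple[Dict[str, str], int]:
    """SiloR-style: newest-first, skipping keys that are already recovered.

    Correct only for a single-version store, where the newest value of each
    tuple is the only state that must be rebuilt.
    """
    table: Dict[str, str] = {}
    writes = 0
    for _, key, value in sorted(log, reverse=True):
        if key in table:
            continue  # older version of a tuple we already restored
        table[key] = value
        writes += 1
    return table, writes


def replay_mvcc(log: List[LogRecord]) -> Dict[str, List[Tuple[int, str]]]:
    """MVCC view: every physical version goes into the tuple's version chain,
    so no record can be skipped regardless of replay order."""
    chains: Dict[str, List[Tuple[int, str]]] = {}
    for commit_id, key, value in sorted(log):
        chains.setdefault(key, []).append((commit_id, value))
    return chains
```

For the log `[(1, "a", "a1"), (2, "b", "b1"), (3, "a", "a2"), (4, "a", "a3")]`, forward replay performs 4 writes and reverse replay only 2, and both end with `{"a": "a3", "b": "b1"}`. The MVCC chain for `"a"` still needs all three versions, which is why the savings shrink for Peloton.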
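The scenario in item 3 can be sketched as a worker that never blocks on the flush. This is a minimal illustration with a hypothetical `Worker` class. LSNs are assigned per worker purely for simplicity; a real system would take them from the shared log. On commit, the worker parks `(lsn, client_id)` in a pending queue and is free to continue with the next transaction. When the flusher reports a durable LSN, `on_flush()` acknowledges every parked transaction up to that point, in LSN order.

```python
from collections import deque
from typing import Deque, List, Tuple


class Worker:
    """Hypothetical worker that does not wait for fsync() after a commit.

    Committed-but-not-yet-durable transactions are parked in `pending`;
    clients are acknowledged only once their LSN is known to be durable.
    """

    def __init__(self) -> None:
        self.next_lsn = 0
        # (lsn, client_id) pairs, appended in increasing LSN order.
        self.pending: Deque[Tuple[int, int]] = deque()
        self.acked: List[int] = []  # clients already told "committed"

    def commit(self, client_id: int) -> int:
        """Record the commit and return immediately without replying."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.pending.append((lsn, client_id))
        return lsn

    def on_flush(self, durable_lsn: int) -> None:
        """Called when fsync() has made every record up to durable_lsn durable."""
        while self.pending and self.pending[0][0] <= durable_lsn:
            _, client_id = self.pending.popleft()
            self.acked.append(client_id)  # send the commit reply here
```

In a real engine, `on_flush()` would probably arrive as an event the worker checks between operations (for example, through a queue drained in its main loop), so that txn 1 keeps making progress while txn 0 waits for durability. The design relies on the logging layer exposing a durable-LSN notification instead of a blocking "commit and wait" call.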

Joy is out of town. @MattPerron @abhishekjoshi2, please review, and I can email Joy to ask for his opinion. Thanks!

eric-haibin-lin commented 8 years ago

TODO: see how PostgreSQL implements group commit.