cmu-db / bustub

The BusTub Relational Database Management System (Educational)
https://15445.courses.cs.cmu.edu
MIT License
4.12k stars 1.8k forks source link

feat: Add support for asynchronous flushing in DiskManager::WriteLog() && improve overall structure of DiskManager #580

Open xzhseh opened 1 year ago

xzhseh commented 1 year ago

This PR does four things in general:

  1. Fix the issue #449 by adding a log_io_latch_ to ensure logging operations' security under concurrency access.(Though the log-related functions are not used except in test at present🤪)
  2. Move the definitions of several functions from disk_manager.cpp to disk_manager.h to improve readability.
  3. Add has_shut_down_ flag and change the destructor implementation (and this also includes support for asynchronous flushing), to ensure everything goes well if the ShutDown() is not manually called by the programmer :)
  4. Add support for asynchronous flushing in DiskManager::WriteLog(), also make the semantic of flush_log_ be consistent across all stage.

Current problem (To be reviewed): Just as I mentioned in the TODO section, the program will crash if the log flushing thread is not done after 10s, which can be a rare case, but this can also be handle more softly, like log a warning...etc

xzhseh commented 1 year ago

Also, I think the asynchronous version of the four functions WritePage, ReadPage, WriteLog, ReadLog can be of help if bustub future will consider in providing WAL related functionalities..🤤 (Actually I found WAL supports in the early versions of 15-445, like in 19 or 18 version, there actually was a project for students to complete, but now this part seems broken with new features coming in..? I think I can be of help to refactor the code related to recovery, and this also will not be conflicted with the possible upcoming MVCC version.) If it's okay, I can write the asynchronous version of these functions🥰

infdahai commented 1 year ago

I think it's good to throw a warning before an exit.

xzhseh commented 1 year ago

I revisited the original code design and found it confusing for logging operations, so I redesigned some of the functions & variables related to Log.

  1. In the original design, it looks like the designer wants to separate the write and flush operation when dealing with fstream in logging, and that's where the first few commits of this PR comes from, to support asynchronous flushing. That's to say, performs sequential write and maintains multiple asynchronous threads flushing in the background at the same time. This sounds good and reasonable before considering the actual use of fstream.
  2. fstream is not thread-safe at all, which means not only multiple accesses to fstream::write at the same time will lead to race condition, but accesses to fstream::write and fstream::flush at the same time by different threads will lead to race condition(or other synchronization problems) as well.
  3. Thus, if the only fstream object log_io_ is accessed both in WriteLog() (by the main thread) and flushing asynchronously (by another background thread), the result is undefined, and this will inevitably happen if sticks to original design.
  4. To solve the above problems, I separate the WriteLog() to two version —— The first one will block until all the asynchronous operations are done, and then start to perform sequential write. The second one is the asynchronous version of WriteLog, also with a future deque to ensure the completion of each background task. The ReadLog() is not changed, since the log contents to be read should always be the one after all flushing.

If the above changes look good, I'll write test cases in disk_manager_test.cpp to ensure the correctness. (Also add comments to the newly added functions)

xzhseh commented 1 year ago

This part may also be associated with LogManager. If anything related to LogManager or the currently disabled recovery test should also be changed, I'm glad to help with the refactoring :)

xzhseh commented 1 year ago

And according to Discussion #516, I agree with the use of fsync() to achieve full recovery functionality in the future, but for now, since the std::fstream::flush will synchronize the stream buffer with the underlying OS buffer, the subsequent ReadLog() or the recovery should be good (If not under extreme situation).

skyzh commented 1 year ago

Thanks for the PR but we are not writing any WALs currently in the system. The overall design looks good, but I'm actually in favor of dropping WALs for now until we have a concrete plan for the recovery project 🤣 We can keep this PR and see how it goes in the future.

xzhseh commented 1 year ago

Sure, let's see if the recovery project will be brought back in the future🤣 I'll focus on assisting & participating through the implementations (or refactoring) of other parts in bustub🥰