eBay / HomeStore

Storage Engine for block and key/value stores.
Apache License 2.0
23 stars 21 forks source link

commit blk during log replay in rep dev #524

Closed raakella1 closed 3 months ago

codecov-commenter commented 3 months ago

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 67.45%. Comparing base (1a0cef8) to head (7652028). Report is 51 commits behind head on master.

Files with missing lines Patch % Lines
src/lib/replication/repl_dev/raft_repl_dev.cpp 87.50% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #524 +/- ## =========================================== + Coverage 56.51% 67.45% +10.94% =========================================== Files 108 109 +1 Lines 10300 10427 +127 Branches 1402 1399 -3 =========================================== + Hits 5821 7034 +1213 + Misses 3894 2717 -1177 - Partials 585 676 +91 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

xiaoxichen commented 3 months ago

As discussed in DM, this PR is not right as there is no guarantee the data has been written to the allocated blks before the restart.

JacksonYao287 commented 3 months ago
        HomeRaftLogStore::end_of_append_batch(start_lsn, count);
        HISTOGRAM_OBSERVE(m_rd.metrics(), raft_end_of_append_batch_latency_us, get_elapsed_time_us(cur_time));

        cur_time = std::chrono::steady_clock::now();
        // Wait for the fetch and write to be completed successfully.
        std::move(fut).wait();

we can put std::move(fut).wait() before end_of_append_batch.

if data is written but log is not flushed, that`s fine, since the block is not committed

xiaoxichen commented 3 months ago

yes we can, it is trading regular IO latency vs space leak during recovery, which I am not sure it worth to .

Prior to that, we have to solve the rollback stuff.

JacksonYao287 commented 3 months ago

Prior to that, we have to solve the rollback stuff.

rollback happens when a request is pre_committed but not committed. 1 in normal case, we can free the blk in on_rollback. 2 in recovery case , since the blk is not commited (commit_blk will be called in on_commit), so blk leak will not happen. in on_log_found, if the lsn of a log entry is bigger than dc_sln, we can just discard it.

any input?

xiaoxichen commented 3 months ago

discussion: https://ebay-eng.slack.com/archives/CAVQXEAKS/p1724866496708579