For the following options:
sm_shutdown_clean = true; sm_truncate_log = true;
When the SM is shutting down:
bool truncate_archive = _options.get_bool_option("sm_truncate_archive", false);
if (shutdown_clean || truncate) {
    ERROUT(<< "SM performing clean shutdown");
    W_COERCE(log->flush_all());
    bf->get_cleaner()->wakeup(true);
    // CS TODO: two wakeups are necessary when using the async collector
    bf->get_cleaner()->wakeup(true);
    me()->check_actual_pin_count(0);
    // Force alloc and stnode pages
    lsn_t dur_lsn = smlevel_0::log->durable_lsn();
    W_COERCE(vol->get_alloc_cache()->write_dirty_pages(dur_lsn));
    W_COERCE(vol->get_stnode_cache()->write_page(dur_lsn));
    if (truncate) { W_COERCE(_truncate_log(truncate_archive)); }
    else { chkpt->take(); }
    ERROUT(<< "All pages cleaned successfully");
}
else {
    ERROUT(<< "SM performing dirty shutdown");
}
The method ss_m::_truncate_log(...) is called:
rc_t ss_m::_truncate_log(bool truncate_archive)
{
    DBGTHRD(<< "Truncating log on LSN " << log->durable_lsn());
    // Wait for cleaner to finish its current round
    bf->shutdown();
    W_DO(log->flush_all());
    if (truncate_archive && logArchiver) {
        logArchiver->archiveUntilLSN(log->durable_lsn());
        logArchiver->getDirectory()->deleteAllRuns();
    }
    W_DO(log->truncate());
    W_DO(log->flush_all());
    // this should be an "empty" checkpoint
    chkpt->take();
    // generate an "empty" log archive run
    if(logArchiver) {
        logArchiver->archiveUntilLSN(log->durable_lsn());
    }
    log->get_storage()->delete_old_partitions();
    return RCOK;
}
The log flush daemon is activated by the log->flush_all() call, after log->truncate().
The log flush daemon loops forever here:
while(1) {
    // wait for a kick. Kicks come at regular intervals from
    // inserts, but also at arbitrary times when threads request a
    // flush.
    {
        CRITICAL_SECTION(cs, _wait_flush_lock);
        // CS: commented out check for waiting_for_space -- don't know why it was here?
        //if(success && (*&_waiting_for_space || *&_waiting_for_flush)) {
        if(success && *&_waiting_for_flush) {
            //_waiting_for_flush = _waiting_for_space = false;
            _waiting_for_flush = false;
            DO_PTHREAD(pthread_cond_broadcast(&_wait_cond));
            // wake up anyone waiting on log flush
        }
        if(_shutting_down) {
            _shutting_down = false;
            break;
        }
        // NOTE: at this point the thread waiting for a flush has woken up or will wake up, but
        // this thread, as long as success is true (it just flushed something in the previous
        // flush_daemon_work), will keep calling flush_daemon_work until there is nothing to flush.
        // This happens in the background.
        // sleep. We don't care if we get a spurious wakeup
        //if(!success && !*&_waiting_for_space && !*&_waiting_for_flush) {
        if(!success && !*&_waiting_for_flush) {
            // Use signal since the only thread that should be waiting
            // on the _flush_cond is the log flush daemon.
            DO_PTHREAD(pthread_cond_wait(&_flush_cond, &_wait_flush_lock));
        }
    }
    // flush all records later than last_completed_flush_lsn
    // and return the resulting last durable lsn
    lsn_t lsn = flush_daemon_work(last_completed_flush_lsn);
    // success=true if we wrote anything
    success = (lsn != last_completed_flush_lsn);
    last_completed_flush_lsn = lsn;
}
flush_daemon_work() is called repeatedly in this loop, and each time it returns its parameter unchanged (old_mark, which is the caller's last_completed_flush_lsn) from this branch:
if(_old_epoch.start == _old_epoch.end) {
    // no wrap -- flush only the new
    w_assert1(_cur_epoch.end >= _cur_epoch.start);
    start2 = _cur_epoch.start;
    end2 = _cur_epoch.end;
    w_assert1(end2 >= start2);
    // false alarm?
    if(start2 == end2) {
        return old_mark;
    }
    _cur_epoch.start = end2;
    start1 = start2; // fake start1 so the start_lsn calc below works
    end1 = start2;
    base_lsn_before = base_lsn_after;
}
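To make the "false alarm" path easier to reason about in isolation, here is a minimal sketch of the no-wrap branch only. Everything in it is hypothetical: ToyEpoch, ToyLogBuffer, toy_lsn and the rule that the durable mark advances by the number of bytes written are illustration-only assumptions, not the storage manager's types or LSN arithmetic. The wrap case is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for lsn_t; not a type from the storage manager.
using toy_lsn = std::uint64_t;

// Hypothetical epoch: a half-open byte range [start, end) still to be flushed.
struct ToyEpoch {
    std::uint64_t start = 0;
    std::uint64_t end = 0;
};

struct ToyLogBuffer {
    ToyEpoch old_epoch;   // only non-empty after a buffer wrap (not modelled)
    ToyEpoch cur_epoch;   // records inserted since the last flush
    toy_lsn durable = 0;  // toy durable mark

    // Pretend nbytes of log records were inserted.
    void insert(std::uint64_t nbytes) { cur_epoch.end += nbytes; }

    // Models only the no-wrap branch of the excerpt above.
    toy_lsn flush_work(toy_lsn old_mark) {
        assert(old_epoch.start == old_epoch.end);  // wrap case not modelled
        assert(cur_epoch.end >= cur_epoch.start);
        const std::uint64_t start2 = cur_epoch.start;
        const std::uint64_t end2 = cur_epoch.end;
        if (start2 == end2) {
            return old_mark;  // "false alarm": echo the caller's mark back
        }
        cur_epoch.start = end2;
        durable += end2 - start2;  // toy assumption: mark advances by bytes written
        return durable;
    }
};
```

In this toy, calling flush_work(5) on a buffer whose durable mark is already 64 and whose current epoch is empty returns 5, not 64: the caller's possibly stale mark is echoed back. Whether the real daemon can hold a stale last_completed_flush_lsn after log->truncate() depends on code the excerpt does not show.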
Shutting down the log_core to interrupt the loop does not seem to be an option, as there are other methods later that depend on the log (like taking a final checkpoint). Currently, the log is the LAST thing to be turned off.
Hypothesis: should flush_daemon_work() really return old_mark when start2 == end2? I still need to investigate whether the epochs are correct at that point.
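One way to explore the hypothesis without the full storage manager is a single-threaded model of the daemon loop's control flow. The sketch below keeps the same order of checks as the loop excerpt but replaces locking and condition variables with plain flags. ToyFlushDaemon, Pass, toy_lsn and count_busy_passes are hypothetical names, and the model does not show how flush_all() sets _waiting_for_flush in the real code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for lsn_t; not a type from the storage manager.
using toy_lsn = std::uint64_t;

// What one pass through the loop body did in this model.
enum class Pass {
    SleptFirst,  // would have blocked in pthread_cond_wait before working
    RanAtOnce,   // went straight to flush_daemon_work without blocking
    Exited       // saw the shutdown flag and left the loop
};

struct ToyFlushDaemon {
    bool success = false;            // did the previous pass flush anything?
    bool waiting_for_flush = false;  // models _waiting_for_flush
    bool shutting_down = false;      // models _shutting_down
    toy_lsn last_completed = 0;      // models last_completed_flush_lsn
    int broadcasts = 0;              // how often waiters would have been woken

    // One pass of the loop body, same order of checks as the excerpt.
    // `work` plays the role of flush_daemon_work(old_mark).
    Pass step(const std::function<toy_lsn(toy_lsn)>& work) {
        if (success && waiting_for_flush) {
            waiting_for_flush = false;
            ++broadcasts;  // stands in for pthread_cond_broadcast(&_wait_cond)
        }
        if (shutting_down) {
            shutting_down = false;
            return Pass::Exited;
        }
        const bool sleeps = !success && !waiting_for_flush;
        const toy_lsn lsn = work(last_completed);
        assert(lsn >= last_completed);  // toy invariant: the mark never moves back
        success = (lsn != last_completed);
        last_completed = lsn;
        return sleeps ? Pass::SleptFirst : Pass::RanAtOnce;
    }
};

// Counts passes that neither sleep nor wake a waiter while a flush request
// is pending and `work` keeps echoing its argument (the old_mark case).
int count_busy_passes(int passes) {
    ToyFlushDaemon d;
    d.waiting_for_flush = true;
    int busy = 0;
    for (int i = 0; i < passes; ++i) {
        const Pass p = d.step([](toy_lsn mark) { return mark; });
        if (p == Pass::RanAtOnce && d.broadcasts == 0) {
            ++busy;
        }
    }
    return busy;
}
```

In this model, if a flush request is pending while the work function keeps returning old_mark, success never becomes true. The waiter is then never woken (the broadcast requires success) and the daemon never sleeps (the wait requires that no flush request is pending), so every pass is a busy pass. That matches the symptom described above, but it does not establish that the real daemon reaches this state.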