log flush daemon loops forever at shutdown after log is truncated

For the following options: sm_shutdown_clean = true; sm_truncate_log = true;

When the sm is shuttding down:

bool truncate_archive = _options.get_bool_option("sm_truncate_archive", false);
    if (shutdown_clean || truncate) {
        ERROUT(<< "SM performing clean shutdown");

        W_COERCE(log->flush_all());
        bf->get_cleaner()->wakeup(true);
        // CS TODO: two wakeups are necessary when using the async collector
        bf->get_cleaner()->wakeup(true);
        me()->check_actual_pin_count(0);

        // Force alloc and stnode pages
        lsn_t dur_lsn = smlevel_0::log->durable_lsn();
        W_COERCE(vol->get_alloc_cache()->write_dirty_pages(dur_lsn));
        W_COERCE(vol->get_stnode_cache()->write_page(dur_lsn));

        if (truncate) { W_COERCE(_truncate_log(truncate_archive)); }
        else { chkpt->take(); }

        ERROUT(<< "All pages cleaned successfully");
    }
    else {
        ERROUT(<< "SM performing dirty shutdown");
    }

The method ssm::_truncate_log(...) is called:

rc_t ss_m::_truncate_log(bool truncate_archive)
{
    DBGTHRD(<< "Truncating log on LSN " << log->durable_lsn());

    // Wait for cleaner to finish its current round
    bf->shutdown();
    W_DO(log->flush_all());

    if (truncate_archive && logArchiver) {
        logArchiver->archiveUntilLSN(log->durable_lsn());
        logArchiver->getDirectory()->deleteAllRuns();
    }

    W_DO(log->truncate());
    W_DO(log->flush_all());

    // this should be an "empty" checkpoint
    chkpt->take();

    // generate an "empty" log archive run
    if(logArchiver) {
        logArchiver->archiveUntilLSN(log->durable_lsn());
    }

    log->get_storage()->delete_old_partitions();

    return RCOK;
}

The log flush daemon is activated by the log->flush_all() call, after log->truncate(). The log flush daemon loops forever here:

while(1) {

        // wait for a kick. Kicks come at regular intervals from
        // inserts, but also at arbitrary times when threads request a
        // flush.
        {
            CRITICAL_SECTION(cs, _wait_flush_lock);
            // CS: commented out check for waiting_for_space -- don't know why it was here?
            //if(success && (*&_waiting_for_space || *&_waiting_for_flush)) {
            if(success && *&_waiting_for_flush) {
                //_waiting_for_flush = _waiting_for_space = false;
                _waiting_for_flush = false;
                DO_PTHREAD(pthread_cond_broadcast(&_wait_cond));
                // wake up anyone waiting on log flush
            }
            if(_shutting_down) {
                _shutting_down = false;
                break;
            }

            // NOTE: right now the thread waiting for a flush has woken up or will woke up, but...
            // this thread, as long as success is true (it just flushed something in the previous
            // flush_daemon_work), will keep calling flush_daemon_work until there is nothing to flush....
            // this happens in the background

            // sleep. We don't care if we get a spurious wakeup
            //if(!success && !*&_waiting_for_space && !*&_waiting_for_flush) {
            if(!success && !*&_waiting_for_flush) {
                // Use signal since the only thread that should be waiting
                // on the _flush_cond is the log flush daemon.
                DO_PTHREAD(pthread_cond_wait(&_flush_cond, &_wait_flush_lock));
            }
        }

        // flush all records later than last_completed_flush_lsn
        // and return the resulting last durable lsn
        lsn_t lsn = flush_daemon_work(last_completed_flush_lsn);

        // success=true if we wrote anything
        success = (lsn != last_completed_flush_lsn);
        last_completed_flush_lsn = lsn;
    }

The flush_daemon_work() is called multiple times in the loop and returns the parameter (last_completed_flush_lsn/old_mark) here:

if(_old_epoch.start == _old_epoch.end) {
            // no wrap -- flush only the new
            w_assert1(_cur_epoch.end >= _cur_epoch.start);
            start2 = _cur_epoch.start;
            end2 = _cur_epoch.end;
            w_assert1(end2 >= start2);
            // false alarm?
            if(start2 == end2) {
                return old_mark;
            }
            _cur_epoch.start = end2;

            start1 = start2; // fake start1 so the start_lsn calc below works
            end1 = start2;

            base_lsn_before = base_lsn_after;
 }

Shutting down the log_core to interrupt the loop does not seem to be an option, as there are other methods later that depend on the log (like taking a final checkpoint). Currently, the log is the LAST thing to be turned off.

Hypothesis: should we return old_mark when start2 == end2? I have to further investigate if the epochs are correct.

caetanosauer / zero

log flush daemon loops forever at shutdown after log is truncated #26