dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0
145 stars 37 forks source link

Question on use of SyncWAL during bulk load. #147

Closed xunjieli closed 5 years ago

xunjieli commented 5 years ago

In RocksKeyValue::DB::flush(), should SyncWAL be conditioned on != OpenMode::BULK_LOAD ? The write options has WAL disabled when BULK_LOAD.

https://github.com/dnanexus-rnd/GLnexus/blob/master/src/RocksKeyValue.cc#L557

    Status flush() override {
        if (mode_ != OpenMode::READ_ONLY) {
            Status s;
            S(convertStatus(db_->SyncWAL()));  // Should this be conditioned on != BULK_LOAD?
            for (const auto& p : coll2handle_) {
                S(convertStatus(db_->Flush(rocksdb::FlushOptions(), p.second)));
            }
        }
        return Status::OK();
    }

WAL is disabled here for BULK_LOAD.

            // prepare write options
            if (mode_ == OpenMode::BULK_LOAD) {
                write_options_.disableWAL = true;
                batch_write_options_.disableWAL = true;
            } else {
                batch_write_options_.sync = true;
            }
mlin commented 5 years ago

Hi! My non-researched speculation would be that SyncWAL() should reduce to a no-op when disableWAL is set. Is the invocation failing or causing some other issue for you?

I believe the current glnexus_cli driver code only uses the BULK_LOAD and READ_ONLY modes, so the WAL never really comes into play. But in theory it can be used to support concurrent read/write from BCFKeyValueData. LMK if you'd like to discuss this

xunjieli commented 5 years ago

In mode BULK_LOAD, SyncWAL() is not an no-op, see https://github.com/facebook/rocksdb/blob/cf852fdf55feb4b06c65e646ee1bd10021c6df40/db/db_impl.cc#L931 (also pasted below), which doesn't check whether WAL is actually disabled.

I found this because my custom rocksdb env doesn't override WritableFileWrapper ::IsSyncThreadSafe() to return true, and GLnexus's RocksKeyValue::DB::flush() is returning an error status. I can fix my IsSyncThreadSafe(), but performing SyncWAL() in BULK_LOAD mode wastes compute.

Adding a condition on calling SyncWAL() should be easy enough?

Status DBImpl::SyncWAL() {
  autovector<log::Writer*, 1> logs_to_sync;
  bool need_log_dir_sync;
  uint64_t current_log_number;

  {
    InstrumentedMutexLock l(&mutex_);
    assert(!logs_.empty());

    // This SyncWAL() call only cares about logs up to this number.
    current_log_number = logfile_number_;

    while (logs_.front().number <= current_log_number &&
           logs_.front().getting_synced) {
      log_sync_cv_.Wait();
    }
    // First check that logs are safe to sync in background.
    for (auto it = logs_.begin();
         it != logs_.end() && it->number <= current_log_number; ++it) {
      if (!it->writer->file()->writable_file()->IsSyncThreadSafe()) {
        return Status::NotSupported(
            "SyncWAL() is not supported for this implementation of WAL file",
            immutable_db_options_.allow_mmap_writes
                ? "try setting Options::allow_mmap_writes to false"
                : Slice());
      }
    }
    for (auto it = logs_.begin();
         it != logs_.end() && it->number <= current_log_number; ++it) {
      auto& log = *it;
      assert(!log.getting_synced);
      log.getting_synced = true;
      logs_to_sync.push_back(log.writer);
    }

    need_log_dir_sync = !log_dir_synced_;
  }

  TEST_SYNC_POINT("DBWALTest::SyncWALNotWaitWrite:1");
  RecordTick(stats_, WAL_FILE_SYNCED);
  Status status;
  for (log::Writer* log : logs_to_sync) {
    status = log->file()->SyncWithoutFlush(immutable_db_options_.use_fsync);
    if (!status.ok()) {
      break;
    }
  }
  if (status.ok() && need_log_dir_sync) {
    status = directories_.GetWalDir()->Fsync();
  }
  TEST_SYNC_POINT("DBWALTest::SyncWALNotWaitWrite:2");

  TEST_SYNC_POINT("DBImpl::SyncWAL:BeforeMarkLogsSynced:1");
  {
    InstrumentedMutexLock l(&mutex_);
    MarkLogsSynced(current_log_number, need_log_dir_sync, status);
  }
  TEST_SYNC_POINT("DBImpl::SyncWAL:BeforeMarkLogsSynced:2");

  return status;
}
mlin commented 5 years ago

Sure, I'll go ahead and patch the SyncWAL invocation as you suggest.

LMK if you'd like to chat sometime about swapping out the storage backend. These might be interesting projects if you haven't seen them already:

  1. https://github.com/mlin/RocksWorm
  2. https://github.com/rockset/rocksdb-cloud
  3. For GLnexus, consideration might also be given to implementing the KeyValueData interface without RocksDB at all.
xunjieli commented 5 years ago

Yes, I am happy to chat more offline. I've directly implemented GLnexus's KeyValueData using two different cloud-based storage solutions, but the performance is poor. I ended up swapping out RocksDB's env.h (https://github.com/facebook/rocksdb/blob/master/include/rocksdb/env.h) with a custom env. It works reasonably well.