chrsmithdemos / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

Add DB::SuspendCompactions() and DB::ResumeCompactions() methods #184

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
If an application wants to take a consistent backup of the leveldb data files, 
it needs to ensure that the background compaction threads are not modifying 
those files.

Original issue reported on code.google.com by chir...@gmail.com on 1 Jul 2013 at 1:33

GoogleCodeExporter commented 9 years ago
Attaching patch that implements the requested feature.

Original comment by chir...@gmail.com on 1 Jul 2013 at 1:38


GoogleCodeExporter commented 9 years ago
Can you try a different way of getting a consistent view:
(1) Make a special Env; say BackupEnv
(2) Add a method "BackupEnv::Backup".
(3) This method blocks other operations (e.g., file creations/deletions) while 
it runs.
(4) This method copies or hard-links all of the files in the directory to the 
dest.

Refinements are possible:
(A) Instead of blocking file creations, just grab the list of files at start of 
BackupEnv::Backup and allow concurrent file creations to continue unhindered.

(B) Instead of blocking file deletions, just record the name of the file that 
should be deleted in an internal vector and have BackupEnv::Backup do these 
deferred file deletions before exiting.

(C) If you want a consistent view of the log files (which your suggested 
ResumeCompaction, SuspendCompaction approach cannot give you), remember the 
initial length of all log files at the beginning of Backup(), and only copy up 
to those lengths in the body of backup.
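Refinement (C), copying only a recorded prefix of each log file, could be 
sketched in plain C++. CopyFilePrefix is a hypothetical helper name, and the 
sketch is deliberately independent of leveldb's Env API:

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper for refinement (C): copy only the first `length`
// bytes of `src` to `dst`, so the backup sees each log file as it was
// when the backup started, even if writers append to it concurrently.
bool CopyFilePrefix(const std::string& src, const std::string& dst,
                    int64_t length) {
  std::ifstream in(src, std::ios::binary);
  std::ofstream out(dst, std::ios::binary | std::ios::trunc);
  if (!in || !out) return false;
  char buf[8192];
  int64_t remaining = length;
  while (remaining > 0 && in) {
    std::streamsize chunk = static_cast<std::streamsize>(
        std::min<int64_t>(remaining, static_cast<int64_t>(sizeof(buf))));
    in.read(buf, chunk);
    std::streamsize got = in.gcount();
    if (got <= 0) break;
    out.write(buf, got);
    remaining -= got;
  }
  // remaining == 0 means the full requested prefix was copied.
  return remaining == 0 && out.good();
}
```

A Backup implementation would call something like this for log files, using 
the lengths recorded under the lock, and hard-link everything else.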

The Backup function might look something like this:
  grab lock
  backing_up_ = true;
  get list of all files
  remember lengths of all log files
  release lock

  for each file in list {
    if it is a log-file {
      copy appropriate prefix (based on the length recorded above) to dest
    } else {
      hard-link file to dest
    }
  }

  vector<string> to_delete;
  grab lock
  to_delete.swap(deferred_deletions_);
  backing_up_ = false;
  release lock
  delete every file in to_delete;

File creation:
  create as normal

File deletion:
  grab lock
  if backing_up_ {
    deferred_deletions_.push_back(filename);
  } else {
    delete the file
  }
  release lock
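The deferred-deletion bookkeeping in the pseudocode above can be sketched as 
self-contained C++ (class and member names are hypothetical; the real file 
deletion is replaced by a log so the control flow is testable without leveldb):

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sketch of the deferred-deletion bookkeeping from the
// pseudocode above.  Actual file I/O is replaced by a `deleted_` log so
// that only the locking and deferral logic is demonstrated.
class DeferredDeleter {
 public:
  // File-deletion path: while a backup is in progress, remember the
  // name instead of deleting.
  void DeleteFile(const std::string& fname) {
    std::lock_guard<std::mutex> l(mu_);
    if (backing_up_) {
      deferred_deletions_.push_back(fname);
    } else {
      deleted_.push_back(fname);  // stand-in for the real delete
    }
  }

  void StartBackup() {
    std::lock_guard<std::mutex> l(mu_);
    backing_up_ = true;
  }

  // On backup completion, perform all deletions deferred in the meantime.
  void FinishBackup() {
    std::vector<std::string> to_delete;
    {
      std::lock_guard<std::mutex> l(mu_);
      to_delete.swap(deferred_deletions_);
      backing_up_ = false;
    }
    // Delete outside the lock, as in the pseudocode above.
    for (const auto& f : to_delete) deleted_.push_back(f);
  }

  const std::vector<std::string>& deleted() const { return deleted_; }

 private:
  std::mutex mu_;
  bool backing_up_ = false;
  std::vector<std::string> deferred_deletions_;
  std::vector<std::string> deleted_;
};
```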

Original comment by san...@google.com on 1 Jul 2013 at 4:21

GoogleCodeExporter commented 9 years ago
The application can control its update threads.  So it can stop causing log 
updates and thus get a consistent view of the log.  It is only the compaction 
threads that it cannot control.

Original comment by chir...@gmail.com on 1 Jul 2013 at 4:25

GoogleCodeExporter commented 9 years ago
> The application can control its update threads.  So it can stop causing log 
updates and thus get a consistent view of the log.  It is only the compaction 
threads that it cannot control.

Good point, but regardless, can you consider my suggested approach (BackupEnv)? 
It requires no changes to the leveldb public interface.  Furthermore, it 
isolates the consistency logic in a single class (BackupEnv) instead of 
requiring changes to be spread through the leveldb compaction code, application 
code (to prevent log writes), etc.

Original comment by san...@google.com on 1 Jul 2013 at 4:40

GoogleCodeExporter commented 9 years ago
Are you guys volunteering to implement the BackupEnv you have described?

Original comment by chir...@gmail.com on 1 Jul 2013 at 4:58

GoogleCodeExporter commented 9 years ago
Sorry, but I am not volunteering.  What I was saying, indirectly, is that there 
is a straightforward way to achieve what you need without requiring changes to 
leveldb or extra dependencies on its behavior.

As an aside, your patch assumes too much about Env::Schedule, which has the 
following comment:
  // Arrange to run "(*function)(arg)" once in a background thread.
  //
  // "function" may run in an unspecified thread.  Multiple functions
  // added to the same Env may run concurrently in different threads.
  // I.e., the caller may not assume that background work items are
  // serialized.
It also assumes that no internal thread writes to log files (which is true 
today, but may not be true in the future; e.g., if we put in indexing or 
garbage collection of some type).

Original comment by san...@google.com on 1 Jul 2013 at 10:54

GoogleCodeExporter commented 9 years ago
A simple way to do online backups of the leveldb data files is needed.  Your 
suggested method seems very complex, and recommending it is equivalent to 
saying that the leveldb project feels it does not need to support such a use 
case.

Suspend/Resume is easy to understand and should be easy to adapt and implement 
for any future change in the threading scheme.

Original comment by chir...@gmail.com on 2 Jul 2013 at 12:09

GoogleCodeExporter commented 9 years ago
I'd also be happy with something like DB::CopyDB(const char* targetDirectory), 
as long as it used hard links as much as possible to make the backup fast and 
cheap.

Original comment by chir...@gmail.com on 2 Jul 2013 at 1:27

GoogleCodeExporter commented 9 years ago
[Copy of email message that seems to be missing from the tracker]

A simple way to do online backups of the leveldb data files is needed.  Your 
suggested method seems very complex, and recommending it is equivalent to 
saying that the leveldb project feels it does not need to support such a use 
case.

I don't think it is very complex.  And it has several benefits:

(1) It is relatively separable from the complicated guts of leveldb.
(2) It does not make assumptions like:
     (a) Only user writes cause log files to be modified (think about future changes like automatic garbage collection which could trigger logging)
     (b) Only a single thread is used by compactions, and that scheduling something in the background is enough to ensure they are not running.  This directly violates the comment in env.h:
(3) My suggested approach allows for long running backups without triggering 
unbounded memory growth in the meantime (since it does not prevent compactions 
during the backup process).

  // Arrange to run "(*function)(arg)" once in a background thread.
  //
  // "function" may run in an unspecified thread.  Multiple functions
  // added to the same Env may run concurrently in different threads.
  // I.e., the caller may not assume that background work items are
  // serialized.
  virtual void Schedule(...);

As for the complexity of what I am suggesting: it will be simpler than your 
suggested change, especially when we factor in the size of the module the 
changes go into and the likelihood of future changes to db_impl.cc breaking 
subtle ordering assumptions.

> I'd also be happy with something like DB::CopyDB(const char* targetDirectory)
> as long as it used hard links as much as possible to make the backup fast and 
cheap.

This is better from my perspective than Suspend/Resume compaction calls.  
However, I would like to leave this out of leveldb proper, since things like 
hard links are not portable.  E.g., we might need a very different 
implementation for different platforms.  Furthermore, things like incremental 
updates (e.g., copying to a remote file system but sharing already-copied 
files) will lead to lots of complexity.

For now, I think the best bet is for you to use a custom env.

Original comment by san...@google.com on 8 Jul 2013 at 11:16

GoogleCodeExporter commented 9 years ago
Here is an (untested) implementation of the Env-based approach I was 
suggesting.  It adds a Backup method that generates a list of files to be saved.

#include <string>
#include <vector>

#include "leveldb/env.h"
#include "leveldb/status.h"
#include "port/port.h"  // leveldb-internal header providing port::Mutex

using leveldb::Status;

class BackupEnv : public leveldb::EnvWrapper {
 private:
  leveldb::port::Mutex mu_;
  bool backing_up_;
  std::vector<std::string> deferred_deletions_;

 public:
  explicit BackupEnv(leveldb::Env* t)
      : leveldb::EnvWrapper(t), backing_up_(false) {
  }

  Status DeleteFile(const std::string& f) {
    mu_.Lock();
    Status s;
    if (backing_up_) {
      deferred_deletions_.push_back(f);
    } else {
      // EnvWrapper keeps its target private; use the target() accessor.
      s = target()->DeleteFile(f);
    }
    mu_.Unlock();
    return s;
  }

  // Call (*save)(arg, filename, length) for every file that
  // should be backed up to construct a consistent view of the
  // database.  "length" may be negative to indicate that the entire
  // file must be copied.  Otherwise the first "length" bytes must be
  // copied.
  Status Backup(const std::string& dir,
                Status (*func)(void*, const std::string& fname, int64_t length),
                void* arg) {
    // Get a consistent view of all files.
    mu_.Lock();
    std::vector<std::string> files;
    Status s = GetChildren(dir, &files);
    if (!s.ok()) {
      mu_.Unlock();
      return s;
    }
    std::vector<int64_t> lengths(files.size());
    for (size_t i = 0; i < files.size(); i++) {
      if (IsLogFile(files[i])) {
        uint64_t len;
        // Names from GetChildren are relative to dir.
        s = GetFileSize(dir + "/" + files[i], &len);
        if (!s.ok()) {
          mu_.Unlock();
          return s;
        }
        lengths[i] = len;
      } else {
        lengths[i] = -1;
      }
    }
    backing_up_ = true;
    mu_.Unlock();

    for (size_t i = 0; s.ok() && i < files.size(); i++) {
      s = (*func)(arg, files[i], lengths[i]);
    }

    mu_.Lock();
    backing_up_ = false;
    for (size_t i = 0; i < deferred_deletions_.size(); i++) {
      target()->DeleteFile(deferred_deletions_[i]);
    }
    deferred_deletions_.clear();
    mu_.Unlock();

    return s;
  }

  bool IsLogFile(const std::string& fname) {
    ... match MANIFEST and log file names ...
  }
};

Original comment by san...@google.com on 8 Jul 2013 at 11:20

GoogleCodeExporter commented 9 years ago
If it's that simple please provide a BackupEnv in the leveldb project that's 
supported and tested.

If it's not that simple, then please reconsider your recommendation.

Original comment by chir...@gmail.com on 9 Jul 2013 at 3:12

GoogleCodeExporter commented 9 years ago
Sorry, but we do not have the time or resources to maintain and support the 
BackupEnv code.  Please implement BackupEnv in your library.

Original comment by san...@google.com on 10 Jul 2013 at 9:58

GoogleCodeExporter commented 9 years ago
I have already implemented DB::SuspendCompactions() and DB::ResumeCompactions() 
in my fork, thanks.

Original comment by chir...@gmail.com on 10 Jul 2013 at 10:04

GoogleCodeExporter commented 9 years ago
I've implemented a slightly modified version of the suggested BackupEnv 
approach here: 
https://github.com/titanous/HyperLevelDB/commit/344276355f4753317522371fe55823e6b29805e7

It's based on HyperLevelDB, and this is my first time writing C++, but it seems 
to work well.

Original comment by jonathan@titanous.com on 19 Jul 2013 at 8:14