IBM / couchbackup

Cloudant backup and restore library and command-line utility
Apache License 2.0

CouchBackup


 _____                  _    ______            _
/  __ \                | |   | ___ \          | |
| /  \/ ___  _   _  ___| |__ | |_/ / __ _  ___| | ___   _ _ __
| |    / _ \| | | |/ __| '_ \| ___ \/ _` |/ __| |/ / | | | '_ \
| \__/\ (_) | |_| | (__| | | | |_/ / (_| | (__|   <| |_| | |_) |
 \____/\___/ \__,_|\___|_| |_\____/ \__,_|\___|_|\_\\__,_| .__/
                                                         | |
                                                         |_|

CouchBackup is a command-line utility that backs up a Cloudant or CouchDB database to a text file. It comes with a companion command-line utility that can restore the backed up data.

Limitations

CouchBackup has some restrictions on the data it can back up:

Installation

To install the latest released version use npm:

npm install -g @cloudant/couchbackup

Requirements

Snapshots

The latest builds of the main branch are available on npm with the snapshot tag. Use the snapshot tag if you want to experiment with an unreleased fix or new function, but please note that snapshot versions are not supported.

Usage

Use either environment variables or command-line options to specify the URL of the CouchDB or Cloudant instance, and the database to work with.

The URL

To define the URL of the CouchDB instance set the COUCH_URL environment variable:

export COUCH_URL=http://localhost:5984

or

export COUCH_URL=https://myusername:mypassword@myhost.cloudant.com

Or use the --url command-line parameter.

When passing credentials in the user information subcomponent of the URL, they must be percent-encoded. Specifically, within either the username or password, the characters : / ? # [ ] @ % MUST be percent-encoded; other characters MAY be percent-encoded.

For example, for the username user123 and password colon:at@321:

https://user123:colon%3aat%40321@localhost:5984

Note: Take extra care to escape shell reserved characters when setting the environment variable or command-line parameter.
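The encoding rule above can be sketched in Node.js. This is only an illustration: buildCouchUrl is a hypothetical helper, not part of couchbackup, and it relies on the built-in encodeURIComponent, which escapes all of the characters listed above. It emits uppercase hex digits (%3A rather than %3a), which is equivalent.

```javascript
// Hypothetical helper (not part of couchbackup): build a URL whose
// username and password are percent-encoded.
function buildCouchUrl(baseUrl, username, password) {
  const url = new URL(baseUrl);
  // encodeURIComponent escapes : / ? # [ ] @ % along with other characters.
  const userInfo = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
  // Only the scheme, host and port of baseUrl are kept in this sketch.
  return `${url.protocol}//${userInfo}@${url.host}`;
}

console.log(buildCouchUrl('https://localhost:5984', 'user123', 'colon:at@321'));
// prints "https://user123:colon%3Aat%40321@localhost:5984"
```

The resulting string still needs quoting when it is placed in a shell command or environment variable.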

The Database name

To define the name of the database to back up or restore, set the COUCH_DATABASE environment variable:

export COUCH_DATABASE=animaldb

Or use the --db command-line parameter.

Backup

To back up a database to a text file, use the couchbackup command, directing the output to a text file:

couchbackup > backup.txt

Another way of backing up is to set the COUCH_URL environment variable only and supply the database name on the command-line:

couchbackup --db animaldb > animaldb.txt

Logging & resuming backups

You may also create a log file which records the progress of the backup with the --log parameter, for example:

couchbackup --db animaldb --log animaldb.log > animaldb.txt

Use this log file to resume backups with --resume true:

couchbackup --db animaldb --log animaldb.log --resume true >> animaldb.txt

The --resume true option works for a backup that has finished spooling changes, but has not yet completed downloading all the necessary batches of documents. It is not an incremental backup solution.

You may also specify the name of the output file, rather than directing the backup data to stdout:

couchbackup --db animaldb --log animaldb.log --resume true --output animaldb.txt

Compatibility note

When using --resume use the same version of couchbackup that started the backup.

Restore

Now restore the backup text file to a new, empty, existing database using the couchrestore command:

cat animaldb.txt | couchrestore

Or specify the database name on the command-line:

cat animaldb.txt | couchrestore --db animaldb2

Compatibility note

Do not use an older version of couchbackup to restore a backup created with a newer version.

Newer versions of couchbackup can restore backups created by older versions within the same major version.
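As a rough sketch, the two rules above can be turned into a pre-restore check. canRestore and parseVersion are hypothetical helpers, not part of couchbackup. The sketch assumes plain major.minor.patch version strings, such as the version field found in the first line of a backup file, and conservatively treats a different major version as incompatible because the notes only make a promise within one major version.

```javascript
// Hypothetical helpers (not part of couchbackup).
function parseVersion(version) {
  // Assumes a plain "major.minor.patch" string with no pre-release suffix.
  return version.split('.').map((part) => parseInt(part, 10));
}

// Returns true when toolVersion is the same as or newer than backupVersion
// and both share the same major version.
function canRestore(backupVersion, toolVersion) {
  const [bMajor, bMinor, bPatch] = parseVersion(backupVersion);
  const [tMajor, tMinor, tPatch] = parseVersion(toolVersion);
  if (bMajor !== tMajor) return false; // conservative: no promise across majors
  if (tMinor !== bMinor) return tMinor > bMinor;
  return tPatch >= bPatch;
}

console.log(canRestore('2.9.10', '2.10.0')); // prints true
console.log(canRestore('2.9.10', '2.9.9'));  // prints false
```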

Compressed backups

To compress the backup data before storing it to disk, pipe the contents through gzip:

couchbackup --db animaldb | gzip > animaldb.txt.gz

and restore the file with:

cat animaldb.txt.gz | gunzip | couchrestore --db animaldb2
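The same decompression step can be sketched in Node.js with the built-in zlib module. openCompressedBackup is a hypothetical helper, not part of couchbackup; it only turns a gzip-compressed backup file back into a readable stream of backup text.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Hypothetical helper (not part of couchbackup): open a gzip-compressed
// backup file as a readable stream of the original backup text.
function openCompressedBackup(path) {
  return fs.createReadStream(path).pipe(zlib.createGunzip());
}

// Example use: print the decompressed backup text to stdout.
// openCompressedBackup('animaldb.txt.gz').pipe(process.stdout);
```

A stream like this could presumably also be handed to the library's restore function, described under "Using programmatically", in place of a plain file stream.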

Encrypted backups

Similarly to compression it is possible to pipe the backup content through an encryption or decryption utility. For example with openssl:

couchbackup --db animaldb | openssl aes-128-cbc -pass pass:12345 > encrypted_animal.db
openssl aes-128-cbc -d -in encrypted_animal.db -pass pass:12345 | couchrestore --db animaldb2

Note that the content is not encrypted in the backup tool before piping to the encryption utility.

What's in a backup file?

A backup file is a text file where each line is either a JSON object of backup metadata or a JSON array of backed up document revision objects, for example:

{"name":"@cloudant/couchbackup","version":"2.9.10","mode":"full"}
[{"_id": "1","a":1},{"_id": "2","a":2},...]
[{"_id": "501","a":501},{"_id": "502","a":502}]

The number of document revisions in a backup array varies. It typically has buffer_size elements, but may be more if there are also leaf revisions returned from the server or fewer if it is the last batch.
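Because every line is standalone JSON, a backup file is easy to inspect. The following sketch is only an illustration: summarizeBackup is a hypothetical helper, not part of couchbackup, and the sample text is made up in the shape shown above. It counts batches and document revisions and keeps the metadata object.

```javascript
// Hypothetical helper (not part of couchbackup): summarize backup text.
// Assumes every non-empty line is valid JSON.
function summarizeBackup(text) {
  const summary = { metadata: null, batches: 0, revisions: 0 };
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const parsed = JSON.parse(line);
    if (Array.isArray(parsed)) {
      // An array line is one batch of document revisions.
      summary.batches += 1;
      summary.revisions += parsed.length;
    } else {
      // An object line is backup metadata.
      summary.metadata = parsed;
    }
  }
  return summary;
}

const sample = [
  '{"name":"@cloudant/couchbackup","version":"2.9.10","mode":"full"}',
  '[{"_id":"1","a":1},{"_id":"2","a":2}]',
  '[{"_id":"501","a":501}]'
].join('\n');

console.log(summarizeBackup(sample).revisions); // prints 3
```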

What's in a log file?

A log file has a line:

What's shallow mode?

When you run couchbackup with --mode shallow couchbackup performs a simpler backup. It only backs up the winning revisions and ignores any conflicting revisions. This is a faster, but less complete backup.

Note: The --log, --resume, and --parallelism options are invalid for --mode shallow backups.
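A script that assembles options could guard against this combination before starting a backup. The sketch below is an assumption-laden illustration: invalidShallowOptions is a hypothetical helper, and the option keys mode, log and resume are assumed to mirror the command-line flags (only parallelism appears as an opts key elsewhere in this document).

```javascript
// Options that, per the note above, do not apply to shallow backups.
// The key names are assumed to mirror the CLI flags.
const FULL_MODE_ONLY = ['log', 'resume', 'parallelism'];

// Hypothetical helper (not part of couchbackup): list the options that
// conflict with shallow mode. Returns an empty array when nothing conflicts.
function invalidShallowOptions(opts) {
  if (opts.mode !== 'shallow') return [];
  return FULL_MODE_ONLY.filter((name) => opts[name] !== undefined);
}

console.log(invalidShallowOptions({ mode: 'shallow', log: 'animaldb.log', parallelism: 2 }));
// prints [ 'log', 'parallelism' ]
```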

Why use CouchBackup?

The easiest way to back up a CouchDB database is to copy the ".couch" file. This is fine on a single-node instance, but when running multi-node Cloudant or using CouchDB 2.0 or greater, the ".couch" file only has a single shard of data. This utility allows simple backups of CouchDB or Cloudant databases using the HTTP API.

This tool can script the backup of your databases. Move the backup and log files to cheap Object Storage so that you have copies of your precious data.

Options reference

Environment variables

Note: Environment variables are only used with the CLI. When using the library programmatically, use the opts dictionary.

Command-line parameters

Using programmatically

You can use couchbackup programmatically. First install couchbackup into your project with npm install --save @cloudant/couchbackup. Then you can import the library into your code:

  const couchbackup = require('@cloudant/couchbackup');

The library exports two main functions:

  1. backup - backup from a database to a writable stream.
  2. restore - restore from a readable stream to an empty database.

Examples

See the examples folder for example scripts showing how to use the library.

Backup

The backup function takes a source database URL, a stream to write to, backup options and a callback for completion.

backup: function(srcUrl, targetStream, opts, callback) { /* ... */ }

The opts dictionary can contain values which map to a subset of the environment variables defined above. Those related to the source and target locations are not required.

When the backup completes or fails, the callback function gets called with the standard err, data parameters.

The backup function returns an event emitter. You can subscribe to:

Backup data to a stream:

couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  process.stdout,
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });

Or to a file:

// fs is Node's built-in file system module; filename is the path to write to.
const fs = require('fs');

couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  fs.createWriteStream(filename),
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
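For code that prefers Promises over callbacks, the callback style can be wrapped. This is a minimal sketch, not an API the library provides: withPromise is a hypothetical helper, it assumes the callback follows the err, data convention described above, and it discards the returned event emitter, so it does not suit code that needs to subscribe to events.

```javascript
// Hypothetical helper (not part of couchbackup): adapt a function with the
// signature fn(first, second, opts, callback) so that it returns a Promise.
function withPromise(fn) {
  return (first, second, opts = {}) =>
    new Promise((resolve, reject) => {
      fn(first, second, opts, (err, data) => (err ? reject(err) : resolve(data)));
    });
}

// Usage sketch (assumes @cloudant/couchbackup is installed):
// const fs = require('fs');
// const couchbackup = require('@cloudant/couchbackup');
// const backup = withPromise((...args) => couchbackup.backup(...args));
// backup('https://examples.cloudant.com/animaldb',
//        fs.createWriteStream('animaldb.txt'),
//        { parallelism: 2 })
//   .then((data) => console.error('Success! ' + data))
//   .catch((err) => console.error('Failed! ' + err));
```

The same wrapper shape fits any function that takes two positional arguments, an options object and a callback.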

Restore

The restore function takes a readable stream containing the data emitted by the backup function and uploads that to a Cloudant database.

Note: A target database must be a new and empty database.

restore: function(srcStream, targetUrl, opts, callback) { /* ... */ }

The opts dictionary can contain values which map to a subset of the environment variables defined above. Those related to the source and target locations are not required.

When the restore completes or fails, the callback function gets called with the standard err, data parameters.

The restore function returns an event emitter. You can subscribe to:

The srcStream for the restore is a backup file. In the case of an incomplete backup the file could be corrupt and in that case the restore emits a BackupFileJsonError.
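One way to catch such a file before starting a restore is to check that every line parses as JSON. The sketch below is an independent pre-check, not how the library itself detects the problem; findCorruptLine is a hypothetical helper that works on backup text already read into memory.

```javascript
// Hypothetical helper (not part of couchbackup): return the 1-based number
// of the first line that is not valid JSON, or null if every line parses.
function findCorruptLine(text) {
  const lines = text.split('\n');
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === '') continue; // ignore blank lines
    try {
      JSON.parse(lines[i]);
    } catch (err) {
      return i + 1;
    }
  }
  return null;
}

// A backup cut off part-way through its second line:
const truncated = '{"name":"@cloudant/couchbackup"}\n[{"_id":"1","a":1},{"_id":';
console.log(findCorruptLine(truncated)); // prints 2
```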

Restore data from a stream:

couchbackup.restore(
  process.stdin,
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });

Or from a file:

// fs is Node's built-in file system module; filename is the backup file to read.
const fs = require('fs');

couchbackup.restore(
  fs.createReadStream(filename),
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });

Error Handling

The couchbackup and couchrestore processes are able to tolerate many errors even over an unreliable network. Failed requests retry at least twice after a back-off delay. However, certain errors are fatal and can't be tolerated:

API

When using the library programmatically in the case of a fatal error the callback function gets called with null, error arguments.

CLI Exit Codes

On fatal errors, couchbackup and couchrestore exit with non-zero exit codes. This section details them.
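A wrapper script can act on these exit codes. The following Node.js sketch uses the built-in child_process module; runCli is a hypothetical helper, and it only distinguishes zero from non-zero rather than interpreting specific codes.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper (not part of couchbackup): run a command, let it write
// to this process's stdout and stderr, and report how it exited.
function runCli(command, args) {
  const result = spawnSync(command, args, { stdio: ['ignore', 'inherit', 'inherit'] });
  if (result.error) throw result.error; // for example, the command was not found
  // status is null when the process was terminated by a signal.
  return { ok: result.status === 0, code: result.status, signal: result.signal };
}

// Usage sketch (assumes couchbackup is installed and COUCH_URL is set):
// const outcome = runCli('couchbackup', ['--db', 'animaldb', '--output', 'animaldb.txt']);
// if (!outcome.ok) console.error('couchbackup failed with exit code ' + outcome.code);
```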

common to both couchbackup and couchrestore

couchbackup

couchrestore

Note on attachments

TL;DR: If you back up a database that has attachments without using the attachments option, couchbackup can't restore it.

As documented above couchbackup does not support backing up or restoring databases containing documents with attachments.

The recommendation is to store attachments directly in an object store with a link in the JSON document instead of using the native attachment API.

With experimental attachments option

The attachments option is provided as-is and is not supported. This option is for Apache CouchDB only and is experimental. Do not use this option with IBM Cloudant backups.

Without experimental attachments option

Backing up a database that includes documents with attachments appears to complete successfully. However, the attachment content is not downloaded and the backup file contains attachment metadata. So attempts to restore the backup result in errors because the attachment metadata references attachments that are not present in the restored database.