delta, n.: a landform created by sediment flowing downstream
Deltaic is a tool for backing up data from several types of data sources into a filesystem. Historical data is maintained by taking an LVM thin-provisioned snapshot of the backup filesystem after each backup. Backed-up data is periodically checked against the original for discrepancies.
These data sources are currently supported: coda, github, rbd, rgw, and rsync. Each is described below.
Deltaic is released under the GNU General Public License, version 2.
Some of the packaging- and linting-related configuration files that are derived directly from example configurations retain their original MIT and Zero-Clause BSD licenses.
The backup filesystem must support extended attributes (user_xattr mount option for ext4) and discard (discard mount option for ext4). Note that on RHEL 6, ext4 filesystems are limited to 16 TB. Mounting noatime is recommended.
These are the requirements for just the base system. Individual data sources have additional requirements; see below.
Create a user account to run backups.
Create an LVM volume group, a thin-provisioning pool, and a thin-provisioned backup volume within it.
LVM limits the size of the thin-pool metadata volume to 16 GiB. This volume requires 64 bytes of capacity per data chunk, plus additional overhead for volume snapshots. Configure a 16 GiB metadata volume and a chunk size consistent with the expected maximum size of your pool. For example, for a 200 TiB pool, a chunk size of 2 MiB seems reasonable:
$ thin_metadata_size -b 2m -s 200t -m 1000 -u g
thin_metadata_size - 3.18 gibibytes estimated metadata area size for "--block-size=2mebibytes --pool-size=200tebibytes --max-thins=1000"
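The sizing rule above can be checked with a little arithmetic. The following is a rough sketch in Python that uses only the 64-bytes-per-chunk figure and the 16 GiB limit quoted above; the function names are hypothetical, snapshot overhead is ignored, and the result is a rule-of-thumb bound rather than the model that thin_metadata_size uses, so the two numbers will not match.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

BYTES_PER_CHUNK = 64        # figure quoted above; snapshot overhead NOT included
METADATA_LIMIT = 16 * GIB   # maximum thin-pool metadata volume size quoted above


def metadata_bytes(pool_size, chunk_size):
    """Rule-of-thumb metadata needed to map every chunk in the pool."""
    chunks = -(-pool_size // chunk_size)  # ceiling division
    return chunks * BYTES_PER_CHUNK


def smallest_chunk_size(pool_size, start=64 * KIB):
    """Smallest power-of-two chunk size whose metadata fits under the limit."""
    chunk = start
    while metadata_bytes(pool_size, chunk) > METADATA_LIMIT:
        chunk *= 2
    return chunk


if __name__ == "__main__":
    pool = 200 * TIB
    for chunk in (1 * MIB, 2 * MIB):
        needed = metadata_bytes(pool, chunk) / GIB
        print(f"{chunk // MIB} MiB chunks: {needed:.2f} GiB of metadata")
```

For a 200 TiB pool this gives 12.50 GiB with 1 MiB chunks and 6.25 GiB with 2 MiB chunks. By this estimate 1 MiB chunks would fit under the limit but leave little headroom for snapshots, which is consistent with choosing 2 MiB.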
So, for example, if your backup device is /dev/md0 and your user account is user:
pvcreate /dev/md0
vgcreate backups /dev/md0
lvcreate -l 100%FREE -T backups/pool --chunksize 2m --poolmetadatasize 16g
lvcreate -V200t -T backups/pool -n current
mkfs.ext4 -m 0 -E lazy_itable_init /dev/backups/current
mkdir /srv/backup
mount /dev/backups/current /srv/backup
chown user:user /srv/backup
chmod 700 /srv/backup
echo "/dev/backups/current /srv/backup ext4 user_xattr,discard,noatime 0 2" >> /etc/fstab
Create a config file at ~/.config/deltaic.conf. See example-config.yaml for the available settings.
Here is a minimal config file for the above configuration:
settings:
  root: /srv/backup
  backup-lv: backups/current
From the backup user account:
virtualenv env
env/bin/pip install git+https://github.com/cmusatyalab/deltaic
env/bin/deltaic mkconf sudoers | sudo tee /etc/sudoers.d/deltaic
env/bin/deltaic mkconf crontab --email your@email.address | crontab -
Set up data sources by configuring the system as described below and
then making the proper entries in ~/.config/deltaic.conf.
coda: The Coda distributed filesystem
coda backs up individual replicas of Coda volumes by connecting to a
Coda server via SSH and running volutil dump and codadump2tar.
File/directory UIDs and mode bits are stored as rsync-compatible extended
attributes. ACLs are not stored.
For performance, coda takes incremental volume dumps when possible.
However, incremental dumps do not record deleted files, so in each backup
run, coda will perform full dumps on a random subset of volumes (1 in 7
by default).
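One way to picture the full-versus-incremental split is the sketch below. It assumes each volume is independently chosen for a full dump with probability 1/7; how coda actually selects the subset is not described here, and the function name and volume names are hypothetical.

```python
import random


def choose_full_dumps(volumes, full_probability=1 / 7, rng=random):
    """Split volumes into (full, incremental) lists for one backup run.

    Assumption: each volume is picked independently; the real selection
    strategy may differ.
    """
    full, incremental = [], []
    for volume in volumes:
        # rng.random() is uniform on [0, 1), so this is true with
        # probability full_probability.
        if rng.random() < full_probability:
            full.append(volume)
        else:
            incremental.append(volume)
    return full, incremental


if __name__ == "__main__":
    volumes = [f"vol{i}" for i in range(14)]  # hypothetical volume names
    full, incremental = choose_full_dumps(volumes)
    print("full dumps:       ", full)
    print("incremental dumps:", incremental)
```

Under this scheme every volume eventually receives a full dump, so files deleted at the source are eventually removed from the backup as well, while most runs stay cheap.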
Requirements: authorized_keys access to root on the Coda servers.
To restore, authenticate a Coda client on a remote host, then run:
deltaic rsync restore --coda -u <remote-unix-user> <coda-backup-dir> <remote-host> <destination-coda-filesystem-path>
This command runs rsync with the necessary options to restore all
backed-up metadata. Additional options specified to the restore command
(such as --delete) will be forwarded to rsync.
github: GitHub organizations
github backs up GitHub teams, repositories, wikis, issues, milestones,
commit comments, release metadata, and release assets for all accessible
repositories within configured GitHub organizations. Repositories and wikis
are stored as bare Git repositories, release assets as individual files, and
all other metadata in JSON format.
deltaic github auth
Repository and wiki data can be restored with git push. Restore tools for
other metadata have not yet been written. (Note that it is impossible to
restore all data types to GitHub in their original form. Creator
information for e.g. individual issue comments cannot be restored without
access to the creators' GitHub accounts, and timestamps cannot be restored
at all.)
rbd: Ceph's block device layer
rbd backs up RADOS Block Device images and snapshots into large image
files. It backs up RBD images by creating an RBD snapshot, fetching its data
with rbd export-diff, and applying the resulting patch to the backup image
file. It backs up RBD snapshots by fetching their data with rbd export-diff.
RBD images are backed up incrementally, but snapshots are not: if a snapshot
is recreated in RBD, during the next backup it will be retrieved again from
scratch. rbd supports multiple RADOS pools.
Any image backed up by rbd will have one or more snapshots that have been
created without the knowledge of the higher-level system (such as OpenStack
Cinder) that uses the image. As a result, attempts to delete the image
through the higher-level system will fail. Before deleting such an image,
remove its backup snapshots with deltaic rbd drop.
Requirements: rbd commands.
Use deltaic rbd restore to restore a backup image or snapshot to a new
RBD image.
rgw: Ceph's S3-compatible storage layer
rgw backs up RADOS Gateway buckets using the S3 web API. It accesses
buckets using the credentials of the bucket owner, which it obtains using
radosgw-admin.
rgw backs up object data, metadata, and ACLs, and bucket ACLs and CORS
configuration. If an object has not changed since the last backup, rgw
assumes that its ACLs have not changed either, except during a periodic
scrub. If manually backing up an individual bucket, pass --scrub-acls to
deltaic rgw backup to override this assumption and re-download ACLs for
all objects.
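The ACL shortcut described above amounts to a small decision rule, sketched below. The ObjectState fields and the plan_object_backup helper are hypothetical; in particular, comparing ETag and Last-Modified is only an assumed way of detecting that an object has changed, not necessarily what rgw does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectState:
    """Hypothetical change-detection fingerprint for one object."""
    etag: str
    last_modified: str


def plan_object_backup(remote: ObjectState,
                       stored: Optional[ObjectState],
                       scrub_acls: bool = False) -> dict:
    """Decide what to fetch for one object during a backup run."""
    # A missing or different stored fingerprint means the object changed.
    changed = stored is None or remote != stored
    return {
        "fetch_data": changed,
        # ACLs are assumed unchanged when the object is unchanged,
        # unless a scrub forces them to be re-downloaded.
        "fetch_acl": changed or scrub_acls,
    }


if __name__ == "__main__":
    state = ObjectState(etag="abc123", last_modified="2015-01-01T00:00:00Z")
    print(plan_object_backup(state, state))
    print(plan_object_backup(state, state, scrub_acls=True))
```

The trade-off is that an ACL change on an otherwise unchanged object goes unnoticed until the next scrub, in exchange for far fewer requests on a normal run.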
Requirements: radosgw-admin commands; rgw_defer_to_bucket_acls config option on Ceph 0.73 or above.
Use deltaic rgw restore to restore a bucket. The bucket owner, and all
users mentioned in the bucket and object ACLs, must already exist in
radosgw. Last-Modified and ETag headers are not restored, since the
S3 API does not permit it.
rsync: Remote filesystems
rsync backs up remote filesystems using the file synchronization program
of the same name. File/directory ownership, modes, ACLs, etc. are stored
as extended attributes using the rsync --fake-super option.
Requirements: authorized_keys access to root on the remote hosts.
Use deltaic rsync restore to restore a file or directory from the backup
filesystem to a destination host. This command runs rsync with the
necessary options to restore all backed-up metadata. Additional options
specified to the restore command (such as --delete) will be forwarded to
rsync.
Deltaic can archive a snapshot into an archive set stored in off-server
(presumably offsite) storage. Each unit (rbd image, rgw bucket, rsync
machine, etc.) is separately packaged into a tarball, optionally encrypted
with GPG, and uploaded to a remote archive.
Create a filesystem (perhaps backed by the thin pool) for temporary data. For example:
lvcreate -V20t -T backups/pool -n spool
mkfs.ext4 -m 0 -E lazy_itable_init /dev/backups/spool
mkdir /srv/spool
mount /dev/backups/spool /srv/spool
chown user:user /srv/spool
chmod 700 /srv/spool
echo "/dev/backups/spool /srv/spool ext4 discard,noatime 0 2" >> /etc/fstab
If archives should be encrypted, use gpg2 --gen-key to generate one
GPG signing key and one or more additional GPG keys for encryption. The
signing key is used during archive creation and must not require a
passphrase. Any of the encryption keys can be used during restore to
decrypt the archives; they should be protected with a passphrase.
Retain copies of all the keys offline in a safe place.
Configure archive settings in deltaic.conf. See example-config.yaml for the available settings.
Here is an example config fragment for storing encrypted archives to AWS:
settings:
  archive-spool: /srv/spool
  archive-gpg-signing-key: [signing key fingerprint]
  archive-gpg-recipients:
    - [encryption key fingerprint]
archivers:
  default:
    archiver: aws
    aws-access-key-id: [access key ID]
    aws-secret-access-key: [secret access key]
    aws-region: us-east-1
    aws-namespace: [name of vault and database]
    aws-storage-cost: [dollars per GB-month in aws-region, if not 0.01]
Enable the commented-out cron jobs in the crontab created by
deltaic mkconf crontab and configure their schedules as desired.
Restoring data requires two separate commands: deltaic archive retrieve to
fetch an archive and deltaic archive unpack to unpack it. The unpack
destination directory can be the backup root or a different directory.
It is possible to retrieve and unpack archives whose units are not listed in the config file. This makes it possible to perform disaster recovery using a reconstructed config file containing only critical system-wide configuration such as GPG keys and AWS credentials.
When setting up a new Deltaic installation for disaster recovery, ensure that you do not enable pruning of old archive sets before retrieving all the archives you need.
aws: Amazon Web Services
aws stores archives in Amazon Glacier and archive metadata in Amazon
SimpleDB. To avoid Glacier early-deletion penalties, archive sets are
protected from pruning for the first 90 days after uploading.
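The 90-day protection window can be expressed as a simple date check. This is a minimal sketch with a hypothetical is_prunable helper; it assumes the window is measured from the upload timestamp and that an archive set becomes eligible once exactly 90 days have elapsed.

```python
from datetime import datetime, timedelta, timezone

# Protection window stated above for the aws archiver.
PROTECTION_PERIOD = timedelta(days=90)


def is_prunable(uploaded_at, now=None):
    """Return True once an archive set is old enough to be pruned."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at >= PROTECTION_PERIOD


if __name__ == "__main__":
    uploaded = datetime(2015, 1, 1, tzinfo=timezone.utc)
    print(is_prunable(uploaded, now=uploaded + timedelta(days=30)))
    print(is_prunable(uploaded, now=uploaded + timedelta(days=120)))
```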
Amazon Glacier bills for retrievals based on the maximum hourly bandwidth
used during a calendar month. Therefore, retrieving a large amount of data
in a short time can be very expensive. The most cost-effective way to
schedule your retrievals is to retrieve at a consistent rate over the course
of a month. You can specify a retrieval rate with the -r option to
deltaic archive retrieve; the default is 1 GiB/hour. Use
deltaic archive cost to help select a retrieval rate.
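To get a feel for the trade-off between retrieval rate and elapsed time, the arithmetic can be sketched as follows. The helper names are hypothetical, a 30-day month is assumed, and no pricing is modeled; use deltaic archive cost for actual cost estimates.

```python
import math


def hours_to_retrieve(total_gib, rate_gib_per_hour=1.0):
    """Whole hours needed to retrieve total_gib at a constant rate."""
    return math.ceil(total_gib / rate_gib_per_hour)


def steady_rate(total_gib, days=30):
    """Constant rate (GiB/hour) that spreads a retrieval over `days` days."""
    return total_gib / (days * 24)


if __name__ == "__main__":
    total = 2048  # hypothetical archive set size in GiB
    print(f"at 1 GiB/hour: {hours_to_retrieve(total)} hours")
    print(f"spread over 30 days: {steady_rate(total):.2f} GiB/hour")
```

At the default rate, 720 GiB takes exactly 30 days, so larger restores either take longer than a month or need a higher rate and a correspondingly higher peak hourly bandwidth.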
googledrive: Google Drive
googledrive stores archives and archive metadata in Google Drive.
To use Google Drive you first need to create or select a project in the Google Developers Console and enable the Drive API.
Go to the Google Developers Console.
Select an existing project or create a new one.
Under APIs & Auth, select APIs -> Drive API and enable the API.
Under APIs & Auth, select Credentials -> Create new Client ID.
Copy the Client ID and Client secret to deltaic.conf.
googledrive-client-id: [Client ID]
googledrive-client-secret: [Client secret]
archivers:
  default:
    archiver: googledrive
Run deltaic googledrive auth ~/.cache/deltaic-googledrive-credentials.json and follow the prompts to obtain an authorization code. You can also use the googledrive-credentials-file configuration option to choose a different name or location for the file.
To test whether the authentication tokens are working, run deltaic googledrive test.