Multivolume independent backups

dicastro commented 6 years ago

This pull request refers to my comment in #29

It allows backups of multiple volumes to be stored independently. I followed the idea you suggested in one comment: using multiple variables to indicate multiple sources, targets and caches.

docker run -it --rm \
    --name volumerize \
    -v <volume_server_data>:/sourceserver:ro \
    -v <volume_db_data>:/sourcedb:ro \
    -v <volume_backup_server>:/backupserver \
    -v <volume_backup_db>:/backupdb \
    -v <cache_volume_server>:/volumerize-cache-server \
    -v <cache_volume_db>:/volumerize-cache-db \
    -e "VOLUMERIZE_SOURCE_SERVER=/sourceserver" \
    -e "VOLUMERIZE_TARGET_SERVER=file:///backupserver" \
    -e "VOLUMERIZE_CACHE_SERVER=/volumerize-cache-server" \
    -e "VOLUMERIZE_SOURCE_DB=/sourcedb" \
    -e "VOLUMERIZE_TARGET_DB=file:///backupdb" \
    -e "VOLUMERIZE_CACHE_DB=/volumerize-cache-db" \
    blacklabelops/volumerize backup

This way, together with the VOLUMERIZE_CONTAINERS variable, only one volumerize container is required to get consistent and independent backups of multiple volumes. The workflow would be:

stop containers > backup volume 1 > backup volume N > start containers

For me, this is far simpler than running multiple instances of volumerize plus scripts that coordinate stopping all containers, backing them up and restarting them.
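
As a rough illustration of the workflow above (not the code from this pull request), the sequence could be sketched as a small shell function. The function name `run_consistent_backup` and the `DOCKER` hook are hypothetical, and the container list is assumed to be space-separated; only `docker stop` and `docker start` are real commands here.

```shell
#!/bin/sh
# Hypothetical sketch: stop containers > backup volume 1..N > start containers.
# DOCKER is an illustrative hook so the docker binary can be swapped out.
DOCKER="${DOCKER:-docker}"

run_consistent_backup() {
  containers="$1"   # space-separated container names (assumed format)
  shift             # remaining arguments: one backup command per volume
  status=0

  # Stop everything first so all volumes are captured in a consistent state.
  $DOCKER stop $containers || return 1

  for backup_cmd in "$@"; do
    # Keep going after a failure so the containers are always restarted.
    sh -c "$backup_cmd" || status=1
  done

  $DOCKER start $containers || status=1
  return $status
}
```

A call such as `run_consistent_backup "app db" "<backup command 1>" "<backup command 2>"` would then run every backup between a single stop and a single start, and return a non-zero status if any step failed.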

blacklabelops commented 6 years ago

Still not sure if I want this kind of complexity in this image. It does not contribute more value but makes configuration and documentation much more complex.

dicastro commented 6 years ago

I don't see why this change increases the complexity. The image can still be used as usual:

docker run -it --rm \
    --name volumerize \
    -v <volume_source>:/source:ro \
    -v <volume_backup>:/backup \
    -e "VOLUMERIZE_SOURCE=/source" \
    -e "VOLUMERIZE_TARGET=file:///backup" \
    blacklabelops/volumerize backup

But it also supports a more advanced use for multivolume backups. You can define multiple SOURCE, TARGET and CACHE environment variables, which are correlated by a common suffix in their names.

docker run -it --rm \
    --name volumerize \
    -v <volume_server_data>:/sourceserver:ro \
    -v <volume_db_data>:/sourcedb:ro \
    -v <volume_backup_server>:/backupserver \
    -v <volume_backup_db>:/backupdb \
    -v <cache_volume_server>:/volumerize-cache-server \
    -v <cache_volume_db>:/volumerize-cache-db \
    -e "VOLUMERIZE_SOURCE<FREE_SUFFIX_1>=/sourceserver" \
    -e "VOLUMERIZE_TARGET<FREE_SUFFIX_1>=file:///backupserver" \
    -e "VOLUMERIZE_CACHE<FREE_SUFFIX_1>=/volumerize-cache-server" \
    -e "VOLUMERIZE_SOURCE<FREE_SUFFIX_2>=/sourcedb" \
    -e "VOLUMERIZE_TARGET<FREE_SUFFIX_2>=file:///backupdb" \
    -e "VOLUMERIZE_CACHE<FREE_SUFFIX_2>=/volumerize-cache-db" \
    blacklabelops/volumerize backup
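
To make the suffix correlation concrete, here is a minimal sketch of how such variables could be grouped into backup jobs. It is an illustration under assumptions, not the implementation in this pull request: the function name `describe_jobs` and the output format are hypothetical, and the variables are assumed to be exported with suffixes made only of letters, digits and underscores.

```shell
#!/bin/sh
# Hypothetical sketch: pair VOLUMERIZE_SOURCE<suffix> with the TARGET and CACHE
# variables that share the same suffix. Not the code from the pull request.

describe_jobs() {
  # Extract the suffix of every exported VOLUMERIZE_SOURCE* variable.
  # An empty line stands for the plain, unsuffixed VOLUMERIZE_SOURCE.
  env | sed -n 's/^VOLUMERIZE_SOURCE\([A-Za-z0-9_]*\)=.*/\1/p' | sort |
  while IFS= read -r suffix; do
    # The suffix only contains [A-Za-z0-9_], so building names with eval is safe.
    eval "src=\${VOLUMERIZE_SOURCE${suffix}}"
    eval "dst=\${VOLUMERIZE_TARGET${suffix}:-}"
    eval "cache=\${VOLUMERIZE_CACHE${suffix}:-}"
    printf '%s: %s -> %s (cache: %s)\n' \
      "${suffix:-<none>}" "$src" "${dst:-<unset>}" "${cache:-<default>}"
  done
}
```

With the `_SERVER` and `_DB` variables from the first example exported, `describe_jobs` would print one line per source, each paired with the target and cache that carry the same suffix.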

For me this change would simplify things a lot. In my company we have several systems, each system is composed of multiple containers, and more than one container has data to be backed up.

What I am trying to do (my current solution still doesn't work) is to have one instance of volumerize per container with data (as suggested in some issues). I also have an external bash script (one per system) that runs one instant volumerize backup per container with data in the system. In addition, this script has to be added to crontab so that it runs periodically.

With this change, instead of having an external script per system that orchestrates multiple instant backups and is scheduled in crontab, we would have only one instance of volumerize per system, which does everything: it stops the related containers, makes all backups, restarts the containers, and is already scheduled to run periodically.

Are you open to discussing this, or are you already sure that you don't want it?

blacklabelops commented 6 years ago

I want to discuss this, not simply discard it. It feels like I either do not understand the benefits or would solve the problem differently.

In my context: System -> One Docker daemon (or a swarm where I access containers from a single master). Container -> Stack (you may also call this a composition, or any other name from any technology that assembles containers).

Applications running on my System (Configured and managed as stacks):

Stacks:

2x Jenkins
4x Atlassian Products (Jira, Confluence, Crowd, Bitbucket)
Letsencrypt Reverse Proxy

=> It would be a hassle to manually configure cron jobs and backups for all of these applications in a single container. I am doing this "stack-wise".

Advantages of stacks:

Cron containers with cron jobs solely for the stack
Backup containers with cron jobs and database dump scripts solely for the stack
Management is done on the stack layer, not the system layer -> Start, stop, deploy, update stacks
Stacks are movable between systems -> I have already migrated between 3 service providers.

Backups:

One or more Volumerize per stack
Volumerize configuration for multiple volumes:
- If backups should be synchronous: https://github.com/blacklabelops/jira#upgrading-jira
- Otherwise multiple running containers.

How do I do system-wide backups?

A Bash script lists all Volumerize instances -> e.g. identified by tags, container name or underlying image.
The Bash script triggers a parallel or sequential backup procedure on the discovered instances.
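
The two steps above could look roughly like the following sketch, which discovers instances by underlying image and backs them up sequentially. It is not the maintainer's actual script: the function names, the `DOCKER` and `IMAGE` hooks, and the assumption that each instance exposes a `backup` command via `docker exec` are illustrative.

```shell
#!/bin/sh
# Hypothetical sketch of a system-wide backup script.
# DOCKER and IMAGE are illustrative hooks with assumed defaults.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-blacklabelops/volumerize}"

# List the names of running containers created from the Volumerize image.
list_volumerize_instances() {
  $DOCKER ps --filter "ancestor=$IMAGE" --format '{{.Names}}'
}

# Trigger the backup procedure on every discovered instance, one after another.
backup_all() {
  failed=0
  for name in $(list_volumerize_instances); do
    echo "Backing up via $name"
    # Assumes the container provides a "backup" command.
    $DOCKER exec "$name" backup || failed=1
  done
  return $failed
}
```

Discovery by tag or container name would only change the filter in `list_volumerize_instances`; a parallel variant would start each `docker exec` in the background and wait for all of them.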

blacklabelops commented 6 years ago

I have one further question:

Is this about the start and stop routine? You need a fixed set of containers that need to be stopped and restarted, but the backups themselves have different locations or backup procedures?

abate commented 6 years ago

Another use case for this patch is when the backups are encrypted. If you want to restore just one container, you are forced to download the entire encrypted archive. Imagine you have a stack with one volume for file storage (20 GB) and one for the database (10 MB). If you choose to put everything in one backup, then to restore the database you are still forced to download, and have the space for, 20 GB of data. Having independent backups makes sense in this case.

blacklabelops commented 6 years ago

Multiple passwords, encryption certificates or duplicity options are not covered by this pull request.

blacklabelops commented 6 years ago

Before we continue I will have to refactor the current scripts. Afterwards we can discuss this topic.

Multivolume independent backups #51