docker volume save / load commands

mkarg commented 6 years ago

[ ] This is a bug report
[X] This is a feature request
[X] I searched existing issues before opening this one

Expected behavior

docker volume save [OPTIONS] VOLUME [VOLUME...]

Save one or more volumes to a tar archive (streamed to STDOUT by default)

docker volume load

Load a volume from a tar archive or STDIN

Combining both commands makes it possible to migrate containers across driver types and storage locations in a transparent way.

Actual behavior

There are no existing commands to back up and restore volumes independent of the volume's driver type. To back up and restore, one currently has to break the type-agnostic view and apply type-specific commands and procedures.

Steps to reproduce the behavior

E. g. attempt to migrate a data volume currently backed by an on-premises vSphere Docker Volume Service to AWS cloud storage without applying any particular vSphere or AWS tooling.

Output of docker version:

17.06.0-ce

Additional environment details (AWS, VirtualBox, physical, etc.)

Proposed solution is agnostic of any environment details.

cpuguy83 commented 6 years ago

I'm pretty 👎 to features like this. There are many great tools for transmitting data from one system to another all of which can be run in a container.

While the feature is convenient to have, it really shouldn't replace other really good tools out there.
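A minimal sketch of that container-based approach, for illustration only. The helper names `volume_save` and `volume_load`, the `alpine` image, and the `/data` mount point are assumptions and do not come from this thread; the `DOCKER` variable exists only so the commands can be previewed with `DOCKER=echo`.

```shell
#!/bin/sh
# Sketch of the "run a tool in a container" approach; these helpers are
# hypothetical and are not built-in docker commands.
# Override DOCKER (e.g. DOCKER=echo) to preview a command without running it.
DOCKER="${DOCKER:-docker}"

# volume_save VOLUME: stream the volume's files as an uncompressed tar to STDOUT.
volume_save() {
    "$DOCKER" run --rm -v "$1":/data:ro alpine tar -cf - -C /data .
}

# volume_load VOLUME: unpack a tar archive from STDIN into the volume.
volume_load() {
    "$DOCKER" run --rm -i -v "$1":/data alpine tar -xf - -C /data
}
```

For example, `volume_save mydata > mydata.tar` on one host and `volume_load mydata < mydata.tar` on another. This copies file contents only; it does not address the metadata or consistency points raised later in this thread.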

mkarg commented 6 years ago

There are scenarios that you cannot execute without some support built into the docker volume command itself:

- Creator and receiver of the tar archive must use compatible compression. With external tooling, creator and receiver may end up using incompatible compression types, as tar -I even allows me to invoke my own self-invented and totally unknown compression type. The receiver wouldn't have any chance to restore the backup!
- A backup script containing docker volume save would be platform agnostic, hence would work unmodified on Linux and Windows. As the tar tool does not exist on Windows (although there are tools able to read and write .tar files), one has to set up different solutions per platform.
- Containers only have access to the files stored in volumes, but not to any kind of meta-data for the volume created by docker itself or by the particular volume driver. Hence, such meta-data will get lost. The docker volume save command, on the other hand, sees the volume comprehensively, hence would be able to copy / transfer / migrate such meta-information, too.
- Containers do not see who else is currently using the volume besides themselves. In case special treatment of the volume is needed before or after a transfer / migration, such a solution might fail. As docker volume save talks to the volume drivers, these drivers could apply the necessary preparation / postprocessing. Also, docker itself could chime in: for example, execution of sync, or pausing / unpausing containers to guarantee a consistent file system view inside the volume.

Besides that, for beginners it is much easier to explain a default backup solution that will always work, anywhere, independent of platform: "Run docker volume save." So while it is also about convenience, my proposal is more about solving the bullets above.

cpuguy83 commented 6 years ago

> Creator and receiver of the tar archive must use compatible compression

This sounds like a process problem and also doesn't fix the problem in that the data could very well be compressed by the person saving it using whatever format they choose. One could even write a tool that wraps docker to export the data in exactly the way you want (an old example of this... https://github.com/cpuguy83/docker-volumes)

> Containers only have access to the files stored in volumes

But why is this metadata important across hosts? If the driver itself is multi-host, then the metadata will be available anyway (except for labels which are currently local only in moby/moby). I wouldn't expect a save/load for volumes to save metadata like this as the metadata is very likely host specific otherwise the data would be on a multi-host driver anyway.

> pausing / unpausing containers

This would be an abstraction leak. This should be up to the caller to implement.
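A rough sketch of what such caller-side handling could look like. The helper name `quiesced_save`, the `alpine` image, and the `/data` mount point are assumptions for illustration. Pausing only freezes the container processes; it does not by itself make application data such as a database consistent.

```shell
#!/bin/sh
# Caller-side quiescing sketch; quiesced_save is a hypothetical helper,
# not a docker command.
DOCKER="${DOCKER:-docker}"

# quiesced_save VOLUME: pause running containers that mount VOLUME,
# stream its files as tar to STDOUT, then unpause them again.
quiesced_save() {
    vol="$1"
    status=0
    # IDs of running (not already paused) containers that mount the volume.
    users=$("$DOCKER" ps -q --filter "volume=$vol" --filter "status=running")
    if [ -n "$users" ]; then
        # Word splitting of $users is intended: one container ID per argument.
        "$DOCKER" pause $users >/dev/null || return 1
    fi
    # Archive from a throwaway helper container; remember a failure
    # so the paused containers are still resumed afterwards.
    "$DOCKER" run --rm -v "$vol":/data:ro alpine tar -cf - -C /data . || status=$?
    if [ -n "$users" ]; then
        "$DOCKER" unpause $users >/dev/null
    fi
    return "$status"
}
```

A call such as `quiesced_save mydata > mydata.tar` keeps the pause window limited to the duration of the copy.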

> these drivers could apply the necessary preparation / postprocessing

While this is true, and I'm generally in favor of an API for implementing snapshot and clone functionality this is not the same as exporting data out of the system. It also requires new API's for interacting with drivers.

> for beginners it is much easier to explain a default backup solution

Here lies my main concern with such functionality. Making a new feature that only applies to beginners is a dangerous path that will most definitely lead to issues down the road about such functionality not scaling. A simple copy of data is also not generally a good backup solution, and would most certainly fail outright for certain scenarios (e.g. a database).

mkarg commented 6 years ago

> > Creator and receiver of the tar archive must use compatible compression
>
> This sounds like a process problem and also doesn't fix the problem in that the data could very well be compressed by the person saving it using whatever format they choose. One could even write a tool that wraps docker to export the data in exactly the way you want (an old example of this... https://github.com/cpuguy83/docker-volumes)

My proposal aims at introducing a standardized, unique docker volume file format, not only at solving particular process cases. If you want to publish data for anonymous use, you cannot even negotiate format or compression. Certainly third parties could write external cross-platform tools, but with this argument you could also vote for removing the existing docker image save command.

> > Containers only have access to the files stored in volumes
>
> But why is this metadata important across hosts? If the driver itself is multi-host, then the metadata will be available anyway (except for labels which are currently local only in moby/moby). I wouldn't expect a save/load for volumes to save metadata like this as the metadata is very likely host specific otherwise the data would be on a multi-host driver anyway.

Example A: Assume a volume is created using volume driver A (e. g. vSphere). That driver might add important information to the volume's meta-data. Then you migrate it to AWS, from AWS to local, etc., and eventually back to vSphere. Keeping the vSphere meta-data might be beneficial here, as one does not have to reconfigure manually.

Example B: In the future, docker, or a docker extension, might store additional cross-driver volume information in the volume's meta-data. For example, a copyright note allowing public use of analytic data found in this published volume. For legal reasons, it might be necessary to keep this information. For technical reasons, like a docker extension, it simply might be beneficial. For example, a backup tool might store the date of the backup in the meta-data to get rid of centralized management records.
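As a rough illustration of carrying metadata alongside the data with today's tooling, the sketch below stores the output of docker volume inspect next to the archive. The helper name `save_with_meta`, the file naming, the `alpine` image, and the `/data` mount point are assumptions. It captures only what docker volume inspect reports (driver, labels, options); driver-private metadata that the engine does not expose is not captured, and restoring is not shown.

```shell
#!/bin/sh
# Metadata-plus-data sketch; save_with_meta is a hypothetical helper,
# not a docker command.
DOCKER="${DOCKER:-docker}"

# save_with_meta VOLUME DIR: write DIR/VOLUME.json (metadata as reported
# by the engine) and DIR/VOLUME.tar (file contents).
save_with_meta() {
    vol="$1"
    dir="$2"
    # Driver, labels, and options as JSON.
    "$DOCKER" volume inspect --format '{{json .}}' "$vol" > "$dir/$vol.json" || return 1
    # File contents, archived by a throwaway helper container.
    "$DOCKER" run --rm -v "$vol":/data:ro alpine tar -cf - -C /data . > "$dir/$vol.tar"
}
```

For example, `save_with_meta mydata /backups` would leave `/backups/mydata.json` and `/backups/mydata.tar` side by side.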

> > pausing / unpausing containers
>
> This would be an abstraction leak. This should be up to the caller to implement.

I'd rather say that a command like docker volume save --quiescence=all-containers would be a huge benefit instead of an abstraction leak.

> > these drivers could apply the necessary preparation / postprocessing
>
> While this is true, and I'm generally in favor of an API for implementing snapshot and clone functionality this is not the same as exporting data out of the system. It also requires new API's for interacting with drivers.

My proposal does not talk about importing and exporting; it solely talks about two new commands. What the user does with them is a different matter. I just added this explanation to make clear why such commands are needed. Yes, a new API is needed for the drivers, but I thought that proposals are good for... well... proposing new things. ;-)

> > for beginners it is much easier to explain a default backup solution
>
> Here lies my main concern with such functionality. Making a new feature that only applies to beginners is a dangerous path that will most definitely lead to issues down the road about such functionality not scaling. A simple copy of data is also not generally a good backup solution, and would most certainly fail outright for certain scenarios (e.g. a database).

Actually I gave four bullets, of which you accepted at least one to be true. So I do not see that it is good "only for beginners". Also, I simply cannot see why docker volume save should "not scale", while no such concerns exist with docker image save.

I never asked for a simple backup of data: I also proposed to include meta-data, and you already mentioned that a new API is needed to do the proposed things beyond the simple data copy.

mkarg commented 6 years ago

I think this discussion has come to a point where someone from the Docker core team should respond whether it makes any sense to develop and publish a pull request for the proposed feature. I mean a statement of official interest, independent of personal opinion.
