ZettaIO / restic-compose-backup

Automatic restic backup of a docker-compose setup. https://hub.docker.com/r/zettaio/restic-compose-backup
MIT License

Database connection fails when in different network #28

Open jannikw opened 4 years ago

jannikw commented 4 years ago

There is an issue when a database that should be backed up runs in a different docker network than the backup container. This problem mostly arises in multi-project setups using docker compose, but it can also be reproduced by explicitly assigning the networks that way. When the backup container and the database container are not in the same network, rcb status first fails to execute the "ping", and dumping the database contents fails as well.

There are a few possible solutions I can think of:

  1. Add the backup container to the network of all database containers.
  2. Retrieve the networks of the database container via the docker network api, temporarily add the backup container to the database container's network, and then remove it again when the work is done.
  3. Use the method exec_run(...) from the docker container api to execute a command inside the database container instead of executing pg_isready, mysqladmin etc. inside the backup container using the python subprocess api.

I think solution number 1 isn't a good idea, because it would break the network isolation that docker networks provide and would require manual configuration by the user. Number 2 is possible, but would mean always having to retrieve the correct networks, add them, and clean them up later, even just for executing rcb status. Number 3 is probably the best option. I already worked on changing database pinging to use exec_run as a proof of concept and it works. This option would also allow removing the database client packages from the backup container's dependencies.
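
For illustration, a minimal sketch of what an exec_run-based ping could look like with the docker Python SDK (this is only an assumed shape, not the actual proof-of-concept code):

```python
import docker

def ping_postgres(container_name: str) -> bool:
    """Check availability by running pg_isready inside the database container itself."""
    client = docker.from_env()
    container = client.containers.get(container_name)
    # exec_run executes the command inside the target container and returns an
    # ExecResult with the exit code and the captured output.
    exit_code, _output = container.exec_run(["pg_isready", "-U", "postgres"])
    return exit_code == 0
```

Since the command runs in the database container, the backup container needs neither pg_isready installed nor network access to the database.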

Let me know what you think @einarf :)

einarf commented 4 years ago

I've been thinking about this, but I'm not entirely sure what the best solution is. Just having to add the networks manually to the container was the simplest solution, and the spawned containers can use the following network mode to automatically get the same network setup: https://github.com/ZettaIO/restic-compose-backup/blob/5c33ccf0b1dd0f1213a79a82765596290ce4263f/src/restic_compose_backup/backup_runner.py#L22
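
(For context, the linked line presumably uses docker's container:<id> network mode. A simplified sketch of that idea with the docker Python SDK, not the exact code behind the link:)

```python
import socket
import docker

client = docker.from_env()
# Inside a container, the default hostname is the container id, so the spawned
# backup process container can reuse the network stack of the container that
# started it and automatically sees the same networks. The image name and
# command below are illustrative.
own_container_id = socket.gethostname()
client.containers.run(
    "zettaio/restic-compose-backup",
    command="rcb backup",
    network_mode=f"container:{own_container_id}",
    detach=True,
)
```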

My main worry about using exec for streaming dumps into restic is that it might affect performance, but I have never tried this properly. Could there also be cases where tools other than the standard ones are used? What other impact does streaming database dumps from the database containers themselves have? There might be cpu/memory constraints on these containers for example.

Maybe it should be possible to support all options, leaving that to each "backup type" entirely? I'm just thinking ahead to where support for more things is potentially added over time, and how users could extend the system to support their custom services (outside of just volume and database backups).

jannikw commented 4 years ago

I've been thinking about this, but I'm not entirely sure what the best solution is. Just having to add the networks manually to the container was the simplest solution, and the spawned containers can use the following network mode to automatically get the same network setup.

Yes, it is simple, but it also requires configuration changes in the docker-compose.yml files besides adding labels to the containers. This works against having the tool work out of the box, which would be ideal. Also, having all databases in one network is really undesirable: there is no reason for it from the application's point of view; it is only required for the backups to work.

I also noticed another problem that is a result of executing pg_dump inside the backup container: See #30.

My main worry about using exec for streaming dumps into restic is that it might affect performance, but I have never tried this properly.

Sadly I only have some small databases to test performance with, but for those I didn't really see a difference. We are probably trading one kind of internal network traffic for another: pg_dump or mysqladmin use a database-specific protocol, while streaming their stdout goes over the docker api. Maybe pg_dump or mysqladmin are a bit more efficient here, since they can use a binary protocol instead of transporting stdout in plaintext.
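
To make that trade-off concrete, here is a rough sketch of the exec-based streaming approach being discussed, with the dump piped from the database container into restic's stdin. The function and database names are illustrative, not rcb's actual implementation, and the restic repository/password are assumed to be configured via environment variables:

```python
import subprocess
import docker

def dump_via_exec_into_restic(db_container_name: str, snapshot_filename: str) -> int:
    """Stream a pg_dump produced inside the database container into restic via stdin."""
    client = docker.from_env()
    container = client.containers.get(db_container_name)

    # stream=True returns a generator of output chunks, so the dump is not
    # buffered in memory before it reaches restic.
    _, chunks = container.exec_run(["pg_dump", "-U", "postgres", "mydb"], stream=True)

    # Assumes RESTIC_REPOSITORY / RESTIC_PASSWORD are set in the environment.
    restic = subprocess.Popen(
        ["restic", "backup", "--stdin", "--stdin-filename", snapshot_filename],
        stdin=subprocess.PIPE,
    )
    for chunk in chunks:
        restic.stdin.write(chunk)
    restic.stdin.close()
    return restic.wait()
```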

Could there also be cases where tools other than the standard ones are used?

Are you referring to potential version incompatibilities between the tools we expect in the container and the tools actually available? The interface of pg_dump seems to be designed with compatibility in mind, since many database admins rely on scripts using those tools, so I don't think this is an issue. As long as the official docker images for the three databases are supported, I think we should be fine.

What other impact does streaming database dumps from the database containers themselves have? There might be cpu/memory constraints on these containers for example.

Of course there will be a cpu increase when doing the database dump, but this is to be expected and cannot be avoided. Tools like pg_dump connect to the database no matter where they run, so the only increase in cpu/memory usage should come from the additional dump process itself. Those tools don't do much processing, so I expect the overhead to be quite low, and since backups are usually scheduled at a quiet time I don't expect this to be a problem overall.

Maybe it should be possible to support all options, leaving that to each "backup type" entirely? I'm just thinking ahead to where support for more things is potentially added over time, and how users could extend the system to support their custom services (outside of just volume and database backups).

I think some kind of plugin architecture would be best. This would provide the most flexibility, for example with a small set of interfaces that each backup type implements.
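
A rough sketch of what such plugin interfaces could look like; the class and method names below are hypothetical and not existing restic-compose-backup APIs:

```python
from abc import ABC, abstractmethod

class BackupPlugin(ABC):
    """Hypothetical plugin interface, shown only to illustrate the idea."""

    @abstractmethod
    def matches(self, container) -> bool:
        """Return True if this plugin knows how to back up the given container."""

    @abstractmethod
    def ping(self, container) -> bool:
        """Check that the service is reachable/healthy, e.g. via exec_run."""

    @abstractmethod
    def backup(self, container) -> None:
        """Produce the backup, e.g. by streaming a dump into restic."""
```

Each backup type (postgres, mysql, volumes, or user-defined services) would then be free to decide whether to exec into the service container or spin up a helper container.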

The question is whether simple volume backups would be considered a plugin as well. If so, either the plugin abstraction would need a way to communicate requirements for the container that backups run in (-> volume mounts), or backup() would itself spin up a new container just for the backup. This would also create restic snapshots for each volume mount path instead of a single /volumes, which I wouldn't even consider bad.

einarf commented 4 years ago

Sorry for the slow response. I'm going to try out dumping a larger database through exec and see. I'm trying to wrap my head around all the advantages and disadvantages of using exec vs. the network. I was also hoping to get this somewhat working in kubernetes at some point. It gets a bit complex.

jannikw commented 4 years ago

No worries, I have been a bit busy myself, but I would like to get this tool working since it fits my use case very well. :)