eXist-db / docker-existdb

Docker image builder for eXist-db
GNU Affero General Public License v3.0

how to go about backup and restore #40

Open duncdrum opened 5 years ago

duncdrum commented 5 years ago

I see a number of ways we might go about baking backup and restore into the images. As this is pretty much the last big feature I'd like to add, I'm curious to hear opinions on which way to go.

Current situation: a user is on :release, which is exist-db 4.x. Once 5.x (a binary-incompatible major upgrade) is out, just running docker pull existdb/existdb:release will create a broken instance.

Ideally, I would like it to trigger a backup and restore.

we could:

This would go along with setting a restart_policy, rollback_config, and update_config, and updating the docker-compose file version to 3.7.

In either scenario, backups should happen to their own volume, so that one is a given in my mind.
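A minimal sketch of what "backups to their own volume" could look like with a named Docker volume. The volume name exist-backup, the container name exist, the published port, and the mount path /exist-data/export are assumptions for illustration (the path is inferred from the docker cp commands later in this thread), not something the image is confirmed to provide.

```shell
#!/bin/sh
# Sketch only: volume name, container name, port, and mount path are assumptions.
DOCKER="${DOCKER:-docker}"                      # set DOCKER=echo for a dry run
VOLUME="${VOLUME:-exist-backup}"                # hypothetical volume name
EXPORT_DIR="${EXPORT_DIR:-/exist-data/export}"  # assumed export path in the image

start_with_backup_volume() {
  # Create the named volume that will hold the backups.
  $DOCKER volume create "$VOLUME" || return 1
  # Start the container with the volume mounted at the export directory.
  $DOCKER run -d --name exist -p 8080:8080 \
    -v "$VOLUME:$EXPORT_DIR" existdb/existdb:release
}
```

Calling start_with_backup_volume with DOCKER=echo prints the two docker commands instead of running them, which is a cheap way to check the paths before touching a real instance.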

adamretter commented 5 years ago

@duncdrum I think this should be the user's responsibility.

grantmacken commented 5 years ago

@adamretter

I think this should be the user's responsibility.

I agree; however, we should attempt to document how to carry out tasks that are specific to eXist running in a container environment, and doing a backup is most likely one of those tasks.

@duncdrum

backups should happen to their own volume

Why not just back up to the '/tmp' dir, then 'docker cp' the backup files onto your host, and do the reverse when doing a restore?
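A rough sketch of that /tmp plus docker cp round trip. The container name ex, the /tmp/backup path, and the use of the Java admin client with system:export and system:restore (the calls proposed later in this thread) are assumptions, and the admin credentials the client may need are left out.

```shell
#!/bin/sh
# Sketch only: container name and /tmp/backup are assumptions, not image defaults.
DOCKER="${DOCKER:-docker}"     # set DOCKER=echo for a dry run
CONTAINER="${CONTAINER:-ex}"   # hypothetical container name

# Write a backup inside the container, then copy it to the host directory $1.
backup_to_host() {
  $DOCKER exec "$CONTAINER" java -jar start.jar client --no-gui \
    --xpath "system:export('/tmp/backup', false(), false())" || return 1
  $DOCKER cp "$CONTAINER:/tmp/backup" "$1"
}

# The reverse: copy the host directory $1 into the container, then restore from it.
restore_from_host() {
  $DOCKER cp "$1" "$CONTAINER:/tmp/backup" || return 1
  $DOCKER exec "$CONTAINER" java -jar start.jar client --no-gui \
    --xpath "system:restore('/tmp/backup', '', '')"
}
```

For example, backup_to_host ./backups would leave the exported files in ./backups on the host, and restore_from_host ./backups would push them back.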

Another method comes to mind:

docker exec ex java -jar tools/ant/lib/ant-launcher-1.10.2.jar -version

I think the distroless image uses a JRE, so to run Ant tasks we would need to add tools.jar from the JDK.

duncdrum commented 5 years ago

Why not just back up to the '/tmp' dir, then 'docker cp' the backup files onto your host, and do the reverse when doing a restore?

It's a good idea, but copying to a local drive goes against all my Docker instincts (shrug). eXist should already ship with an ant.jar, so I think we should be able to call that without adding tools to the gcr image.

grantmacken commented 5 years ago

eXist should already ship with an ant.jar

However, as I mentioned, ant depends on tools.jar and it complains if it can't be found

  1. the builder target uses JDK so tools.jar is available
  2. the final target uses JRE so tools.jar won't be found by ant

So either we add tools.jar from the JDK in this repo, or (better) the eXist repo includes it as part of its build dependencies.

grantmacken commented 5 years ago

closed by mistake

duncdrum commented 5 years ago

@adamretter since we only require a JRE for exist-db, if the ant.jar needs a JDK thingy I'd say we should indeed ship with tools.jar.

dizzzz commented 5 years ago

Shipping only tools.jar is probably not allowed, license-wise. Additionally, there might be technical consequences. So I'd recommend installing the whole JDK.

adamretter commented 5 years ago

We should not ship tools.jar like that.

I think we need to take a step back and think about the fundamentals here! We are being blinkered by Docker. Docker needs to work for us, not us working for Docker ;-)

The purpose of a backup is so that a user can get a full copy of their data and then move it to some backup media. In the past this was probably tape or CD-ROM, but these days it is likely a network share on a different machine.

Two sensible options that I see:

  1. The user initiates the backup from the host machine by using backup.sh on the host, and gives the URL of eXist-db running in the Docker container.

  2. The option that @grantmacken suggested. The user initiates the backup from the Docker container to /tmp or somewhere ephemeral, and then they use docker cp to get it to the host.
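Option 1 might look roughly like the sketch below. It assumes a matching eXist-db distribution is installed on the host (EXIST_HOME), that the container publishes port 8080, and that the XML-RPC endpoint lives at /exist/xmlrpc. The backup.sh flags shown (-u, -p, -b, -d, -ouri=) follow the 4.x backup documentation as best understood here and should be verified against the version in use.

```shell
#!/bin/sh
# Sketch only: host install location, host/port, and backup.sh flags are assumptions.
BACKUP_SH="${BACKUP_SH:-$EXIST_HOME/bin/backup.sh}"  # backup client on the host
EXIST_HOST="${EXIST_HOST:-localhost}"                # where the container's port is published
EXIST_PORT="${EXIST_PORT:-8080}"

# Back up the /db collection of the containerised instance into host directory $1,
# authenticating as admin with password $2.
backup_via_host_client() {
  "$BACKUP_SH" -u admin -p "$2" -b /db -d "$1" \
    "-ouri=xmldb:exist://$EXIST_HOST:$EXIST_PORT/exist/xmlrpc"
}
```

With this approach nothing has to run inside the container, at the cost of needing an eXist-db client on the host.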

duncdrum commented 5 years ago

Re tools.jar: the upshot of our discussion that is not backup-specific seems to be that Ant tasks can't be run with our Docker images. We need to a) document this, and b) remove the Ant jar from the image if it is just dead weight.

Re backup to /tmp: nobody is working for Docker, but in line with good practices for limiting interaction between the running container and the host VM, it's not that simple. I have no idea what happens if on Beanstalk (or its Azure and Google counterparts) you try to access /tmp on the host; I very much doubt you simply can. So we should choose an example that works in those use cases as well as in local dev testing.

adamretter commented 5 years ago

if on beanstalk (or its azure and google counterparts) you try to access /tmp on host

I was talking about /tmp on the guest (i.e. in the container) not the host.

duncdrum commented 5 years ago

OK, that makes more sense, so here is my latest take on our discussion. I'm in favour of triggering server-side backups inside a running container, since the chances that other processes might interfere with a client-side backup in multi-container environments are pretty high. This has the added advantage that folks can just use the UI.

Instead of depending on a specific path such as /tmp existing in the base gcr image, we should use the regular default path, which we are generating anyway: webapp/WEB-INF/data/export/

The readme gets a line on how to trigger a server-side backup for a given container, along the lines of:

docker exec exist java -jar start.jar client --no-gui --xpath "system:export('/export/backups',0,0)"
docker cp exist:exist-data/export/backups .

followed by:

docker cp .  exist:exist-data/export/backups
docker exec exist java -jar start.jar client --no-gui --xpath "system:restore('/export/backups', '', '')"

I'll see about the repair functions and take some screenshots. Is there anything speaking against adding the backup location and system calls to the compose file?