Upgrading between major versions?

roosmaa commented 9 years ago

There doesn't seem to be a good way to upgrade between major versions of postgres. When sharing the volume with a new container with a newer version of postgres it won't run as the data directory hasn't been upgraded. pg_upgrade on the other hand requires (?) old installation binary files, so upgrading the data files from new server container is also difficult.

It would be nice if there was some suggested way of doing this in the readme. Maybe even some meta container which does the upgrading from version to version?

md5 commented 9 years ago

The "Usage" section of this document describes what pg_upgrade does: http://www.postgresql.org/docs/9.4/static/pgupgrade.html

roosmaa commented 9 years ago

I guess my concern does not communicate well in the first comment... Lets try again.

pg_upgrade functionality is perfectly clear: you need to have both postgres [old] and [new] versions installed at the same time to upgrade the data structures on disk. But what is not clear is the best way of going about the upgrade when using containers, since every container only has a single version of postgres available/installed.

Should one create a custom upgrade-from-X-to-Y container with both X and Y versions of postgres installed?
Or should one just apt-get install new postgres version into the existing container, upgrade the data and then replace the container with one of the "vanilla" containers?
Or some even more clever way of doing this?

It would be fairly useful to have a section in the Readme regarding how to approach this. (And what to avoid?)

tianon commented 9 years ago

I've been wondering about/looking out for a good way to handle this myself. If someone's got a good flow that works well for them, I'm very interested in getting something documented (and "transition images" added if that's what's necessary here -- this is a common enough thing IMO that it doesn't really make sense to have everyone create their own for their one-off needs).

Perhaps for each "supported" version, we could add a variant that also installs the latest major release, and then we document how to create your own one-off transition container if you need a transition more esoteric than that?

md5 commented 9 years ago

@roosmaa: my comment wasn't so much directed at you as intended as background.

kajmagnus commented 9 years ago

+1 for major version upgrade instructions.

Yet another alternative, for smaller databases (?), should be that the old container runs pg_dumpall and saves the dump somewhere in the volume. Then one stops the container, starts a new one with the new major version, and it imports the dump.

mpe commented 9 years ago

Yeah this is a major pain ... at the moment. Would be great to get some official way to do it.

belak commented 9 years ago

Yeah, just nuked my postgres install on accident with an upgrade. Are there any plans for this?

mkarg commented 8 years ago

I am not a docker guru, but I could imagine that it is possible to add some script to the container that is started when the container starts. That script should check the database files, and when it detects the correct version, simply startup the postgresql engine. If not, it should decide to run pgupgrade or dump/reload. Shouldn't be rocket science?

tianon commented 8 years ago

The main problem as I understand it is that we need both the old version and the new version of the postgres binaries simultaneously in the same container (otherwise you can't pgdump the old data).

dcbishop commented 8 years ago

+1.

I'm just going to nuke what I have since it's just a personal server I don't really use, but this seems like a major pain maintenance wise.

Dirbaio commented 8 years ago

I've had success upgrading launching another postgres instance with the new version, and then using dumpall and psql to move all the data piping:

docker exec postgres-old pg_dumpall -U postgres | docker exec -i postgres-new psql -U postgres

It's rather simple :)

pg_upgrade is supposed to be faster though, I'm interested if someone has an easy way of using it.

asprega commented 8 years ago

I was attempting to hack up a bash script to manage the upgrade process between two containers of different versions by using pg_upgrade, but I hit a roadblock:

#!/bin/bash

#
# Script to migrate PostgreSQL data from one version to another
#

set -e

OLD_CONTAINER=$1
OLD_VOLUME=$2

NEW_CONTAINER=$3
NEW_VOLUME=$4

if [ -z "$OLD_CONTAINER" ] || [ -z "$OLD_VOLUME" ] || [ -z "$NEW_CONTAINER" ] || [ -z "$NEW_VOLUME" ]; then
  echo -e
  echo -e "Usage: ./pg_upgrade_docker.sh [old container name] [old volume name] [new container name] [new volume name]"
  echo -e "Example: ./pg_upgrade_docker.sh postgres94 postgres94_data postgres95 postgres95_data"
  echo -e
  exit 1;
fi

# Get the major version and the pg binaries out of the old postgres container
(docker start "$OLD_CONTAINER" || true) > /dev/null
OLD_MAJOR="$(docker exec "$OLD_CONTAINER" bash -c 'echo "$PG_MAJOR"')"

OLD_BIN_DIR="/tmp/pg_upgrade/bin/$OLD_MAJOR"
mkdir -p "$OLD_BIN_DIR"
rm -rf "$OLD_BIN_DIR"/*
docker cp "$OLD_CONTAINER":"/usr/lib/postgresql/$OLD_MAJOR/bin/." "$OLD_BIN_DIR"

(docker stop "$OLD_CONTAINER" || true) > /dev/null

# Get the major version out of the new postgres container
(docker start "$NEW_CONTAINER" || true) > /dev/null
NEW_MAJOR="$(docker exec "$NEW_CONTAINER" bash -c 'echo "$PG_MAJOR"')"
(docker stop "$NEW_CONTAINER" || true) > /dev/null

# Create a temp container running the new postgres version which we'll just use to migrate data from one volume to another.
# This container will use the old binaries we just extracted from the old container.
# We can't reuse the existing "new" container because we have to bind extra volumes for the update to work.
NEW_IMAGE="$(docker ps -a --filter "name=$NEW_CONTAINER" --format "{{.Image}}")"
docker run -v "$OLD_BIN_DIR":/tmp/old-pg-bin -v "$OLD_VOLUME":/tmp/old-pg-data -v "$NEW_VOLUME":/tmp/new-pg-data \
  --name temp_postgres_util "$NEW_IMAGE" su - postgres -c "cd /tmp && /usr/lib/postgresql/$NEW_MAJOR/bin/pg_upgrade \
    -b /tmp/old-pg-bin -B /usr/lib/postgresql/$NEW_MAJOR/bin \
    -d /tmp/old-pg-data/ -D /tmp/new-pg-data/ \
    -o \"-c config_file=/tmp/old-pg-data/postgresql.conf\" -O \"-c config_file=/tmp/new-pg-data/postgresql.conf\""

# Remove temp container
(docker stop temp_postgres_util) > /dev/null
(docker rm temp_postgres_util) > /dev/null

rm -rf "$OLD_BIN_DIR"

echo -e "Data migration from $OLD_MAJOR to $NEW_MAJOR is complete!"

the idea is a bit convoluted, because I'm extracting the binaries from the old version and mounting them into a new temporary container created from the same image of the container with the new version, along with data volumes from existing containers (the old and the new ones). My idea was then to use that container to run pg_upgrade, then throw it away (and data would have been migrated through the two volumes). When running the script, though, I get the following error:

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

*failure*
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.

connection to database failed: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/tmp/.s.PGSQL.50432"?

could not connect to old postmaster started with the command:
"/tmp/old-pg-bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/tmp/old-pg-data/" -o "-p 50432 -b -c config_file=/tmp/old-pg-data/postgresql.conf -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp'" start
Failure, exiting

Any ideas?

donnykurnia commented 8 years ago

I'm using @Dirbaio tricks to dump all data from old container and restore it into the new container. Of course, this needs separate data volume and run fast with small data sets. I hope that pr_upgrade can be more 'intelligent', even better if the Postgres itself could do the upgrade by itself. I mean, when installing the new version, the apps should also know how old version format looks like. Maybe also include a necessary old version binary to do the data 'upgrade' then after the upgrade finished, just delete that old version binary.

julienmathevet commented 8 years ago

+1

mkarg commented 8 years ago

I think we should have a complete solution or none at all, as I doubt that most people always use the latest version.

From: Jonas Thiem [mailto:notifications@github.com] Sent: Freitag, 20. Mai 2016 08:53 To: docker-library/postgres Cc: Markus KARG; Comment Subject: Re: [docker-library/postgres] Upgrading between major versions? (#37)

Ok so why can't the container be changed to simply have both the latest and the second-to-latest postgresql version installed? (e.g. just put a chroot into the docker container and install the older package from the distribution there or something. Once a script for that has be made that just needs to be passed the name of the respective older apt package, it shouldn't really be a lengthy process) At least that would cover the majority of users and it would allow to successfully convert the database for them automatically by the container.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/docker-library/postgres/issues/37#issuecomment-220531138 https://github.com/notifications/beacon/ABn3tyqP3YQcazFg6O98ZNdeqZx_FYJwks5qDVpUgaJpZM4C_kVy.gif

ghost commented 8 years ago

Why is none at all any better? I would assume most people running a production server with some basic security consciousness will probably use at least some version of somewhere in the latest release cycle - such major upgrades aren't happening that often, right? (I'm thinking of the usual docker environment here optimized for flexibility and repeatability where upgrading and reverting if it goes wrong is usually less painful than for a regular oldschool server administrator) And if doing it for all versions is not feasible, at least doing it for the respective second-to-newest one should be doable..

If for the other cases where the upgrade can't be done there is a descriptive error message like there is now, I don't see how this wouldn't be at least a considerable improvement - although I agree that being able to upgrade from any arbitrary previous version automatically is of course better if it can be done.

mkarg commented 8 years ago

It binds resources better invested into developing a general solution, and it possibly could stand in the way when coming up with the general solution.

-Markus

From: Jonas Thiem [mailto:notifications@github.com] Sent: Montag, 23. Mai 2016 17:53 To: docker-library/postgres Cc: Markus KARG; Comment Subject: Re: [docker-library/postgres] Upgrading between major versions? (#37)

Why is none at all any better? I would assume most people running a production server with some basic security consciousness will probably use at least the latest release - they aren't happening that often, right? And if doing it for all versions is not feasible, at least doing it for the respective second-to-newest one should be doable..

If for the other cases where the upgrade can't be done there is a descriptive error message like there is now, I don't see how this wouldn't be at least a considerable improvement - although I agree that being able to upgrade from any arbitrary previous version automatically is of course better if it can be done.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/docker-library/postgres/issues/37#issuecomment-221016122 https://github.com/notifications/beacon/ABn3t8Q4uNkFUuMUTDo-1BtrJaS5KDrRks5qEc1GgaJpZM4C_kVy.gif

mkarg commented 8 years ago

Yes, resignation is the right word. We just do not see how it could be solved in a good way.

mkarg commented 8 years ago

If you have time you can do whatever you like -- this is the nature of open source. Whether or not the project leads will accept your PR is a different question and not up to me to decide.

tianon commented 8 years ago

I'm a strong -1 on including the "second to last version" in every tag unless the size bloat by doing so is extremely minimal (which is why this solution hasn't been implemented before now).

tianon commented 8 years ago

I spent a little time working on a generic way to run pg_upgrade for somewhat arbitrary version combinations (only increasing, ie 9.0 to 9.5, but not the reverse), and here's the result: tianon/postgres-upgrade (might be easier to read directly over at https://github.com/tianon/docker-postgres-upgrade/blob/master/README.md)

rosenfeld commented 8 years ago

I haven't tested this yet but, since it's built on top of Debian, maybe an option would be to add some script to /docker-entrypoint-initdb.d like this, during the upgrade process (ideally making an snapshot of the original data first when using BTRFS for example):

Considering the current DB is on 9.4 and you want to migrate it to 9.5

apt-get update
apt-get install -y postgresql-9.5 postgresql-contrib-9.5
pg_dropcluster 9.5 main
pg_upgradecluster 9.4 main -m upgrade -k

This is the basic idea. However, this will add some unneccessary downtime. We could save some time by creating a temporary image:

FROM postgres:9.4
RUN apt-get update && apt-get install -y postgresql-9.5 postgresql-contrib-9.5
RUN echo "pg_dropcluster 9.5 main; pg_upgradecluster 9.4 main -m upgrade -k" > /docker-entrypoint-initdb.d/zzz-upgrade.sh

This is just to simplify the explanation, I'd probably use COPY rather than echo ... > ......

The container running from this new image would then only execute the remaining steps and would be run only once during the upgrade after stopping the previous 9.4 container and before starting the new 9.5 container. pg_upgradecluster will reuse the previous configuration when creating the new cluster.

Maybe to reduce the impact on users, one might want to run the 9.4 cluster in a read-only transaction during the upgrade so that it would mostly work and handle the write errors in the application to tell the users the database is being upgraded and that it's currently not possible to write new data to it... If upgrading the cluster would take a while even with the --link/-k option, maybe a read-only mode might give users a better experience rather than presenting them a static maintainance page for a long while... Maybe it could switch the header to let users know that it's in read-only mode during the upgrade in the meanwhile... It's more effort, but a better user experience for some sort of applications.

yosifkit commented 8 years ago

@JonasT, The only complete way is to have images for each set of supported versions; to ensure users can upgrade between them. @tianon, has a set of them in tianon/docker-postgres-upgrade. The other major problem is that it requires two folders (volumes) in specific locations to upgrade your database, since pg_upgrade requires both servers to be running and cannot just upgrade in place. Tianon's repo has more information on the requirements of this, which includes how to do it with one volume (properly setup), so that pg_upgrade can take advantage of it being a single drive and "use hard links instead of copying files to the new cluster".

The reason mariadb works to "upgrade automatically" is that the mariadb team has to put in extra work to make newer versions able to read older data directories, whereas postgres can make backwards incompatible changes in newer releases and not have to worry about file formats from older versions. This is why postgres data is usually stored under a directory of its version, but that would over complicate volume mounts if every version of postgres had a different volume.

@rosenfeld, unfortunately pg_dropcluster and pg_upgradecluster are specific tooling for the Debian provided packages and do not work with the Postgres provided apt/deb packages. Even after writing a new entrypoint to run the upgrade I just get failures, since it is assume specific directory structures and probably an init system:

$ # script:
#!/bin/bash

gosu postgres pg_ctl -D "$PGDATA" \
    -o "-c listen_addresses='localhost'" \
    -w start

gosu postgres pg_dropcluster 9.5 main
gosu postgres pg_upgradecluster 9.4 main -m upgrade -k

gosu postgres pg_ctl -D "$PGDATA" -m fast -w stop
exec gosu postgres "$@"

$ # shortened ouput (after initializing the db folder with a regular postgres image)
...
waiting for server to start....LOG:  database system was shut down at 2016-10-03 18:22:33 UTC
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
 done
server started
Error: specified cluster does not exist
Error: specified cluster does not exist

rosenfeld commented 8 years ago

Hi @yosifkit. I don't understand why you think pg_upgradecluster being specific to Debian is a problem since the official PG images are based on Debian. As a matter of fact this is how I upgraded my PG servers from 9.5.4 to 9.6:

docker pull postgresql:9.6 # so we can easily start it after the upgrade is finished
docker run -it --rm --name pg-upgrade -v /mnt/pg/data:/var/lib/postgresql \
  -v /mnt/pg/config:/etc/postgresql --entrypoint bash postgres:9.5.4

# in the bash session in the container I installed 9.6:
apt-get update && apt-get install -y postgresql-9.6 postgresql-contrib-9.6
# Then, in another session, I stopped the server running current PG in order to
# start the upgrade (it's always a good idea to perform a back-up of the data
# before upgrading anyway).
pg_upgradecluster -m upgrade --link 9.5 main
# optionally: pg_dropcluster 9.5 main

After the upgrade is finished just start the new 9.6 container with the same arguments.

Here's how I run the container:

docker run --name pg -v /mnt/pg/scripts:/pg-scripts \
  -v /mnt/pg/data:/var/lib/postgresql \
  -v /mnt/pg/config:/etc/postgresql -p 5432:5432 \
  postgresql:9.6 /pg-scripts/start-pg 9.6 10.0.1.0/24

I stop it with docker exec pg pg_ctlcluster -f 9.6 main stop.

Here's how start-pg looks like:

#!/bin/bash
version=$1
net=$2
setup_db(){
  pg_createcluster $version main -o listen_addresses='*' -o wal_level=hot_standby \
    -o max_wal_senders=3 -o hot_standby=on -- -A trust
  pghba=/etc/postgresql/$version/main/pg_hba.conf
  echo -e "host\tall\tpguser\t$net\ttrust" >> $pghba
  echo -e "host\treplication\tpguser\t$net\ttrust" >> $pghba
  pg_ctlcluster $version main start
  psql -U postgres -c '\du' postgres|grep -q pguser || \
    createuser -U postgres -l -s pguser
  pg_ctlcluster $version main stop
}
[ -d /var/lib/postgresql/$version/main ] || setup_db
exec pg_ctlcluster --foreground $version main start

The server is actually managed by some systemd unit files, but this is basically how it works behind the scene.

fabiand commented 8 years ago

What about a solution where the new container is using an image with an older postgresql version to launch the older version, then use the regular flow to perform the update? (This is basically nesting the older postgresql container inside the new postgresql container).

It might be tricky to get it right, but at least the new postgresql image does not need to contain any previous postgresql version.

dschilling commented 8 years ago

I love the simplicity of @Dirbaio's approach. You wouldn't have to know what two versions you're migrating between for it to work. If you're worried about the speed of that approach, it sounds like the Upgrade via Replication approach described in the docs would be the best option, over pg_upgrade, in terms of actual downtime.

rosenfeld commented 8 years ago

@dschilling yes, if you can afford the time to set up slony (I started to read its documentation once but found it very complicated) or if you can't afford any downtime window, that's probably the best plan. Since I'm not comfortable with setting up Slony I preferred to use the method I explained above because the downtime is much smaller when you use pg_upgrade with --link. Since it can only upgrade the full cluster rather than a single database I decided to use our production database in a dedicated cluster so that the downtime would be as small as possible while using the pg_upgrade approach.

thijslemmens commented 8 years ago

Instead of Slony, you could use pglogical: https://2ndquadrant.com/en/resources/pglogical/

I plan to try this approach, using a HAProxy to switch from the old to the new version. Pglogical should be easier to use than Slony, but I still need to validate.

rosenfeld commented 8 years ago

pglogical doesn't work with the vanilla official packages, that's why I'm not comfortable with using it. Also, in that case, you wouldn't be using this Docker image.

thijslemmens commented 8 years ago

It is an extension, not a hack. You can put it in a layer on top of this image. I agree on the main topic, that it would be nice to be able to do a pg_upgrade with this image.

dastrasmue commented 8 years ago

I gave some thought to this problem. Here is what I came up with:

Given that you want to upgrade from 9.3 to 9.5, you can create two containers to migrate the old data to a new volume. I assume the data resides in local directories on the docker-host that are bind-mounted. To perform the upgrade, start one container for each version and share volumes of the 9.3-binarires and the data directory. You then can execute pg_upgrade and migrate to a new data directory. Afterwards you could mount the new data directory in your production environment with a postgres 9.5 image.

I wrote this down in a compose file:

version: '2'
services:
  old:
    image: postgres:9.3.14
    command: echo "done"
    volumes:
      - /usr/lib/postgresql/9.3
      - /usr/share/postgresql/9.3
  new:
    image: postgres:9.5.5
    command: >
      bash -c "gosu postgres initdb  && cd /tmp \
      && gosu postgres pg_upgrade \
         -b /usr/lib/postgresql/9.3/bin/ \
         -B /usr/lib/postgresql/9.5/bin/ \
         -d /old/ \
         -D /var/lib/postgresql/data/"
    volumes:
      - ./old:/old
      - ./new:/var/lib/postgresql/data
    volumes_from:
      - old

The first container is just started to fill the volumes with the required binaries. Before the upgrade can be performed, a database has to be initialized inside the 9.5-container. The upgrade has to be invoked by the postgres-user (therefore gosu postgres).

Actually the idea comes close to the suggestion that @asprega had.

What are your thoughts on this?

chulkilee commented 7 years ago

@dastrasmue thanks for the docker compose file. Here is a part of my script I used.

old_datadir=somewhere
rm -rf new_data && mkdir new_data

cat > docker-compose.yml << EOF
version: '2'
services:
  old:
    image: postgres:9.5
    command: echo "done"
    volumes:
      - /usr/lib/postgresql/9.5
      - /usr/share/postgresql/9.5
  new:
    image: postgres:9.6
    volumes:
      - ./new_data:/var/lib/postgresql/data
  upgrade:
    image: postgres:9.6
    command: >
      bash -c "
        cd /tmp && \
        gosu postgres pg_upgrade \
           -b /usr/lib/postgresql/9.5/bin/ \
           -B /usr/lib/postgresql/9.6/bin/ \
           -d /old/ \
           -D /var/lib/postgresql/data/"
    volumes:
      - $old_datadir:/old
      - ./new_data:/var/lib/postgresql/data
    volumes_from:
      - old:ro
EOF

docker-compose up -d old new
sleep 5
docker-compose stop new
docker-compose up upgrade

See some improvements

mount old data with ro
use docker entrypoint (e.g. create user, fix permission)

@asprega I hit the similar problem when I used plain docker commands because binaries tries to use lib under the same path - so copying them doesn't work.

austinnichols101 commented 7 years ago

I just added #250 to address a problem with upgrading alpine. The postgres binaries are not in the local MAJOR.MINOR bin folder so the multi-container approaches above do not work.

I was able to successfully upgrade using the script from @chulkilee. However, I had to chown the contents of the alpine data folder to 999 (alpine uses 70), upgrade, and then chown back.

Tapuzi commented 7 years ago

Here is what I did, loosely, to upgrade 9.4 to 9.6: (You probably want to discard the PGDATA environment variable)

docker run --name postgres94 -d -v /tmp/pg/mount:/var/lib/postgresql/data -e 'PGDATA=/var/lib/postgresql/data/pgdata' postgres:9.4 docker exec -it postgres94 /bin/bash

# Inside the container cd var/lib/postgresql/data touch db.sql chown postgres db.sql chmod 777 db.sql su postgres pg_dumpall > db.sql exit

docker stop postgres94 && docker rm postgres94 docker run --name postgres96 -d -v /tmp/pg/mount:/var/lib/postgresql/data2 -v /tmp/pg/9.6:/var/lib/postgresql/data -e 'PGDATA=/var/lib/postgresql/data/pgdata' postgres:9.6 docker exec -it postgres96 /bin/bash

# Inside the container cd var/lib/postgresql/data2 su postgres psql -f db.sql

/tmp/pg/9.6/data/pgdata now contains the data folder for 9.6. The postgres 9.6 running image is usable.

thijslemmens commented 7 years ago

@Tapuzi Yes, that works, but dumping and re-importing takes a long time for big databases, causing a lot of downtime/read_only time. Pg_upgrade is relatively fast, but not straightforward to use with the default docker images. I think this thread is about upgrading Postgresql faster, or with less downtime than the approach you just described.

iamtio commented 7 years ago

I created a script which uses tianon/docker-postgres-upgrade for upgrading data of postgres container in volume. I hope it will help someone

OLD_PG=9.5
NEW_PG=9.6
PG_DATA_VOLUME=data

docker volume create pg_data_new

# Upgrade db files to pg_data_new volume
docker run --rm -v $PG_DATA_VOLUME:/var/lib/postgresql/$OLD_PG/data -v pg_data_new:/var/lib/postgresql/$NEW_PG/data tianon/postgres-upgrade:$OLD_PG-to-$NEW_PG

# Move files to old volume
docker run --rm -v $PG_DATA_VOLUME:/old -v pg_data_new:/new postgres:$NEW_PG /bin/bash -c 'mv /old/pg_hba.conf /tmp && rm -rf /old/* && cp -a /new/. /old/ && mv /tmp/pg_hba.conf /old/'

docker volume rm pg_data_new

kingbuzzman commented 7 years ago

I got a better solution:

This is AFTER you changed the docker-compose.yml to the new version, then run these commands one by one. (ps this is only for the alpine version, you will have to change the apk add and the packages if you use the standard version, you will also have to change the version of postgres you're using (obviously :P ))

docker-compose run --no-dep --rm postgres bash; docker-compose down
  # install the old server 
  apk add --update alpine-sdk ca-certificates openssl tar bison coreutils dpkg-dev dpkg flex gcc libc-dev libedit-dev libxml2-dev libxslt-dev make openssl-dev perl perl-ipc-run util-linux-dev zlib-dev -y; curl -s https://ftp.postgresql.org/pub/source/v9.5.4/postgresql-9.5.4.tar.bz2 | tar xvj -C /var/lib/postgresql/; cd /var/lib/postgresql/postgresql-9.5.4/; ./configure --prefix=/pg9.5; make; make install
  # update the data 
  su postgres -c "initdb --username=postgres /var/lib/postgresql/data2; chmod 700 /var/lib/postgresql/data; pg_upgrade -b /pg9.5/bin/ -B /usr/local/bin/ -d /var/lib/postgresql/data/ -D /var/lib/postgresql/data2; cp /var/lib/postgresql/data/pg_hba.conf /var/lib/postgresql/data2/pg_hba.conf; rm -rf /var/lib/postgresql/data/*; mv /var/lib/postgresql/data2/* /var/lib/postgresql/data/"

pauvos commented 6 years ago

Just for reference:

The OpenShift guys implemented an "automagical" POSTGRES_UPGRADE environment variable and they included both postgres-server versions in the image (9.5 and 9.6 ... upgrading from 9.4 is not possible):

https://hub.docker.com/r/centos/postgresql-96-centos7/

Image is around 111 MB in size... I think people would love to download a few extra bytes for convienient updates.

claytondaley commented 6 years ago

It seems to me what makes this "hard" is the need to mount two data directories from the host (as demonstrated by @tianon's helper images, except under ideal circumstances) and then manage the binding of the final image to the right folder. We use deployment automation so we can manage directories like this automatically, but it seems inefficient to force everyone to manage the same complexity outside of these images.

Possibly heretical, but food for thought:

What if data was instead stored in $PGDATA/$PG_MAJOR (or even $PGDATA/$PG_MAJOR/data)
A startup script searches (sequentially):
- $PGDATA/$PG_MAJOR
- Previous $PGDATA/$PG_MAJOR
- $PGDATA (for backwards compatibility)
If no database is found, it creates $PGDATA/$PG_MAJOR
If it finds a database, it uses the PG_VERSION file to decide if an upgrade is necessary
If an upgrade is necessary, a separate (to make it easier to customize) upgrade script is called. By default, the script:
- Installs the old binaries (i.e. the old PG_VERSION)
- Upgrades using pg_upgrade
- Uninstalls the old binaries

I appreciate that changing the interpretation of $PGDATA could be viewed as "significant", but it has benefits:

You need only mount a single folder from the host (with backwards compatible scripts)
The strategy is "generic" since it detects the current version to make upgrade decisions
The strategy can use pg_upgrade so it'll be fast
There's no need to overwrite/replace existing data; eventually, you delete the old $PG_MAJOR folder

monotek commented 6 years ago

Imho export & remimport via sql would be easiest way. One could just put the old sql file in some volume and it could be imported automatically on start.

tianon commented 6 years ago

It seems to me what makes this "hard" is the need to mount two data directories from the host

Not exactly -- for me, the thing that makes this "hard" (and unacceptable by default for the official images) is that it requires having both the target old and target new versions of PostgreSQL installed in the image. To support arbitrary version bumps from an older version to a newer version, we'd have to install all versions (because the "export and reimport" method described is essentially how pg_upgrade works, in its most basic form).

claytondaley commented 6 years ago

To support arbitrary version bumps from an older version to a newer version, we'd have to install all versions

Why can't the upgrade script install the appropriate version (as I described), run the upgrade, then uninstall?

claytondaley commented 6 years ago

@JonasT I figure people with additional constraints like that are sophisticated enough to understand how to adjust their Docker workflow. Best to design a default for the "plug and play" users who aren't going to be able to troubleshoot something harder.

If you want to make it "even easier", offer a 10.0-upgrade image that pre-installs everything. A no-internet user can use that imagine for the upgrade. If they're concerned about space, they can switch back to 10.0 when done. Since the versions match, the upgrade logic will not run and the lack of internet will not matter.

Philosophically, I think the ideal strategy allows most people to change 9.4 to 10.0 without changing mount points or running any special commands. In my proposal, internet-enabled users can. Internet-disabled users have to use the -upgrade image (either always or transiently).

claytondaley commented 6 years ago

@JonasT whether or not you bake in the second-to-last is orthogonal to the substance of my proposal. For that matter, why not include all of the binaries and... after the upgrade logic runs (or is skipped)... uninstall all of them? Won't that simultaneously support all upgrades (including your "best practice") and still minimize the final footprint?

My (orthogonal) question is how do you "allow... people to change 9.4 to 10.0 without changing mount points or running any special commands".

I don't think there's any debate that the right script can handle the "no special commands" part.
But I don't see how it works with the current binding/storage/$PGDATA strategy.

The more user-friendly versions of @tianon's scripts address the binding issue by binding "up" a level (or two) so the versions are found in $PG_MAJOR folders. I think it makes sense to generalize @tianon's solution for the actual postgres containers (with some backwards compatibility).

ghost commented 6 years ago

@claytondaley

For that matter, why not include all of the binaries

Because the image size will blow up if you have them installed there and only remove them as soon as the container launches. I wouldn't mind it much, but others in this thread have expressed concern.

jedie commented 6 years ago

Currently there is no official statement how to upgrade, isn't it?!?

tianon commented 6 years ago

The "official statement" is that you upgrade in Docker just like you would any other instance of PostgreSQL: via pg_upgrade.

The only thing that makes doing so tricky in the case of these images is the need to have both the new and old versions installed concurrently. See https://github.com/tianon/docker-postgres-upgrade (which is linked a couple times in the previous comments) for where I'm maintaining a set of images which accomplish this in the bare minimum way possible (which you can either use directly or refer to in building your own solution).

monotek commented 6 years ago

Is there no way of a SQL dump in the old version & reimport it in the new version to do an upgrade?

chulkilee commented 6 years ago

Here's my two cents:

dump & restore may work, but you should prefer pg_upgrade for many reasons (search it)
upgrading PostgreSQL data into the version of current running docker automatically should not happen; it's always safer to require explicit instruction to perform backward-incompatible changes

PostgreSQL has the official way to upgrade, and you can do it. The question is whether this "official" postgres docker image should support it out of box. It's good to have, technically it needs binary in both version - the current version and your version, which is impractical to include it in docker image.

I believe the consensus of "official docker image" is "minimum wrapper of standard package". It is not full installer with upgrade feature. This upgrade problem is nothing special with docker..

I think the documentation of postgres docker image just need to mention pg_upgrade - and probably link to this issue (or even https://github.com/tianon/docker-postgres-upgrade).

tianon commented 6 years ago

Dump and reload is an OK solution (although definitely sub-optimal, as described above) -- just use pg_dump like you normally would, then import it into the new version and you're good to go. As noted above, that's basically how pg_upgrade works in many cases (and why it needs both versions installed).

docker-library / postgres

Upgrading between major versions? #37