docker-library / postgres

Docker Official Image packaging for Postgres
http://www.postgresql.org
MIT License
2.16k stars 1.13k forks source link

Change `PGDATA` in 17+ to `/var/lib/postgresql/MAJOR/docker` #1259

Open tianon opened 2 months ago

tianon commented 2 months ago

This is a pretty large breaking change, which is why this only makes the change in 17+ (which is currently in pre-release stages, and not due for GA until September, and pre-release PGDATA directories are officially not supported on the GA release anyhow).

Concretely, this changes PGDATA to /var/lib/postgresql/MAJOR/docker, which matches the pre-existing convention/standard of the pg_ctlcluster/postgresql-common set of commands, and frankly is what we should've done to begin with, in a classic case of Chesterton's Fence.

This also changes the VOLUME to /var/lib/postgresql, which should be more reasonable, and make the upgrade constraints more obvious.

For any users who have been testing the pre-releases, the simplest way to keep your existing data directory is going to be to add PGDATA=/var/lib/postgresql/data as an environment variable on your container or adjust your bind-mount from /var/lib/postgresql/data to /var/lib/postgresql/17/docker, but the best way is going to be to refactor your host directory such that your data lives at 17/docker inside and you can then mount directly to /var/lib/postgresql (possibly setting PGDATA=/var/lib/postgresql/MAJOR/docker as well, if you want to go overboard on being explicit).

tianon commented 2 months ago

(currently includes the commit of https://github.com/docker-library/postgres/pull/1258 for simplicity, but if that one is merged, I'll rebase) done!

tianon commented 2 months ago

I'm not sure how we help catch cases of users blindly upgrading to 17+ with a volume of /var/lib/postgresql/data and being surprised when it doesn't work correctly. Maybe I'll do some testing with making /var/lib/postgresql/data a symlink to /var/lib/postgresql and see how Docker behaves with that (so that at least those users would still have data, even if it doesn't look exactly how they're expecting).

tianon commented 2 months ago
$ docker run -it --rm -v foo:/var/lib/postgresql/data --name foo --env POSTGRES_PASSWORD=foo 8b5ecf7fdad7
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/docker/volumes/foo/_data" to rootfs at "/var/lib/postgresql/data": change mount propagation through procfd: open o_path procfd: open /var/lib/docker/overlay2/bd954e9c05618d52115b5345f7465cf17cc426560b0979d7f796ebfbf62ea950/merged/var/lib/postgresql/data: no such file or directory: unknown.

Ouch, nope, symlink is a bust.

tianon commented 2 months ago

I mean, on the plus side, it did make the container runtime fail to start the container, so maybe that's better than silently and quietly failing? :thinking:

tianon commented 2 months ago

Oh, I missed the most obvious way to mitigate this change for existing users: just set PGDATA=/var/lib/postgresql/data directly on your container. :smile:

(OP updated)

justinclift commented 1 month ago

this changes PGDATA to /var/lib/postgresql/MAJOR/docker, which matches the pre-existing convention/standard of the pg_ctlcluster/postgresql-common set of commands

It's an interesting idea. Trying to figure out how that's beneficial for containers, but nothing super useful is popping up for me.

What's the concept you have in mind? :smile:

tianon commented 1 month ago

It's not directly useful so much as "more in line with the common standards used in this pre-existing ecosystem" -- us using a non-standard path has been a source of friction for our users since we first introduced the image.

This gets especially focused when you try to use pg_upgrade and realize you have to do some fancy footwork either in your host (matching a directory structure something like this, in fact) or in your bind mounts (and thus sacrificing speed since you can't use --link) to even make things work properly.

justinclift commented 1 month ago

Interesting. My first thought is that embedding the version number in the path like this would probably make adding some kind of automatic upgrade images much more difficult.

At least, the approach we're using with the pgautoupgrade repo is to upgrade things in place.

Might be able to work around such a change with a bit of extra shell scripting though.

I guess the 2nd worry I have is that this change is breaking the many years of following a particular approach (and thereby needing people to update their custom tooling to match), mostly for what seems like an arbitrary change without a clear benefit. Those kind of changes tend to cause follow on headaches for people. :frowning:

tianon commented 1 month ago

The workaround is to set PGDATA explicitly -- if you set it back to the previous value, it will "just work" (in both the old and the new images).

justinclift commented 1 month ago

Cool, yeah that should work for people with existing images and tooling.

With this:

when you try to use pg_upgrade and realize you have to do some fancy footwork ...

I'm trying to understand the pain/failure scenario there. Are there good examples of this problem in this repo that I can read through to better understand?

With this new path structure, it seems to be matching the path layout which Debian has implemented for its PostgreSQL packaging. So I can sort of understand where you're coming from with respect to implementing it for the Debian based images.

Alpine doesn't seem to use that path structure for its PostgreSQL packages though?

yosifkit commented 2 weeks ago

Alpine doesn't seem to use that path structure for its PostgreSQL packages though?

It does seem to use the same paths as noted by a user and when I install and run the postgresqlXX packages:

/ # apk add postgresql14
...
/ # rc-service postgresql start
/ # rc-service postgresql stop
/ # ls -l /var/lib/postgresql/*/data/PG_VERSION
-rw-------    1 postgres postgres         3 Aug 19 21:49 /var/lib/postgresql/14/data/PG_VERSION
/ # apk add postgresql15
...
/ # pg_versions set-default 15
/ # rc-service postgresql start
/ # ls -l /var/lib/postgresql/*/data/PG_VERSION
-rw-------    1 postgres postgres         3 Aug 19 21:49 /var/lib/postgresql/14/data/PG_VERSION
-rw-------    1 postgres postgres         3 Aug 19 21:50 /var/lib/postgresql/15/data/PG_VERSION

Are there good examples of this problem in this repo that I can read through to better understand?

I think it is partly that the --link option for pg_upgrade is hard to use with the current layout. In order to use it, your new and old database files must be "on the same mount". So, when using -v for a bind mount, they have to be specified in only one -v. So, for upgrade, you can't just use the same volume because the database data resides directly at the root of the volume. If using a Docker volume and not a bind mount, you are likely stuck using the slow version of upgrade.

Current optimal usage:

$ find DIR -mindepth 2 -maxdepth 2
DIR/OLD/data
DIR/NEW/data
$ # "OLD" server was run with a specific subdir next to where next version will store data
$ docker run -it --rm \
    -v DIR/OLD/data:/var/lib/postgresql/data \
    postgres:OLD
$ docker run --rm \
    -v DIR:/var/lib/postgresql \
    tianon/postgres-upgrade:OLD-to-NEW \
    --link
$ # NEW server with the new directory
$ docker run -d \
    -v DIR/NEW/data:/var/lib/postgresql/data \
    postgres:NEW

Slow upgrade with volumes (or unable to use the OLD directory parent or unable to move the OLD data directory):

$ docker run -it --rm \
    -v PGDATAOLD:/var/lib/postgresql/data \
    postgres:OLD
$ docker run --rm \
    -v PGDATAOLD:/var/lib/postgresql/OLD/data \
    -v PGDATANEW:/var/lib/postgresql/NEW/data \
    tianon/postgres-upgrade:OLD-to-NEW
$ docker run -d \
    -v PGDATANEW:/var/lib/postgresql/data \
    postgres:NEW

Using the newer layout from this proposal will mean that running or upgrading can use the same mount path (or a volume):

$ # "old" server
$ docker run -it --rm \
    -v DIR:/var/lib/postgresql \
    postgres:17
$ # upgrade
$ docker run --rm \
    -v DIR:/var/lib/postgresql \
    tianon/postgres-upgrade:17-to-18 \
    --link
$ # new server
$ docker run -d \
    -v DIR:/var/lib/postgresql \
    postgres:18
yosifkit commented 2 weeks ago

Drawback of this proposal is that if a user runs a new major PostgreSQL version without upgrading (like 17.x to 18.x), then they will just have a running empty database rather than the current error about Database file version mismatch. Their data will still exist, just not be accessible until they downgrade the image or do a pg_upgrade. So, we will definitely get a few users complaining that "all their data is gone".

justinclift commented 2 weeks ago

So, we will definitely get a few users complaining that "all their data is gone".

Hmmm, that'd probably be fixable if the start up script checks for that situation and moves things around automatically.

Guiorgy commented 1 day ago

So, we will definitely get a few users complaining that "all their data is gone".

Hmmm, that'd probably be fixable if the start up script checks for that situation and moves things around automatically.

Couldn't the startup script just error out if it finds an existing older version database, or automatically upgrade that database to the current version if some enviromental variable, for example AUTOMATIC_UPGRADE, is set?

justinclift commented 1 day ago

or automatically upgrade that database to the current version if some enviromental variable, for example AUTOMATIC_UPGRADE, is set?

The automatic upgrade thing would definitely benefit a lot of people. Not sure if the maintainers here are interested in including that (note the complete lack of response), though I could be wrong.

Guiorgy commented 1 day ago

The automatic upgrade thing would definitely benefit a lot of people. Not sure if the maintainers here are interested in including that (note the complete lack of response), though I could be wrong.

Thanks for the link. Acording to tianon there:

The main problem as I understand it is that we need both the old version and the new version of the postgres binaries simultaneously in the same container (otherwise you can't pgdump the old data).

And also:

I'm a strong -1 on including the "second to last version" in every tag [...]

This seems to be the main roadblock.

justinclift commented 1 day ago

This seems to be the main roadblock.

To me, that seems a bit silly. I mean, that's why I was suggesting having PG images that don't do "automatic upgrade" as they'd just be the "normal" size.

But for people that do need an automatically upgrading image - unlikely to be everyone? - then sure it's going to be a bit bigger. They don't seem all that huge though. For reference, these are the standard PostgreSQL images:

https://hub.docker.com/_/postgres/tags

They're about 140-150MB (roughly).

These are the "automatic upgrade" images at the pgautoupgrade project:

https://hub.docker.com/repository/docker/pgautoupgrade/pgautoupgrade/tags

They're about 190MB (roughly). So, ~40MB larger (~25%) due to including the binaries for several PG versions in them. Seems like a pretty decent trade off (to me). :smile: