alphagov / govuk-docker

GOV.UK development environment using Docker 🐳
MIT License
82 stars 22 forks source link

Problem replicating publishing-api to local machine #570

Closed KludgeKML closed 2 years ago

KludgeKML commented 2 years ago

I'm working on something that relies on publishing-api to provide a content object from a content and attachment id, so I need to replicate the publishing-api database locally. If I run the replicate command:

gds aws govuk-integration-readonly --art 6h ./bin/replicate-postgresql.sh publishing-api

(--art 6h to allow the 20Gb file to download)

...it gets the file, then runs through a huge list of docker containers trying to stop them, spins up the PG 13.1 container but then fails to do the import with the error:

Error response from daemon: OCI runtime start failed: cannot start a container that has stopped: unknown

(More details below, from the start of step 2 after all the containers have been stopped/removed)

docker-compose -f [...] up -d postgres-13
[+] Running 2/2
 ⠿ Network govuk-docker_default          Created                                                                                                                                           0.0s
 ⠿ Container govuk-docker-postgres-13-1  Started                                                                                                                                           0.3s
waiting for postgres...
docker-compose -f [...] config
docker-compose -f [...] run postgres-13 /usr/bin/psql -h postgres-13 -U postgres -c DROP DATABASE IF EXISTS "publishing-api"
DROP DATABASE
docker-compose -f [...] run postgres-13 /usr/bin/createdb -h postgres-13 -U postgres publishing-api
docker-compose -f [...] run postgres-13 /usr/bin/pg_restore -h postgres-13 -U postgres -d publishing-api --no-owner --no-privileges
^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@ ^@^@^@217347372^@
^@^@^@access_limits^@^E^@^@^@TABLE^@^B^@^@^@^@%^A^@^@CREATE TABLE public.access_limits (
    id integer NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    edition_id integer,
    users jsonb DEFAULT '[]'::jsonb NOT NULL,
    organisations jsonb DEFAULT '[]'::jsonb NOT NULL
);
^@!^@^@^@DROP TABLE public.access_limits;
^A^A^@^@^@^@^F^@^@^@public^@^@^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@   ^@^@^@217347380^@^T^@^@^@access_limits_id_seq^@^H^@^@^@SEQUENCE^@^B^@^@^@^@}^@^@^@CREATE SEQUENCE public.access_limits_id_seq
    START WITH 1
    INCREME^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@  ^@^@^@217347412^@^P^@^@^@documents_id_seq^@^H^@^@^@SEQUENCE^@^B^@^@^@^@y^@^@^@CREATE SEQUENCE public.documents_id_seq
    START WITH 1
    INCREMENT BY 1
   ^C^@^@^@^@^@^@^@^@^@?^O^@^@^@^@^@^@^@^@^A^@^@^@0^@^A^@^@^@0^^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@  ^@^@^@217347414^@^H^@^@^@editions^@^E^@^@^@TABLE^@^B^@^@^@^@^R
^@^@^@1259^@    ^@^@^@217347414^@^H^@^@^@editions^@^E^@^@^@TABLE^@^B^@^@^@^@^E^@^@CREATE TABLE public.editions (
    id integer NOT NULL,
    title text,
    publi^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^^@^@^@expanded_links_id_seq^@^H^@^@^@SEQUENCE^@^B^@^@^@^@~^@^@^@CREATE SEQUENCE public.expa^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@^@^@^@217347468^@^L^@^@^@links_id_seq^@^H^@^@^@SEQUENCE^@^B^@^@^@^@u^@^@^@CREATE SEQUENCE public.links_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;^C^@^@^@^@^@^@^@^@^@?^O^@^@^@^@^@^@^@^@^A^@^@^@0^@^A^@^@^@0^@^L^@^@^@links_id_seq^@^C^@^@^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@    ^@^@^@217347476^@^X^@^@^@path_reservations_id_seq^@^H^@^@^@SEQUENCE^@^B^@^@^@^@?^@^@^@CREATE SEQUENCE public.path_reservations_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@   ^@^@^@217347478^@^@^@^@schema_migrations^@^E^@^@^@TABLE^@^B^@^@^@^@R^@^@^@CREATE TABLE public.schema_migrations (
    version character varying NOT NULL
^C^@^@^@^@^@^@^@^@^@?^O^@^@^@^@^@^@^@^@^A^@^@^@0^@^A^@^@^@0^@^T^@^@^@unpublishings_id_seq^@^C^@^@^@^@^@^@^@^@^@?^@^@^@^@^@^@^@^@^@^@^@^@1259^@  ^@^@^@217347492^@^E^@^@^@users^@^E^@^@^@TABLE^@^B^@^@^@^@?^A^@^@CREATE TABLE public.users (
    id integer NOT NULL,
    name character varying,
    email character varying,
    uid character varying,
    organisation_slug character varying,
    organisation_content_id character varying,
    app_name character varying,
    permissions text,
    remotely_signed_out boolean DEFAULT false,
    disabled boolean DEFAULT false,
    crea^C^@^@^@^@^@^@^@^@^@?^O^@^@^@^@^@^@^@^@^A^@^@^@0^@^A^@^@^@0^@^L^@^@^@users_id_seq^@^C^@^@^@^@^@^@^@^@^@?^N^@^@^@^@^@^@^@^@^@^@^@2604^@  ^@^@^@217347503^@
^@^@^@actions id^@^G^@^@^@DEFAULT^@^B^@^@^@^@h^@^@^@ALTER TABLE ONLY public.actions ALTER COLUMN id SET DEFAULT nextval('public.actions_id_seq'::regclass);
^@9^@^@^@ALTER TABLE p^C^@^@^@^@^@^@^@^@^@?^N^@^@^@^@^@^@^@^@^@^@^@2604^@   ^@^@^@217347506^@^K^@^@^@editions id^@^G^@^@^@DEFAULT^@^B^@^@^@^@j^@^@^@ALTER TABLE ONLY public.editions ALTER COLUMN id SET DEFAULT nextval('public.editions_id_seq'::regclass);
^@:^@^@^@ALTER TABLE^C^@^@^@^@^@^@^@^@^@?^N^@^@^@^@^@^@^@^@^@^@^@2604^@ ^@^@^@217347508^@^@^@^@expanded_links id^@^G^@^@^@DEFAULT^@^B^@^@^@^@v^@^@^@ALTER TABLE ONLY public.expan^C^@^@^@^@^@^@^@^@^@?^O^@^@^@^@^@^@^@^@^A^@^@^@0^@^A^@^@^@0^@^P^@^@^@documents_id_seq^@^L^@^@^@SEQU^^C^@^@^@^@<^@^@^@SELECT pg_catalog.setval('pub^C^@^@^@^@^@^@^@^@^@^O^@^@^@^@^@^@^@^@^@^@^@2606^@   ^@^@^@222875031^@^X^@^@^@documents documents_pkey^@
^@^@^@CONSTRAINT^@^@^@^@^@V^@^@^@ALTER TABLE ONLY public.documents
    ADD CONSTRAINT documents_pkey PRIMARY KEY (id);
^@B^@^@^@^C^@^@^@^@^@^@^@^@^@^N^O^@^@^@^@^@^@^@^@^@^@^@^\^@^@^@index_editions_on_updated_at^@^E^@^@^@INDEX^@^@^@^@^@W^@^@^@CREATE INDEX index_editions_on_updated_at ON public.??^He[?TX{Uy?? ????L??   ??^O^F?6M???!????2???@j???k?#??i?֭^T^\<F8^A?-<?@^C?|^T<KA?m?^R
?|^T<KA?m???qe^P(ğ^T<??????^Z?j^Z?9_?^C??^Z?^H?h??6^G)@?A
??O?m?V?,bd|?R^Z_?R?v????Ԥ?^B?Q%?г?
Ӌ?^Y?U??[o???tܽМ_?R>?I?G?_????!6<٠B?^@^@^P^@^@??^G????U??y?X]?AE^@?b^_??_j?^A????^^!??(;?A?^\?B]27bd??&?Cp!??@^B??<?
jH^R
jH?^R
jH???'y?&?^[cl?7<:?01$!'??^N?*?w??~(^E^[?Ryc?΅(U??P?W??D?_䧦\?^X]b?AW?qx^L?^K?WP?????ʡ?^R
?[??7?B%?H?/^Z?^Y)dǶ%?!Ц^\?XR?UR?GZ*??Z??^TN?p?Ϛέ?^\U??"?.+?II??                                                                                                                              Zٔ ????G?D|p/?-^P?׿?^[????^R
????G?D|p/?-^P?׿?^[?????0??eؠ?$?^[?@
??                                                      
j?Q?/?^F.G?^N???a?
jR?x0?X??U??(^KW??I?6?I?^]E??2T?K'O?<dA"???t^Z?4???b?=^X?Q?????N*??%4h?o_?? j?<^F?*?|Mr????^]?^O?????D????Zu2^LB??\??????D?B?K?r5???;?A]?;^A?רXy!?"^^? ?OError response from daemon: OCI runtime start failed: cannot start a container that has stopped: unknown
 128KiB 0:00:00 [ 182KiB/s] [>                                                                                                                                                 ]  0%            
docker-compose -f [...] stop postgres-13
[+] Running 5/5
 ⠿ Container govuk-docker-postgres-13-1                 Stopped                                                                                                                            0.2s
 ⠿ Container govuk-docker_postgres-13_run_b2cf0b1c4b99  Stopped                                                                                                                            0.0s
 ⠿ Container govuk-docker_postgres-13_run_4bd59df1a7a6  Stopped                                                                                                                            0.0s
 ⠿ Container govuk-docker_postgres-13_run_1764a2258737  Stopped                                                                                                                            0.0s
 ⠿ Container govuk-docker_postgres-13_run_1f71fb33b0c6  Stopped 
ollietreend commented 2 years ago

It looks like this might actually be a general Postgres problem, rather than anything to do with the Publishing API specifically.

I've just tried importing data for the app service-manual-publisher, which has a much smaller database at only 12 MB (gzipped). That exhibited the same problem.

We've recently upgraded from Postgres 9.6 to Postgres 13. I wonder if the format of the database dumps have changed. Previously isn't wasn't normal for the CLI to print the database dump to the screen as it seems to do now.

KludgeKML commented 2 years ago

So, are we sure these files are being gzipped correctly? If I try to look at the dump I've got from the replication script:

keith.lawrence@GDS11342 postgresql % pwd
/Users/keith.lawrence/govuk/govuk-docker/replication/postgresql
keith.lawrence@GDS11342 postgresql % gunzip publishing_api_production.dump.gz 
gunzip: publishing_api_production.dump.gz: not in gzip format

and if I look in the file:

keith.lawrence@GDS11342 postgresql % head -n10 publishing_api_production.dump.gz 
PGDMzpublishing_api_production13.313.3 (Debian 13.3-1.pgdg100+1)??0ENCODINENCODINGSET client_encoding = 'UTF8';
false?00
STDSTRINGS
STDSTRINGS(SET standard_conforming_strings = 'on';
false?00
SEARCHPATH
SEARCHPATH8SELECT pg_catalog.set_config('search_path', '', false);
false?1262  217347371publishing_api_productioDATABASEnCREATE DATABASE publishing_api_production WITH TEMPLATE = template0 ENCODING = 'UTF8' LOCALE = 'en_US.UTF-8';
)DROP DATABASE publishing_api_production;

aws_db_adminfalse?00"DATABASE publishing_api_productionACLCGRANT ALL ON DATABASE publishing_api_production TO publishing_api;

..which doesn't look like a gz archive.

Is it possible the files are not being compressed in the new scripts (or the compress is failing quietly), and that's causing the import to script to fail?

KludgeKML commented 2 years ago

It strongly hints that something is going wrong in the backup, I think, The replication script might be okay, but the script that's dumping the data out of integration is maybe having problems? (I think I lack the permissions to investigate that further, and I certainly lack the product knowledge at the moment!)

kevindew commented 2 years ago

Worth tagging @barrucadu here who worked on the changes for this for the Postgres 13.

Re the .gz, one of the annoyances is that these dumps are named as gz files when they're actually the pgdump custom format. A good deed would be to change that in the file that creates the dump - but it does look a bit fiddly: https://github.com/alphagov/govuk-puppet/blob/c234f285227901fc027abe1fd49f3afed7266077/modules/govuk_env_sync/files/govuk_env_sync.sh#L179-L181 - I'm not sure if there's a standard extension for a pg_dump custom format - .pgdump ?

KludgeKML commented 2 years ago

Ah, that explains that. Though even for a different format they look maybe a little bit messed up - take a look at line 9 of the dump there, which has this:

...217347371publishing_api_productioDATABASEnCREATE DATABASE...

which looks like some output has interleaved with some other output accidentally (notice the "n" cut off the end of the database name).

kevindew commented 2 years ago

I've given this a go myself on a couple of smaller postgres databases (including service-manual-publisher as Olly stated) and haven't experienced the problem.

E.g. for Service Manual Publisher:

➜  govuk-docker git:(main) gds aws govuk-integration-poweruser ./bin/replicate-postgresql.sh service-manual-publisher
Touch your YubiKey...
Replicating postgres for service-manual-publisher
download: s3://govuk-integration-database-backups/service-manual-publisher-postgres/2022-02-03T05:00:01-service-manual-publisher_production.gz to replication/postgresql/service_manual_publisher_production.dump.gz
stopping running govuk-docker containers...
docker-compose -f [...] down
Removing govuk-docker_postgres-13_run_ed4b80446767 ... done
Removing govuk-docker_postgres-13_run_b6e0ac2e1353 ... done
Removing govuk-docker_postgres-13_run_6a9bdfadf7bb ... done
Removing govuk-docker_postgres-13_run_730f8681e4b2 ... done
Removing govuk-docker_postgres-13_1                ... done
Removing network govuk-docker_default
docker-compose -f [...] config
docker-compose -f [...] up -d postgres-13
Creating network "govuk-docker_default" with the default driver
Creating govuk-docker_postgres-13_1 ... done
waiting for postgres...
docker-compose -f [...] config
docker-compose -f [...] run postgres-13 /usr/bin/psql -h postgres-13 -U postgres -c DROP DATABASE IF EXISTS "service-manual-publisher"
Creating govuk-docker_postgres-13_run ... done
NOTICE:  database "service-manual-publisher" does not exist, skipping
DROP DATABASE
docker-compose -f [...] run postgres-13 /usr/bin/createdb -h postgres-13 -U postgres service-manual-publisher
Creating govuk-docker_postgres-13_run ... done
docker-compose -f [...] run postgres-13 /usr/bin/pg_restore -h postgres-13 -U postgres -d service-manual-publisher --no-owner --no-privileges
Creating govuk-docker_postgres-13_run ... done
11.6MiB 0:00:04 [2.64MiB/s] [================================================================================================================================================================================================================================>] 100%
docker-compose -f [...] stop postgres-13
Stopping govuk-docker_postgres-13_1 ... done

A couple of things that come to mind are:

I imagine the best route forward is to test restoring one of our smaller postgres databases to see if that can work - as it'll be way faster for others to try replicate and less painful for testing. Service Manual Publisher seems quite well suited for that as is only about ~12 MB in it's PGdump format.

I'm trying Publishing API myself but not expecting that to finish before I call it a day today.

Edit: not that it helps you much, but Publishing API replicated fine for me:

Publishing API output ``` ➜ govuk-docker git:(main) gds aws govuk-integration-poweruser ./bin/replicate-postgresql.sh publishing-api Touch your YubiKey... Replicating postgres for publishing-api download: s3://govuk-integration-database-backups/publishing-api-postgres/2022-02-03T05:00:01-publishing_api_production.gz to replication/postgresql/publishing_api_production.dump.gz stopping running govuk-docker containers... docker-compose -f [...] down Removing govuk-docker_postgres-13_run_5d3b049c86bd ... done Removing govuk-docker_postgres-13_run_1cba0f2c2b49 ... done Removing govuk-docker_postgres-13_run_1cab18c410af ... done Removing govuk-docker_postgres-13_run_9e70155dd42c ... done Removing govuk-docker_postgres-13_1 ... done Removing network govuk-docker_default docker-compose -f [...] config docker-compose -f [...] up -d postgres-13 Creating network "govuk-docker_default" with the default driver Creating govuk-docker_postgres-13_1 ... done waiting for postgres... docker-compose -f [...] config docker-compose -f [...] run postgres-13 /usr/bin/psql -h postgres-13 -U postgres -c DROP DATABASE IF EXISTS "publishing-api" Creating govuk-docker_postgres-13_run ... done DROP DATABASE docker-compose -f [...] run postgres-13 /usr/bin/createdb -h postgres-13 -U postgres publishing-api Creating govuk-docker_postgres-13_run ... done docker-compose -f [...] run postgres-13 /usr/bin/pg_restore -h postgres-13 -U postgres -d publishing-api --no-owner --no-privileges Creating govuk-docker_postgres-13_run ... done 20.5GiB 1:00:45 [5.75MiB/s] [================================================================================================================================================================================================================================>] 100% docker-compose -f [...] stop postgres-13 Stopping govuk-docker_postgres-13_1 ... done ```
ollietreend commented 2 years ago

@KludgeKML it looks like this is a compatibility issue with Docker Compose v2 – which is relatively new, whereas this project was built to work with v1.

This isn't the first time we've had issues with v2 – for example, see #506 which I hoped was fixed by #554.

I experienced the same issue as you when running with Docker Compose v2 – but switching back to Compose v1 seems to do the trick. Obviously that's not a long term fix, but it should get things working for you for the time being.

If you're using Docker for Mac, you should be able to switch back to v1 in the main Preferences pane:

Settings
ollietreend commented 2 years ago

I've just opened #573 which should resolve this issue.

KludgeKML commented 2 years ago

Fixed by https://github.com/alphagov/govuk-docker/pull/573