enova / pglogical_ticker

Time-based replication delay for pglogical
MIT License
3 stars 4 forks source link

Regression tests unstable: SELECT COUNT(1) FROM worker_pid WHERE pid IS NOT NULL #3

Closed df7cb closed 5 years ago

df7cb commented 5 years ago

During the monthly "make installcheck" run on apt.postgresql.org, 4 out of 22 dist/arch combinations failed: https://pgdgbuild.dus.dg-i.net/job/pglogical-ticker-adt/3/

Here is the output on jessie/ppc64el, for PG 10:

07:34:27 Ver Cluster Port Status Owner    Data directory                            Log file
07:34:27 10  regress 5432 online postgres /tmp/pg_virtualenv.kLuOo5/data/10/regress /tmp/pg_virtualenv.kLuOo5/log/postgresql-10-regress.log
07:34:27 
07:34:27 /usr/lib/postgresql/10/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --bindir='/usr/lib/postgresql/10/bin'    --dbname=contrib_regression 01_create_ext 02_setup 03_deploy 04_add_to_rep 05_tick 06_worker 07_handlers 08_reentrance 09_1_2_tests 99_cleanup
07:34:27 (using postmaster on localhost, port 5432)
07:34:27 ============== dropping database "contrib_regression" ==============
07:34:27 NOTICE:  database "contrib_regression" does not exist, skipping
07:34:27 DROP DATABASE
07:34:27 ============== creating database "contrib_regression" ==============
07:34:27 CREATE DATABASE
07:34:27 ALTER DATABASE
07:34:27 ============== running regression test queries        ==============
07:34:27 test 01_create_ext            ... ok
07:34:27 test 02_setup                 ... ok
07:34:27 test 03_deploy                ... ok
07:34:28 test 04_add_to_rep            ... ok
07:34:28 test 05_tick                  ... ok
07:34:28 test 06_worker                ... FAILED
07:34:43 test 07_handlers              ... ok
07:34:45 test 08_reentrance            ... ok
07:34:45 test 09_1_2_tests             ... ok
07:34:45 test 99_cleanup               ... ok
07:34:45 
07:34:45 =======================
07:34:45  1 of 10 tests failed. 
07:34:45 =======================
07:34:45 
07:34:45 The differences that caused some tests to fail can be viewed in the
07:34:45 file "/tmp/adt-run.E5AI3C/build.Y7b/pglogical-ticker-1.3.0/regression.diffs".  A copy of the test summary that you see
07:34:45 above is saved in the file "/tmp/adt-run.E5AI3C/build.Y7b/pglogical-ticker-1.3.0/regression.out".
07:34:45 
07:34:45 /usr/lib/postgresql/10/lib/pgxs/src/makefiles/pgxs.mk:277: recipe for target 'installcheck' failed
07:34:45 make: *** [installcheck] Error 1
07:34:45 *** /tmp/pg_virtualenv.kLuOo5/log/postgresql-10-regress.log (last 100 lines) ***
07:34:45 2018-12-16 06:34:25.245 UTC [4683] LOG:  listening on IPv6 address "::1", port 5432
07:34:45 2018-12-16 06:34:25.245 UTC [4683] LOG:  listening on IPv4 address "127.0.0.1", port 5432
07:34:45 2018-12-16 06:34:25.246 UTC [4683] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
07:34:45 2018-12-16 06:34:25.251 UTC [4684] LOG:  database system was shut down at 2018-12-16 06:34:25 UTC
07:34:45 2018-12-16 06:34:25.254 UTC [4683] LOG:  database system is ready to accept connections
07:34:45 2018-12-16 06:34:25.255 UTC [4690] LOG:  starting pglogical supervisor
07:34:45 2018-12-16 06:34:25.261 UTC [4692] [unknown]@postgres LOG:  manager worker [4692] at slot 0 generation 1 detaching cleanly
07:34:45 2018-12-16 06:34:25.268 UTC [4693] [unknown]@template1 LOG:  manager worker [4693] at slot 0 generation 2 detaching cleanly
07:34:45 2018-12-16 06:34:25.838 UTC [4694] [unknown]@[unknown] LOG:  incomplete startup packet
07:34:45 2018-12-16 06:34:27.943 UTC [4738] [unknown]@postgres LOG:  manager worker [4738] at slot 0 generation 3 detaching cleanly
07:34:45 2018-12-16 06:34:27.947 UTC [4742] [unknown]@contrib_regression LOG:  starting pglogical database manager for database contrib_regression
07:34:45 2018-12-16 06:34:28.128 UTC [4756] LOG:  pglogical_ticker worker initialized
07:34:45 2018-12-16 06:34:28.949 UTC [4757] [unknown]@template1 LOG:  manager worker [4757] at slot 1 generation 1 detaching cleanly
07:34:45 2018-12-16 06:34:41.147 UTC [4756] ERROR:  canceling statement due to user request
07:34:45 2018-12-16 06:34:41.149 UTC [4683] LOG:  worker process: pglogical_ticker worker (PID 4756) exited with exit code 1
07:34:45 2018-12-16 06:34:43.199 UTC [4765] LOG:  pglogical_ticker worker initialized
07:34:45 2018-12-16 06:34:43.200 UTC [4766] LOG:  pglogical_ticker worker initialized
07:34:45 2018-12-16 06:34:43.206 UTC [4764] FATAL:  database 9999999 does not exist
07:34:45 2018-12-16 06:34:43.208 UTC [4683] LOG:  worker process: pglogical_ticker worker (PID 4764) exited with exit code 1
07:34:45 2018-12-16 06:34:44.202 UTC [4766] ERROR:  canceling statement due to user request
07:34:45 2018-12-16 06:34:44.202 UTC [4765] ERROR:  canceling statement due to user request
07:34:45 2018-12-16 06:34:44.204 UTC [4683] LOG:  worker process: pglogical_ticker worker (PID 4766) exited with exit code 1
07:34:45 2018-12-16 06:34:44.204 UTC [4683] LOG:  worker process: pglogical_ticker worker (PID 4765) exited with exit code 1
07:34:45 2018-12-16 06:34:45.308 UTC [4774] postgres@contrib_regression ERROR:  cannot drop table pglogical_ticker.test1 because extension pglogical_ticker requires it
07:34:45 2018-12-16 06:34:45.308 UTC [4774] postgres@contrib_regression HINT:  You can drop extension pglogical_ticker instead.
07:34:45 2018-12-16 06:34:45.308 UTC [4774] postgres@contrib_regression STATEMENT:  DROP TABLE pglogical_ticker.test1;
07:34:45 2018-12-16 06:34:45.363 UTC [4742] [unknown]@contrib_regression LOG:  manager worker [4742] at slot 0 generation 4 detaching cleanly
07:34:45 Dropping cluster 10/regress ...
07:34:45 **** regression.diffs ****
07:34:45 --- /tmp/adt-run.E5AI3C/build.Y7b/pglogical-ticker-1.3.0/expected/06_worker.out    2018-11-13 18:43:54.000000000 +0000
07:34:45 +++ /tmp/adt-run.E5AI3C/build.Y7b/pglogical-ticker-1.3.0/results/06_worker.out 2018-12-16 06:34:43.149971160 +0000
07:34:45 @@ -47,14 +47,14 @@
07:34:45  SELECT COUNT(1) FROM worker_pid WHERE pid IS NOT NULL;
07:34:45   count 
07:34:45  -------
07:34:45 -     1
07:34:45 +     0
07:34:45  (1 row)
07:34:45  
07:34:45  SELECT pg_cancel_backend(pid)
07:34:45  FROM worker_pid;
07:34:45   pg_cancel_backend 
07:34:45  -------------------
07:34:45 - t
07:34:45 + 
07:34:45  (1 row)
07:34:45  
07:34:45  --Test it does nothing with no tables

https://pgdgbuild.dus.dg-i.net/job/pglogical-ticker-adt/3/architecture=ppc64el,distribution=jessie/console

There are two failures on trusty that look the same.

The stretch/amd64/9.6 failure is different:

07:31:05 Ver Cluster Port Status Owner    Data directory                             Log file
07:31:05 9.6 regress 5432 online postgres /tmp/pg_virtualenv.HgTEiv/data/9.6/regress /tmp/pg_virtualenv.HgTEiv/log/postgresql-9.6-regress.log
07:31:05 
07:31:05 /usr/lib/postgresql/9.6/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --bindir='/usr/lib/postgresql/9.6/bin'    --dbname=contrib_regression 01_create_ext 02_setup 03_deploy 04_add_to_rep 05_tick 06_worker 07_handlers 08_reentrance 09_1_2_tests 99_cleanup
07:31:05 (using postmaster on localhost, port 5432)
07:31:05 ============== dropping database "contrib_regression" ==============
07:31:05 NOTICE:  database "contrib_regression" does not exist, skipping
07:31:05 DROP DATABASE
07:31:05 ============== creating database "contrib_regression" ==============
07:31:05 CREATE DATABASE
07:31:05 ALTER DATABASE
07:31:05 ============== running regression test queries        ==============
07:31:05 test 01_create_ext            ... ok
07:31:05 test 02_setup                 ... ok
07:31:05 test 03_deploy                ... ok
07:31:06 test 04_add_to_rep            ... ok
07:31:06 test 05_tick                  ... ok
07:31:06 test 06_worker                ... ok
07:31:21 test 07_handlers              ... FAILED
07:31:23 test 08_reentrance            ... ok
07:31:23 test 09_1_2_tests             ... ok
07:31:23 test 99_cleanup               ... ok
07:31:23 
07:31:23 =======================
07:31:23  1 of 10 tests failed. 
07:31:23 =======================
07:31:23 
07:31:23 The differences that caused some tests to fail can be viewed in the
07:31:23 file "/tmp/autopkgtest.phfXoN/build.9s9/pglogical-ticker-1.3.0/regression.diffs".  A copy of the test summary that you see
07:31:23 above is saved in the file "/tmp/autopkgtest.phfXoN/build.9s9/pglogical-ticker-1.3.0/regression.out".
07:31:23 
07:31:23 make: *** [installcheck] Error 1
07:31:23 /usr/lib/postgresql/9.6/lib/pgxs/src/makefiles/pgxs.mk:274: recipe for target 'installcheck' failed
07:31:23 *** /tmp/pg_virtualenv.HgTEiv/log/postgresql-9.6-regress.log (last 100 lines) ***
07:31:23 2018-12-16 06:31:03.457 UTC [4792] LOG:  database system was shut down at 2018-12-16 06:31:03 UTC
07:31:23 2018-12-16 06:31:03.457 UTC [4792] LOG:  MultiXact member wraparound protections are now enabled
07:31:23 2018-12-16 06:31:03.462 UTC [4796] LOG:  autovacuum launcher started
07:31:23 2018-12-16 06:31:03.463 UTC [4791] LOG:  database system is ready to accept connections
07:31:23 2018-12-16 06:31:03.464 UTC [4798] LOG:  starting pglogical supervisor
07:31:23 2018-12-16 06:31:03.472 UTC [4799] [unknown]@postgres LOG:  manager worker [4799] at slot 0 generation 1 detaching cleanly
07:31:23 2018-12-16 06:31:03.482 UTC [4800] [unknown]@template1 LOG:  manager worker [4800] at slot 0 generation 2 detaching cleanly
07:31:23 2018-12-16 06:31:03.923 UTC [4801] [unknown]@[unknown] LOG:  incomplete startup packet
07:31:23 2018-12-16 06:31:05.973 UTC [4845] [unknown]@postgres LOG:  manager worker [4845] at slot 0 generation 3 detaching cleanly
07:31:23 2018-12-16 06:31:05.982 UTC [4846] [unknown]@contrib_regression LOG:  starting pglogical database manager for database contrib_regression
07:31:23 2018-12-16 06:31:06.269 UTC [4863] LOG:  pglogical_ticker worker initialized
07:31:23 2018-12-16 06:31:06.993 UTC [4864] [unknown]@template1 LOG:  manager worker [4864] at slot 1 generation 1 detaching cleanly
07:31:23 2018-12-16 06:31:19.267 UTC [4863] ERROR:  canceling statement due to user request
07:31:23 2018-12-16 06:31:19.271 UTC [4791] LOG:  worker process: pglogical_ticker worker (PID 4863) exited with exit code 1
07:31:23 2018-12-16 06:31:19.279 UTC [4865] LOG:  pglogical_ticker worker initialized
07:31:23 2018-12-16 06:31:21.279 UTC [4865] ERROR:  canceling statement due to user request
07:31:23 2018-12-16 06:31:21.283 UTC [4791] LOG:  worker process: pglogical_ticker worker (PID 4865) exited with exit code 1
07:31:23 2018-12-16 06:31:21.327 UTC [4870] FATAL:  database 9999999 does not exist
07:31:23 2018-12-16 06:31:21.329 UTC [4871] LOG:  pglogical_ticker worker initialized
07:31:23 2018-12-16 06:31:21.330 UTC [4791] LOG:  worker process: pglogical_ticker worker (PID 4870) exited with exit code 1
07:31:23 2018-12-16 06:31:22.335 UTC [4871] ERROR:  canceling statement due to user request
07:31:23 2018-12-16 06:31:22.338 UTC [4791] LOG:  worker process: pglogical_ticker worker (PID 4871) exited with exit code 1
07:31:23 2018-12-16 06:31:23.469 UTC [4882] postgres@contrib_regression ERROR:  cannot drop table pglogical_ticker.test1 because extension pglogical_ticker requires it
07:31:23 2018-12-16 06:31:23.469 UTC [4882] postgres@contrib_regression HINT:  You can drop extension pglogical_ticker instead.
07:31:23 2018-12-16 06:31:23.469 UTC [4882] postgres@contrib_regression STATEMENT:  DROP TABLE pglogical_ticker.test1;
07:31:23 2018-12-16 06:31:23.530 UTC [4846] [unknown]@contrib_regression LOG:  manager worker [4846] at slot 0 generation 4 detaching cleanly
07:31:23 Dropping cluster 9.6/regress ...
07:31:24 **** regression.diffs ****
07:31:24 --- /tmp/autopkgtest.phfXoN/build.9s9/pglogical-ticker-1.3.0/expected/07_handlers.out  2018-11-13 18:43:54.000000000 +0000
07:31:24 +++ /tmp/autopkgtest.phfXoN/build.9s9/pglogical-ticker-1.3.0/results/07_handlers.out   2018-12-16 06:31:23.335068802 +0000
07:31:24 @@ -26,7 +26,7 @@
07:31:24  SELECT pglogical_ticker.launch() IS NULL AS pid;
07:31:24   pid 
07:31:24  -----
07:31:24 - f
07:31:24 + t
07:31:24  (1 row)
07:31:24  
07:31:24  SELECT pg_sleep(1);

It would be nice if these tests were less racy. Thanks!

jfinzel commented 5 years ago

@ChristophBerg I have a PR to fix this here: https://github.com/enova/pglogical_ticker/pull/5

Can you test against this branch, or do you need me to tag a release first?

jfinzel commented 5 years ago

As noted, pushed up https://github.com/enova/pglogical_ticker/pull/5/commits/96a6d9c616257607928ef9309cfc8b9e78600cfa which I hope will fix this.

jfinzel commented 5 years ago

Fixed with https://github.com/enova/pglogical_ticker/pull/5