bucardo / check_postgres

Nagios check_postgres plugin for checking status of PostgreSQL databases
http://bucardo.org/wiki/Check_postgres
Other
561 stars 175 forks source link

PG13: same_schema needs porting #167

Open df7cb opened 5 years ago

df7cb commented 5 years ago

The same_schema check needs rewriting for PG12:

t/02_same_schema.t .. 1/76 
#   Failed test 'Action 'same_schema' succeeds with two empty databases'
#   at t/02_same_schema.t line 63.
#                   'ERROR:  column "proisagg" does not exist
# LINE 38: ...function_arguments(oid) )AS name FROM pg_proc WHERE proisagg
#                                                                 ^
# HINT:  Perhaps you meant to reference the column "pg_proc.prolang".
# '
#     doesn't match '(?^:^POSTGRES_SAME_SCHEMA OK)'
t/02_same_schema.t .. 3/76 
#   Failed test 'Action 'same_schema' reports on language differences'
#   at t/02_same_schema.t line 94.
#                   'ERROR:  column "proisagg" does not exist
# LINE 38: ...function_arguments(oid) )AS name FROM pg_proc WHERE proisagg
#                                                                 ^
# HINT:  Perhaps you meant to reference the column "pg_proc.prolang".
# '
#     doesn't match '(?^:^POSTGRES_SAME_SCHEMA OK)'

#   Failed test 'Action 'same_schema' reports language on 3 but not 1 and 2'
#   at t/02_same_schema.t line 98.
#                   'ERROR:  column "proisagg" does not exist
# LINE 38: ...function_arguments(oid) )AS name FROM pg_proc WHERE proisagg
#                                                                 ^
# HINT:  Perhaps you meant to reference the column "pg_proc.prolang".
# '
#     doesn't match '(?^s:^POSTGRES_SAME_SCHEMA CRITICAL.*Items not matched: 1 .*
# Language "plpgsql" does not exist on all databases:
# \s*Exists on:\s+3
# \s+Missing on:\s+1, 2\s*$)'
...

Additionally, t/02_same_schema.t uses WITH OIDS which is gone in PG12. I suggest using WITH (autovacuum_enabled = off) instead there.

df7cb commented 4 years ago

The same_schema test was fixed for PG12 in the meantime, but now PG13 is broken:

$ LC_ALL=C make test PGBINDIR=/usr/lib/postgresql/13/bin TEST_FILES=t/02_same_schema.t 
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/02_same_schema.t
t/02_same_schema.t .. 2/76 commit ineffective with AutoCommit enabled at t/02_same_schema.t line 81.
commit ineffective with AutoCommit enabled at t/02_same_schema.t line 81.
commit ineffective with AutoCommit enabled at t/02_same_schema.t line 81.
t/02_same_schema.t .. 4/76 
#   Failed test 'Action 'same_schema' reports language on 3 but not 1 and 2'
#   at t/02_same_schema.t line 100.
#                   'POSTGRES_SAME_SCHEMA CRITICAL: DB "postgres" (hosts:/tmp/cptesting_socket,/tmp/cptesting_socket2,/tmp/cptesting_socket3) Databases were different. Items not matched: 4 | time=4.53s 
# DB 1: port=5432 host=/tmp/cptesting_socket dbname=postgres user=check_postgres_testing 
# DB 1: PG version: 13beta1
# DB 1: Total objects: 5102
# DB 2: port=5432 host=/tmp/cptesting_socket2 dbname=postgres user=check_postgres_testing 
# DB 2: PG version: 13beta1
# DB 2: Total objects: 5102
# DB 3: port=5432 host=/tmp/cptesting_socket3 dbname=postgres user=check_postgres_testing 
# DB 3: PG version: 13beta1
# DB 3: Total objects: 5106
# Language "plpgsql" does not exist on all databases:
#   Exists on:  3
#   Missing on: 1, 2
# Comment "extension;pg_catalog.plpgsql" does not exist on all databases:
#   Exists on:  3
#   Missing on: 1, 2
# Comment "language;plpgsql" does not exist on all databases:
#   Exists on:  3
#   Missing on: 1, 2
# Extension "plpgsql" does not exist on all databases:
#   Exists on:  3
#   Missing on: 1, 2
# Schema "extension;pg_catalog" does not exist on all databases:
#   Exists on:  
#   Missing on: 
#  
# '
#     doesn't match '(?^s:^POSTGRES_SAME_SCHEMA CRITICAL.*Items not matched: 1 .*
# Language "plpgsql" does not exist on all databases:
# \s*Exists on:\s+3
# \s+Missing on:\s+1, 2\s*$)'
t/02_same_schema.t .. 5/76 
#   Failed test 'Action 'same_schema' does not report language differences if the 'nolanguage' filter is given'
#   at t/02_same_schema.t line 108.
#                   'POSTGRES_SAME_SCHEMA CRITICAL: DB "postgres" (hosts:/tmp/cptesting_socket,/tmp/cptesting_socket2,/tmp/cptesting_socket3) Databases were different. Items not matched: 3 | time=4.47s 
# DB 1: port=5432 host=/tmp/cptesting_socket dbname=postgres user=check_postgres_testing 
# DB 1: PG version: 13beta1
# DB 1: Total objects: 5099
# DB 2: port=5432 host=/tmp/cptesting_socket2 dbname=postgres user=check_postgres_testing 
# DB 2: PG version: 13beta1
# DB 2: Total objects: 5099
# DB 3: port=5432 host=/tmp/cptesting_socket3 dbname=postgres user=check_postgres_testing 
# DB 3: PG version: 13beta1
# DB 3: Total objects: 5102
# Comment "extension;pg_catalog.plpgsql" does not exist on all databases:
#   Exists on:  3
#   Missing on: 1, 2
# Comment "language;plpgsql" does not exist on all databases:
#   Exists on:  3
#   Missing on: 1, 2
# Extension "plpgsql" does not exist on all databases:
#   Exists on:  3
#   Missing on: 1, 2
# Schema "extension;pg_catalog" does not exist on all databases:
#   Exists on:  
#   Missing on: 
#  
# '
#     doesn't match '(?^:^POSTGRES_SAME_SCHEMA OK)'

I've been staring that the code for some time, but can't make much sense of it. It seems the problem is that plpgsql isn't dropped properly in the 3rd test database, but inspecting the database manually didn't reveal any differences.

machack666 commented 4 years ago

Looks like the regexes aren't matching due to extra missing items being reported; i.e., the Comment "extension;pg_catalog.plpgsql" etc lines.

I assume that is output generated by the check_postgres script itself, so will dig in a bit and see what I can find there.

machack666 commented 4 years ago

I'm wondering is this is related to fallout/cleanup from postgresql's 50fc694e43742ce3d04a5e9f708432cb022c5f0d, perhaps some changes in pg_depend causing a different state after dropping a pl; since pg_pltemplate went away, comments, etc, might have originally belonged to that catalog.

df7cb commented 4 years ago

I did some more staring yesterday but didn't get very far. The first test difference is easily catched by accepting the extra 3 items (change "1" to "[14]", append /m to the regexp flags). What I didn't understand at all is the 2nd test difference: why is the difference no going away even after recreating the language (or extension, didn't matter) in the 3rd db.

machack666 commented 4 years ago

Looks like this will fix the tests; not sure if it's sane or not:

modified   check_postgres.pl
@@ -1310,6 +1310,7 @@ JOIN pg_roles r ON (r.oid = l.lanowner)},
         SQL2       => q{
 SELECT l.*, lanname AS name
 FROM pg_language l
+        exclude    => 'system',
     },
     },
     aggregate => {

However, based on the actual code which handles the system exclusion, I'm not sure that this is a sensible test/fix.

sub find_catalog_info {
...
    if (exists $ci->{exclude}) {
        if ('temp_schemas' eq $ci->{exclude}) {
            if (! $opt{filtered}{system}) {
                $SQL .= q{ WHERE nspname !~ '^pg_t'};
            }
        }
        elsif ('system' eq $ci->{exclude}) {
            if (! $opt{filtered}{system}) {
                $SQL .= sprintf
                    q{ %s n.nspname !~ '^pg' AND n.nspname <> 'information_schema'},
                        $SQL =~ /WHERE/ ? 'AND' : 'WHERE';
            }
        }
        else {
            die "Unknown exclude '$ci->{exclude}' called";
        }
    }
...
}

It does look like there is a lot of version-specific knowledge/fixes to the queries which pull the underlying catalog objects out, so this routine is probably where any such fix should go.