Doctorbal / zabbix-postgres-partitioning

Zabbix PostgreSQL version 11 Native Partitioning
MIT License
43 stars 15 forks source link

Issues with maintenance jobs #5

Closed mjtrangoni closed 4 years ago

mjtrangoni commented 4 years ago

Hi @Doctorbal,

I had some issues that I want to report to you, so you can improve this tutorial.

First of all, I noticed that the maintenance jobs does not run, and this seems so in PostgreSQL logs,

2019-08-20 21:33:02.563 CEST [25599] CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean,boolean) line 398 at RAISE
        SQL statement "SELECT "partman".run_maintenance(p_analyze := false, p_jobmon := true)"
2019-08-20 21:33:02.565 CEST [30070] LOG:  background worker "pg_partman dynamic background worker (dbname=zabbix)" (PID 25599) exited with exit code 1
2019-08-20 22:33:02.639 CEST [21064] LOG:  pg_partman dynamic background worker (dbname=zabbix) dynamic background worker initialized with role zabbix on database zabbix
2019-08-20 22:33:02.707 CEST [21064] ERROR:  permission denied for table trends_p2020_08
        CONTEXT: SQL statement "SELECT max(to_timestamp(clock))::text FROM public.trends_p2020_08"
        PL/pgSQL function partman.run_maintenance(text,boolean,boolean,boolean) line 209 at EXECUTE
        SQL statement "SELECT "partman".run_maintenance(p_analyze := false, p_jobmon := true)"
        DETAIL:
        HINT:

So I noticed that the partition owner was postgres, and not zabbix. I think you should write that thing bold, so that I would got it.

Next point was, I granted more rights for the zabbix user, but I am not 100% positive if this all is required.

GRANT ALL ON SCHEMA partman TO zabbix;
GRANT ALL ON ALL TABLES IN SCHEMA partman TO zabbix;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO zabbix;
GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO zabbix;  -- PG11+ only

Last but not least, the import of the history data without --inserts flag does not work at all, so that the data is imported, but not visible for the UI. Do you have any clues on this for me?

Thanks!

Doctorbal commented 4 years ago

@mjtrangoni ,

Thank you for bringing these issues up.

When you created the zabbix user what role permissions did you provide it?

Also to understand your environment better, what version PostgreSQL are you running on which OS distro?

In my tutorial, the zabbix database user has superuser privileges (via the -s parameter).

If you created the partman SCHEMA in the zabbix database, if the zabbix user is the owner of that database, running the maintenance scripts should work in whatever SCHEMA used on that database.

Your SQL commands look appropriate and should be enough to grant the zabbix user permissions on the partman SCHEMA.

GRANT ALL ON SCHEMA partman TO zabbix;
GRANT ALL ON ALL TABLES IN SCHEMA partman TO zabbix;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO zabbix;
GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO zabbix;

Regarding the --inserts parameter, let me know if the following below works instead.

# Note that the following commands are run on the new DB instance...
$ sudo mkdir -p /var/backups/postgresql
$ sudo chown -R postgres:postgres /var/backups/postgresql
$ sudo -u postgres pg_dump -Fc --file=/var/backups/postgresql/zabbix-history.dump -d zabbix -h <OLD-`zabbix`-DATABASE> -U zabbix --table=history*
$ sudo -u postgres pg_restore -j 8 -d zabbix /var/backups/postgresql/zabbix-history.dump -U zabbix

I will simplify the docs from your feedback as well as when I get to testing these use cases more appropriately.

If you have any more issues that address specific details that I might of missed please mention them as it can get quite confusing to the outside viewer.

Doctorbal commented 4 years ago

@mjtrangoni ,

I was able to replicate the issue regarding permission denied for the zabbix user on the maintenance procedure.

You are correct; the zabbix database tables must be owned by the zabbix user if that user is not a superuser.

When I performed the procedure the zabbix user was a superuser, thus I didn't run into that issue.

You can fix your permission issue with the following shell script (I called it change-zbxdb-table-ownership.sh):

#!/bin/bash
# Description: Changes the database to the specified database user.
#              Run as postgres user; i.e. superuser of postgres DB.
# Usage: ./change-zbxdb-table-ownership.sh zabbix zabbix

DATABASE=$1
NEW_OWNER=$2

# test that there are two arguments
if test $# -lt 2; then
  echo "usage: $0 <DATABASE> <NEW_OWNER>"
  exit 0
fi

tables=`psql -qAt -c "select tablename from pg_tables where schemaname = 'public';" $DATABASE`

for tbl in $tables ; do
  psql -c "alter table $tbl owner to $NEW_OWNER" $DATABASE ;
done

Then run it as the postgres user:

$ sudo chmod +x /var/lib/postgresql/change-zbxdb-table-ownership.sh
$ sudo -u postgres /var/lib/postgresql/change-zbxdb-table-ownership.sh zabbix zabbix

I will push an update to the docs soon to address this as well as additional fixes.

mjtrangoni commented 4 years ago

Hi @Doctorbal,

Thanks for updating the tutorial. I am running the latest PSQL 11 from the official RPM repository on CentOS 7.6.

I would add to the troubleshooting part, that you have to disable the partman maintenance first for assuring that the history_(*)_default are not too big.

UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history';
[...]

This was the biggest problem that I had/still have. I imported the history without the --inserts command, and as a result, I've got all the data imported as history_default and history_uint_default. Those tables were so big, that I had to remove that data, and wait until they were vacuumed, and only then, I could run the first maintenance manually, and then set this back to automatic_maintenance.

Of course, this depends on how quickly you find out those permission problems. My case a week, which was a long period.

By the way, which is the recommended form of importing the old history data? For me, the COPY way was not successful.

Doctorbal commented 4 years ago

@mjtrangoni ,

Thanks for the details on your environment.

I'll go ahead and add your issue in the troubleshooting section although I don't necessary see this as a problem. You should, generally speaking not retain history table data longer than a few days as the zabbix build-in historical data mechanism then kicks in; aka trends!

This is also suggested in the zabbix documentation. The history data (numeric float in this case) should be keep for the smallest possible number of days. I started using 7 days and find it plenty for my needs.

I would, for example, recommend to delete history data older than 14 days as a good starting point then tune your partitions as you see fit; there is no real need to store numeric (float) data longer than a few days as trends takes over. That way when you pg_dump and pg_restore the data it won't take a long time to filtrate into the appropriate partitions.

delete FROM history where age(to_timestamp(history.clock)) > interval '14 days';

How long do you want to retain old history data for? And why if longer than a few days?

I haven't had trouble using the COPY method on either the history* or trends* tables in my past experiences. I actually recommend that method, simply based on my past experience with it.

I will update the documentation with both options. I find the COPY method just fine for my use cases.

@mjtrangoni, please let me know if you have any more questions, otherwise I will go ahead and lose this issue.