Doctorbal / zabbix-postgres-partitioning

Zabbix PostgreSQL version 11 Native Partitioning
MIT License
43 stars 15 forks source link

Question: Fix history_log tables after pg_partman bug #11

Closed mjtrangoni closed 4 years ago

mjtrangoni commented 4 years ago

Hi @Doctorbal,

I realized today, that I was facing this bug only on one Zabbix instance. And every time the partition maintenance is running on history_log I get this error,

zabbix=> SELECT partman.run_maintenance('public.history_log');
WARNING:  Attempt to drop final partition in partition set public.history_log as part of retention policy. If you see this message multiple times for the same table, advise reviewing retention policy and/or data entry into the partition set. Also consider setting "infinite_time_partitions = true" if there are large gaps in data insertion.).
 run_maintenance 
-----------------

(1 row)

Do you have any clue on how should I fix this correctly? Should I set the infinite_time_partitions to true? See,

zabbix=> SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC;
               relation               | total_size 
--------------------------------------+------------
 public.history_log_default           | 3375 MB
 public.history_uint_p2020_04_25      | 1665 MB
 public.history_uint_p2020_04_26      | 1665 MB
 public.history_uint_p2020_04_24      | 1665 MB
 public.history_uint_p2020_04_30      | 1664 MB

Thanks!

Doctorbal commented 4 years ago

Hello @mjtrangoni ,

I have not experienced this issue before.

Setting infinite_time_partitions to true should not be needed unless you do not have premake values set for automatic partition creation. This was mentioned by the pg_partman developor in the following comment here.

From your history_uint table I see gaps from dates 04-27-2020 to 04-29-2020. Is that intentional from the output you are providing in this inquiry?

From the warning message it appears you are dropping a partition table that falls within the retention period. Why is that? Are you running out of disk space?

Also can you answer the following questions:

If you provide me more information around your setup I can see if I can reproduce it, otherwise I suggest submitting a bug request to the developer of the plugin if you indeed have found an issue.

Thanks & Best Regards, Andreas

mjtrangoni commented 4 years ago

Hi @Doctorbal,

Thanks for your quick response. About history_int, the partitions are there, but sorted by size, and it varies a very little bit. See,

zabbix=> SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC;
               relation               | total_size 
--------------------------------------+------------
 public.history_log_default           | 3402 MB
 public.history_uint_p2020_04_25      | 1665 MB
 public.history_uint_p2020_04_26      | 1665 MB
 public.history_uint_p2020_04_24      | 1665 MB
 public.history_uint_p2020_04_30      | 1664 MB
 public.history_uint_p2020_04_29      | 1663 MB
 public.history_uint_p2020_04_27      | 1663 MB
 public.history_uint_p2020_04_28      | 1662 MB

I am not running out of space,

/dev/mapper/zabbixdb-lv_zabbixdb  1.0T  162G  862G  16% /var/lib/pgsql/11

About the other questions,

# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)
# rpm -q postgresql11
postgresql11-11.7-1PGDG.rhel7.x86_64
# rpm -q pg_partman11
pg_partman11-4.3.0-1.rhel7.x86_64

zabbix=> SELECT * from partman.part_config;
    parent_table     | control | partition_type | partition_interval | constraint_cols | premake | optimize_trigger | optimize_constraint |  epoch  | inherit_fk | retention | retention_schema | retention_keep_table | retention_keep
_index | infinite_time_partitions | datetime_string | automatic_maintenance | jobmon | sub_partition_set_full | undo_in_progress | trigger_exception_handling | upsert | trigger_return_null |            template_table            | p
ublications | inherit_privileges | constraint_valid 
---------------------+---------+----------------+--------------------+-----------------+---------+------------------+---------------------+---------+------------+-----------+------------------+----------------------+---------------
-------+--------------------------+-----------------+-----------------------+--------+------------------------+------------------+----------------------------+--------+---------------------+--------------------------------------+--
------------+--------------------+------------------
 public.history      | clock   | native         | 1 day              |                 |      30 |                4 |                  30 | seconds | t          | 30 day    |                  | f                    | t             
       | f                        | YYYY_MM_DD      | on                    | t      | f                      | f                | f                          |        | t                   | partman.template_public_history      |  
            | f                  | t
 public.history_uint | clock   | native         | 1 day              |                 |      30 |                4 |                  30 | seconds | t          | 30 day    |                  | f                    | t             
       | f                        | YYYY_MM_DD      | on                    | t      | f                      | f                | f                          |        | t                   | partman.template_public_history_uint |  
            | f                  | t
 public.history_str  | clock   | native         | 1 day              |                 |      30 |                4 |                  30 | seconds | t          | 30 day    |                  | f                    | t             
       | f                        | YYYY_MM_DD      | on                    | t      | f                      | f                | f                          |        | t                   | partman.template_public_history_str  |  
            | f                  | t
 public.history_text | clock   | native         | 1 day              |                 |      30 |                4 |                  30 | seconds | t          | 30 day    |                  | f                    | t             
       | f                        | YYYY_MM_DD      | on                    | t      | f                      | f                | f                          |        | t                   | partman.template_public_history_text |  
            | f                  | t
 public.trends       | clock   | native         | 1 mon              |                 |      12 |                4 |                  30 | seconds | t          | 12 month  |                  | f                    | t             
       | f                        | YYYY_MM         | on                    | t      | f                      | f                | f                          |        | t                   | partman.template_public_trends       |  
            | f                  | t
 public.trends_uint  | clock   | native         | 1 mon              |                 |      12 |                4 |                  30 | seconds | t          | 12 month  |                  | f                    | t             
       | f                        | YYYY_MM         | on                    | t      | f                      | f                | f                          |        | t                   | partman.template_public_trends_uint  |  
            | f                  | t
 public.history_log  | clock   | native         | 1 day              |                 |      30 |                4 |                  30 | seconds | t          | 30 day    |                  | f                    | t             
       | f                        | YYYY_MM_DD      | on                    | t      | f                      | f                | f                          |        | t                   | partman.template_public_history_log  |  
            | f                  | t

I started here with CentOS7.6 and pg_partman11-4.1.x, so that is why I think I ran on this bug. On the other 10x bigger instance that I have with a similar setup, it does not happens!

My question now is, how should I fix this so that it works partitioned again? Can I simply drop history_log table, and create it again? If yes, can you please list the steps?

Thanks for your support!

BTW, should your recipe also work with PSQL12 out-of-the-box? If yes, I can give it a try.

Best Regards,

Mario

Doctorbal commented 4 years ago

@mjtrangoni ,

I believe you did run into this bug, as the release notes for pg_partman version 4.2 addresses the following fix:

Fixed bug that caused an error in the retention management of time-based partitioning sets. If retention would attempt to remove the last child table, maintenance would throw an error about a missing table instead of just a warning about the last table attempting to be dropped.

You postgres version is fine; same with pg_partman 4.3.0. I use that version with no problems. That is the latest supported version for apt packages. The developer has version 4.3.1 with following CHANGELOG. There is no need to upgrade to the minor version unless needed.

How should you fix history_log?

I believe (please, please test this beforehand to ensure this is 100% accurate on your production instance), my procedure Change Zabbix history tables from monthly to daily with pg_partman can be conceptually used here to solve your problem with no data loss. The only caveat is I recommend stopping the Zabbix server and UI from writing to the database unless you have a replica set, HA solution, etc. That is out of the scope of this topic for now.

Essentially for history_log the procedure would go as follows:

  1. Back up your database.
  2. Stop the Zabbix UI and Server processes from writing to the database (unless you have an HA solution in place).
  3. Ensure all SQL commands are run as zabbix user: $ sudo -u zabbix psql zabbix.
  4. Stop pg_partman from running the dynamic background worker to perform table maintenance on the history_log* tables in thepartman.part_config` column:
    UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_log';
  5. Create a table similar to the one being unpartitioned so you don't lose the data.
    
    -- history_log_moved
    CREATE TABLE public.history_log_moved
    (
    itemid bigint NOT NULL,
    clock integer NOT NULL DEFAULT 0,
    "timestamp" integer NOT NULL DEFAULT 0,
    source character varying(64) COLLATE pg_catalog."default" NOT NULL DEFAULT ''::character varying,
    severity integer NOT NULL DEFAULT 0,
    value text COLLATE pg_catalog."default" NOT NULL DEFAULT ''::text,
    logeventid integer NOT NULL DEFAULT 0,
    ns integer NOT NULL DEFAULT 0
    ) PARTITION BY RANGE (clock);

CREATE INDEX history_log_moved_1 ON public.history_log_moved USING BRIN (itemid, clock);

6. Partition that new table:
```sql
SELECT partman.create_parent('public.history_log_moved', 'clock', 'native', 'monthly', null, 1, 'on', null, true, 'seconds');
  1. Call the partman.undo_partition_proc() function on the table wanting to be unpartitioned. This will insert the data into the newly created tables in the previous steps. This seems to lock the table and you can't view any information in the frontend (hence a reason why the frontend should be stopped from writing to the DB):
    CALL partman.undo_partition_proc('public.history_log', '1 day', null, 3, 'public.history_log_moved', false, 0, 10, false);

    then

    VACUUM ANALYZE history_log;
  2. Create the partitioned tables on the original history_log table. Use your specified date range here! (note 30 partitions):
    SELECT partman.create_parent('public.history_log', 'clock', 'native', 'daily', null, 30, 'on', null, true, 'seconds');
  3. INSERT the data back into the newly partitioned table:
    INSERT INTO public.history_log SELECT * FROM public.history_log_moved
  4. Drop the old table and remove the partman.part_config for that old table:
    DROP TABLE history_log_moved;
    DELETE FROM partman.part_config WHERE parent_table = 'public.history_log_moved';
  5. Update the partman.part_config for public.history:
    UPDATE partman.part_config SET automatic_maintenance = 'on', retention_keep_table = false, retention = '8 day' WHERE parent_table = 'public.history_log';
  6. Run the maintenance:
    SELECT partman.run_maintenance('public.history_log');
  7. Verify partman.part_config:
    SELECT * FROM partman.part_config;
  8. Run VACUUM ANALYZE:
    VACUUM ANALYZE history_log;
  9. Start your Zabbix server/UI again.

Again, please test. If it works for you I can add to the documentation as a solution.

Should my recipe also work with PSQL12 out-of-the-box?

Yes! Absolutely! It will work even better for PostgreSQL version 12 out-of-the-box.

I do know that Zabbix offers TimescaleDB support since version 4.2. I run LTS version 4.0 and haven't tested that yet. But pg_partman has been working very well for me thus far without problems.

Let me know how it goes.

Best Regards, Andreas

P.S. Note I verified with the developer the procedure of unpartitioning and repartitioning here.

mjtrangoni commented 4 years ago

Hi @Doctorbal,

Sorry for the delay. After analyzing the pro and const, I decided moving to CentOS8.x, PSQL12.x, and Zabbix5.0. See my other issue #13 on this topic.

And I am also cautious on TimescaleDB, and don't trust it for the moment.

Doctorbal commented 4 years ago

@mjtrangoni ,

Thank you for the update and a good decision; please test extensively first before migrating production.

In the Zabbix Webinars I attended recently they mentioned that TimescaleDB, although very promising with performance, is still experimental! Thus I wouldn't suggest to use it unless you have paid Zabbix support.

I will go ahead and close this issue. Feel free to re-open it if you need more information.

Thanks & Best Regards, Andreas