Doctorbal / zabbix-postgres-partitioning

Zabbix PostgreSQL version 11 Native Partitioning
MIT License

Partitioning to Timescale #21

Closed · vanimesh closed this issue 1 year ago

vanimesh commented 1 year ago

Hi Dr,

First of all, thank you for creating such good documentation for partitioning, and for always helping us so quickly when we run into trouble. :-)

We have Zabbix 4.0 with PostgreSQL and partitioning. We are planning to upgrade the instance to 6.0 and move to TimescaleDB, but I am unsure how to proceed. My first thought was to stop the database, copy the data from the partitions (trends and history) into backup tables, remove all the partitions, and stop partitioning. But that would be tedious and require a lot of downtime. Can you please suggest a better way to do it?

Doctorbal commented 1 year ago

@vanimesh ,

What you want to do is also something I am considering.

In my opinion (I have not done the research on this yet and am hypothesizing here...), I would consider leveraging logical replication from our on-premise PostgreSQL v13 partitioned instance, using these docs, to the Azure Flexible Server method - reference.

This reduces downtime. My opinion here is that logical replication simply migrates the SQL data into another database and doesn't care how that database is set up; it would simply write the rows into the relevant partitions based on the TimescaleDB setup of the target database.
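As a minimal sketch of what that could look like (untested; the publication/subscription names and connection details below are placeholders I made up):

-- On the source (publisher); publish via the partition roots so the
-- target is free to partition the tables differently (PostgreSQL 13+):
CREATE PUBLICATION zabbix_pub FOR ALL TABLES
    WITH (publish_via_partition_root = true);

-- On the target (subscriber); host, user and password are placeholders:
CREATE SUBSCRIPTION zabbix_sub
    CONNECTION 'host=source-host dbname=zabbix user=replicator password=secret'
    PUBLICATION zabbix_pub;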

The single most important aspect of Zabbix is the database.

In my opinion, this will be possible with the least amount of downtime.

Again, I have not tested this but this is what I would research and try.

I would be very curious to see what you do, if your time permits you to work on this.

Zabbix 6.x comes with tons of improvements which I really want to leverage, but time and priorities have unfortunately not allowed me to look into this at all.

Let me know how you proceed please.

Regards, Andreas

vanimesh commented 1 year ago

Hi @Doctorbal , we are doing everything on-premise, so we are thinking of the following.

First, we take a backup of all the data from the partitions (history and trends) into backup tables.

Approach1

  1. Set up a new instance of 6.0
  2. Migrate the metadata from 4.0 to 6.0
  3. Migrate the history and trends data into the actual tables in v6.0
  4. Then apply TimescaleDB

Approach2

  1. Stop partitioning (please help us with the proper process) in the existing 4.0 instance
  2. Then dump all the data from the history and trends partitions into the actual tables.
  3. Then upgrade Zabbix to 6.0
  4. Apply TimescaleDB.

Doctorbal commented 1 year ago

@vanimesh ,

I believe Approach 1 sounds sensible. It sounds like you are planning for downtime with that, or will you use logical replication?

For Approach 2, to stop partitioning, make sure you update partman.part_config as mentioned in this section; e.g.

UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_uint';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_str';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_log';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_text';
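To verify the change took effect, you can inspect the config table afterwards:

SELECT parent_table, automatic_maintenance
FROM partman.part_config
WHERE parent_table LIKE 'public.history%';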

Let me know how it goes.

vanimesh commented 1 year ago

Hi @Doctorbal , yes, we are planning to do it with downtime. But what is logical replication? That is something new to me.

Doctorbal commented 1 year ago

@vanimesh , the following addresses concepts of logical replication - https://www.postgresql.org/docs/current/logical-replication.html

I suggest looking into pglogical.
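For reference, a rough pglogical 2.x setup looks something like the following (a sketch only; node names and DSNs are placeholders, and I have not tested this against a Zabbix schema):

-- On the provider (source):
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=source-host dbname=zabbix user=replicator'
);
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

-- On the subscriber (target):
SELECT pglogical.create_node(
    node_name := 'subscriber',
    dsn := 'host=target-host dbname=zabbix user=replicator'
);
SELECT pglogical.create_subscription(
    subscription_name := 'zabbix_sub',
    provider_dsn := 'host=source-host dbname=zabbix user=replicator'
);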

vanimesh commented 1 year ago

Hi @Doctorbal , I followed Approach 1 but with a small change, as below:

  1. Created a database zabbix_db
  2. Imported the metadata from 4.0 to 6.0
  3. Started the Zabbix server, which automatically upgraded the 4.0 database to 6.0

Everything seems fine, but how would I remove the partitioning? I want to apply TimescaleDB to this database. If I remove the partitioned tables, zabbix-server gives an error. @Doctorbal , I attached the image - can you please help me remove the partitioning so that I can proceed with TimescaleDB? :-)

(attached: screenshot of the zabbix-server error)

vanimesh commented 1 year ago

Hi @Doctorbal , can you please help me here?

Doctorbal commented 1 year ago

@vanimesh ,

Apologies for my delayed response.

You would need to use the partman.undo_partition_proc() procedure to unpartition each table into another one, as outlined in my example mentioned here.

I would carefully review that section, read up on the official partman extension details here, and, if needed, ask the original developer of the extension for insight.

From my experience in the past (a few years ago...), to unpartition the data I suggest testing the following on a dev instance first and seeing how the process goes... Note that I could be wrong about this procedure, as I have not tested it...

The example below focuses on the history partitions:

  1. (optional) Delete data older than 7 days.
/* Delete any rows whose clock value (an epoch timestamp - see https://www.epochconverter.com/) is older than 7 days. */
DELETE FROM history WHERE age(to_timestamp(history.clock)) > interval '7 days';
DELETE FROM history_uint WHERE age(to_timestamp(history_uint.clock)) > interval '7 days';
DELETE FROM history_str WHERE age(to_timestamp(history_str.clock)) > interval '7 days';
DELETE FROM history_log WHERE age(to_timestamp(history_log.clock)) > interval '7 days';
DELETE FROM history_text WHERE age(to_timestamp(history_text.clock)) > interval '7 days';
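A side note (my own assumption, untested here): comparing clock directly against a precomputed epoch cutoff lets the planner prune partitions, which wrapping clock in to_timestamp() prevents. For example:

-- Same deletion, but expressed directly against the clock column:
DELETE FROM history
WHERE clock < extract(epoch FROM now() - interval '7 days')::integer;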
  2. Stop pg_partman from running its dynamic background worker maintenance on the history* tables by turning off automatic_maintenance in partman.part_config:
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_uint';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_str';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_log';
UPDATE partman.part_config SET automatic_maintenance = 'off' WHERE parent_table = 'public.history_text';
  3. Create a plain (non-partitioned) holding table similar to each table being unpartitioned, to receive the data; e.g.
-- history_moved
CREATE TABLE public.history_moved
(
    itemid bigint NOT NULL,
    clock integer NOT NULL DEFAULT 0,
    value numeric(16,4) NOT NULL DEFAULT 0.0000,
    ns integer NOT NULL DEFAULT 0
);

CREATE INDEX history_moved_1 ON public.history_moved USING BRIN (itemid, clock);

-- history_uint_moved
CREATE TABLE public.history_uint_moved
(
    itemid bigint NOT NULL,
    clock integer NOT NULL DEFAULT 0,
    value numeric(20,0) NOT NULL DEFAULT (0)::numeric,
    ns integer NOT NULL DEFAULT 0
);

CREATE INDEX history_uint_moved_1 ON public.history_uint_moved USING BRIN (itemid, clock);

-- history_str_moved
CREATE TABLE public.history_str_moved
(
    itemid bigint NOT NULL,
    clock integer NOT NULL DEFAULT 0,
    value character varying(255) COLLATE pg_catalog."default" NOT NULL DEFAULT ''::character varying,
    ns integer NOT NULL DEFAULT 0
);

CREATE INDEX history_str_moved_1 ON public.history_str_moved USING BRIN (itemid, clock);

-- history_log_moved
CREATE TABLE public.history_log_moved
(
    itemid bigint NOT NULL,
    clock integer NOT NULL DEFAULT 0,
    "timestamp" integer NOT NULL DEFAULT 0,
    source character varying(64) COLLATE pg_catalog."default" NOT NULL DEFAULT ''::character varying,
    severity integer NOT NULL DEFAULT 0,
    value text COLLATE pg_catalog."default" NOT NULL DEFAULT ''::text,
    logeventid integer NOT NULL DEFAULT 0,
    ns integer NOT NULL DEFAULT 0
);

CREATE INDEX history_log_moved_1 ON public.history_log_moved USING BRIN (itemid, clock);

-- history_text_moved
CREATE TABLE public.history_text_moved
(
    itemid bigint NOT NULL,
    clock integer NOT NULL DEFAULT 0,
    value text COLLATE pg_catalog."default" NOT NULL DEFAULT ''::text,
    ns integer NOT NULL DEFAULT 0
);

CREATE INDEX history_text_moved_1 ON public.history_text_moved USING BRIN (itemid, clock);
  4. Call the partman.undo_partition_proc() procedure on each table to be unpartitioned. This inserts the data into the holding tables created in the previous step. It seems to lock the table, and you can't view any information in the frontend while it runs (hence a reason the frontend and server should be stopped from writing to the DB):
CALL partman.undo_partition_proc('public.history', '1 day', null, 1, 'public.history_moved', false, 0, 10, false);
CALL partman.undo_partition_proc('public.history_uint', '1 day', null, 3, 'public.history_uint_moved', false, 0, 10, false);
CALL partman.undo_partition_proc('public.history_str', '1 day', null, 3, 'public.history_str_moved', false, 0, 10, false);
CALL partman.undo_partition_proc('public.history_log', '1 day', null, 3, 'public.history_log_moved', false, 0, 10, false);
CALL partman.undo_partition_proc('public.history_text', '1 day', null, 3, 'public.history_text_moved', false, 0, 10, false);

VACUUM ANALYZE history;
VACUUM ANALYZE history_uint;
VACUUM ANALYZE history_str;
VACUUM ANALYZE history_log;
VACUUM ANALYZE history_text;
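To confirm the undo completed, you can check that no child partitions remain attached to each parent (a standard catalog query, not pg_partman-specific):

-- Should return zero rows once unpartitioning is done; repeat per table:
SELECT inhrelid::regclass AS child_partition
FROM pg_inherits
WHERE inhparent = 'public.history'::regclass;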
  5. From here I'm not 100% sure, but I would guess that the original history* tables are now fully unpartitioned, so you can move all the data back into them:
-- NOTE: THE EPOCH TIMESTAMP NEEDS CHANGING!!!
INSERT INTO public.history SELECT * FROM public.history_moved WHERE clock > 1549168074;
INSERT INTO public.history_uint SELECT * FROM public.history_uint_moved WHERE clock > 1549168074;
INSERT INTO public.history_str SELECT * FROM public.history_str_moved WHERE clock > 1549168074;
INSERT INTO public.history_log SELECT * FROM public.history_log_moved WHERE clock > 1549168074;
INSERT INTO public.history_text SELECT * FROM public.history_text_moved WHERE clock > 1549168074;
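A sanity check I would add before dropping anything (my own suggestion): compare the row counts of each pair of tables, e.g.

-- Both counts should match; use the same epoch cutoff as in the INSERTs:
SELECT
    (SELECT count(*) FROM public.history) AS rows_restored,
    (SELECT count(*) FROM public.history_moved WHERE clock > 1549168074) AS rows_expected;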
  6. Then drop the holding tables and remove the corresponding rows from partman.part_config (the config rows are keyed on the original parent tables):
DROP TABLE history_moved;
DELETE FROM partman.part_config WHERE parent_table = 'public.history';
DROP TABLE history_uint_moved;
DELETE FROM partman.part_config WHERE parent_table = 'public.history_uint';
DROP TABLE history_str_moved;
DELETE FROM partman.part_config WHERE parent_table = 'public.history_str';
DROP TABLE history_log_moved;
DELETE FROM partman.part_config WHERE parent_table = 'public.history_log';
DROP TABLE history_text_moved;
DELETE FROM partman.part_config WHERE parent_table = 'public.history_text';
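Afterwards, partman.part_config should no longer list any of the history tables:

SELECT parent_table FROM partman.part_config;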

Again, you are the pioneer here trying this out, so I am not 100% certain, but conceptually this is what I would consider when mapping out the game plan:

  1. Carefully read the docs on unpartitioning and check whether the developer supporting the extension has improved or changed anything that makes the unpartitioning process easier.
  2. Run the tests against a dev database and write out a playbook as well as relevant scripts to ensure the process is seamless.
  3. Schedule downtime for production if and when the dev procedure looks appropriate.
  4. Execute during off hours. Assume this might take a while.

Unfortunately I have no experience with TimescaleDB, so I cannot provide insight on how to proceed with that tool. I would suggest leveraging the Zabbix forum for your inquiry - https://www.zabbix.com/forum/.

Best Regards, Andreas

Doctorbal commented 1 year ago

Closing issue due to inactivity.