citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Cluster restore scenarios #6706

Open glushakov opened 1 year ago

glushakov commented 1 year ago

Hi. I did not find information in the documentation about scenarios for restoring a cluster from a backup.

I see two main scenarios for restoring from a backup.

  1. Logical data corruption that requires a point-in-time restore (PITR) of all nodes from a physical backup (for example, pgbackrest). Steps (a minimal pgbackrest sketch follows this list):

    • stop the coordinator and the workers.
    • restore data on all nodes to the same point in time. Are there any changes to the metadata on the coordinator?
  2. Loss of one of the workers. Steps:

    • stop the cluster
    • create a new worker
    • restore data from a physical backup of the lost worker to the last available point in time. Are there any changes to the metadata on the coordinator?
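
For scenario 1, the point-in-time restore step with pgbackrest might look roughly like the sketch below. This is only an illustration, not something taken from the Citus docs: the stanza name, data directory and target timestamp are placeholders, and the same target time has to be used on the coordinator and on every worker.

# run on the coordinator and on each worker after stopping PostgreSQL
# (stanza name, data directory and timestamp are placeholders)
pgbackrest --stanza=citus-node --delta \
    --type=time --target="2023-02-01 12:00:00+00" --target-action=promote \
    restore
pg_ctl -D /var/lib/postgresql/data start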

Are my steps correct?

Is it possible that the data on the workers after restoring from a backup will not be consistent with the metadata on the coordinator (for example, shardminvalue / shardmaxvalue or others)? Maybe some other advice? Thank you
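
For the consistency question, one way to check is to list, from the coordinator's metadata, which shard relations each worker is expected to hold, and compare that with \dt on the workers after the restore. A rough sketch (it relies on the usual shard naming convention <table>_<shardid>):

-- run on the coordinator: expected shard placements per node
SELECT n.nodename,
       s.logicalrelid::text || '_' || s.shardid AS expected_relation,
       s.shardminvalue, s.shardmaxvalue
FROM pg_dist_shard s
JOIN pg_dist_placement p USING (shardid)
JOIN pg_dist_node n USING (groupid)
ORDER BY n.nodename, s.shardid;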

glushakov commented 1 year ago

My observations when using pg_dump. Test cluster:

citus-db1 - coordinator
citus-db2 - worker
citus-db3 - worker

[citus-db1:]:~$ create table test (id int);
[citus-db1:]:~$ SELECT create_distributed_table('test','id');
[citus-db1:]:~$ insert into test (select * from generate_series(1,1000000));

Try to use pg_dump to dump the worker database.

[citus-db3:]:~$ pg_dump -d citus -f citus.sql
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump:   ACCESS METHOD columnar  (ID 1876 OID 19204)
pg_dump:   INDEX options_pkey  (ID 3432 OID 19182)
pg_dump:   CONSTRAINT options_pkey  (ID 3433 OID 19183)
pg_dump:   POST-DATA BOUNDARY  (ID 3642)
pg_dump:   TABLE DATA test  (ID 3640 OID 20026)
pg_dump:   PRE-DATA BOUNDARY  (ID 3641)
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump:   ACCESS METHOD columnar  (ID 1876 OID 19204)
pg_dump:   INDEX stripe_first_row_number_idx  (ID 3434 OID 19189)
pg_dump:   CONSTRAINT stripe_first_row_number_idx  (ID 3435 OID 19190)
pg_dump:   POST-DATA BOUNDARY  (ID 3642)
pg_dump:   TABLE DATA test  (ID 3640 OID 20026)
pg_dump:   PRE-DATA BOUNDARY  (ID 3641)
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump:   ACCESS METHOD columnar  (ID 1876 OID 19204)
pg_dump:   INDEX chunk_group_pkey  (ID 3438 OID 19194)
pg_dump:   CONSTRAINT chunk_group_pkey  (ID 3439 OID 19195)
pg_dump:   POST-DATA BOUNDARY  (ID 3642)
pg_dump:   TABLE DATA test  (ID 3640 OID 20026)
pg_dump:   PRE-DATA BOUNDARY  (ID 3641)
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump:   ACCESS METHOD columnar  (ID 1876 OID 19204)
pg_dump:   INDEX chunk_pkey  (ID 3440 OID 19201)
pg_dump:   CONSTRAINT chunk_pkey  (ID 3441 OID 19202)
pg_dump:   POST-DATA BOUNDARY  (ID 3642)
pg_dump:   TABLE DATA test  (ID 3640 OID 20026)
pg_dump:   PRE-DATA BOUNDARY  (ID 3641)
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump:   ACCESS METHOD columnar  (ID 1876 OID 19204)
pg_dump:   INDEX stripe_pkey  (ID 3436 OID 19187)
pg_dump:   CONSTRAINT stripe_pkey  (ID 3437 OID 19188)
pg_dump:   POST-DATA BOUNDARY  (ID 3642)
pg_dump:   TABLE DATA test  (ID 3640 OID 20026)
pg_dump:   PRE-DATA BOUNDARY  (ID 3641)

Warnings appear during the dump; whether they have any consequences is unknown.

Drop the database on the worker and try to restore it from the dump.

[citus-db3:]:~$ drop database citus WITH ( FORCE );

Recreate the empty database and restore:
[citus-db3:]:~$ create database citus;
[citus-db3:]:~$ psql -d citus -f citus.sql

Check:
citus=# \dt+
                                  List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+------+-------+----------+-------------+---------------+-------+-------------
 public | test | table | postgres | permanent   | heap          | 35 MB |
(1 row)

The table was restored as a plain local table, not as a shard. When accessing it through the coordinator, we get an error:

[citus-db1:]:~$ select * from test limit 100;
ERROR:  relation "public.test_102009" does not exist
CONTEXT:  while executing command on citus-db3:5432

Suppose we decide to drop the table in order to reload the data:

[citus-db1:]:~$ drop table test;
ERROR:  relation test is not distributed
CONTEXT:  while executing command on citus-db3:5432
SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)"
PL/pgSQL function citus_drop_trigger() line 13 at PERFORM

I did not find a way to correctly drop this "corrupted" table.
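
Purely as an illustration of what a manual cleanup might look like (this is not from the Citus docs, is untested here, and needs superuser access; back up the metadata tables first), one could remove the coordinator's metadata for the table so the drop trigger no longer treats it as distributed, and then drop it as a plain local table:

-- sketch, run on the coordinator as superuser; 'public.test' is the broken table
BEGIN;
DELETE FROM pg_dist_placement
 WHERE shardid IN (SELECT shardid FROM pg_dist_shard
                   WHERE logicalrelid = 'public.test'::regclass);
DELETE FROM pg_dist_shard     WHERE logicalrelid = 'public.test'::regclass;
DELETE FROM pg_dist_partition WHERE logicalrelid = 'public.test'::regclass;
COMMIT;
DROP TABLE public.test;  -- now an ordinary local table from Citus' point of view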

A similar situation is reproduced on the coordinator node. Data from all shards gets into the dump and is restored into a local table on the coordinator. In addition, all metadata from the pg_dist_* catalog tables is lost, because pg_dump does not include it (I could not get the pg_catalog.pg_dist_* tables into the dump).

The documentation at https://docs.citusdata.com/en/v11.1/admin_guide/cluster_management.html#coordinator-node-failures tells us: "Use backup tools: Since the metadata tables are small, users can use EBS volumes, or PostgreSQL backup tools to backup the metadata. Then, they can easily copy over that metadata to new nodes to resume operation." This appears to be incorrect, because pg_dump cannot be used for this.

thorro commented 10 months ago

Evaluating Citus right now. This is very frustrating: pg_dump indeed does not back up those tables, so the documentation is wrong. In an older version of the docs I found an option to back up the metadata tables like this:

CREATE TABLE public.pg_dist_partition AS SELECT * FROM pg_catalog.pg_dist_partition;
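
The same approach would presumably need to cover the other Citus catalogs too, so that pg_dump picks them all up. A sketch, assuming the usual pg_dist_* tables (the exact set and the backup table names are up to you and may differ between Citus versions):

-- run on the coordinator before pg_dump; the copies are plain tables, so they get dumped
CREATE TABLE public.pg_dist_partition_backup  AS SELECT * FROM pg_catalog.pg_dist_partition;
CREATE TABLE public.pg_dist_shard_backup      AS SELECT * FROM pg_catalog.pg_dist_shard;
CREATE TABLE public.pg_dist_placement_backup  AS SELECT * FROM pg_catalog.pg_dist_placement;
CREATE TABLE public.pg_dist_node_backup       AS SELECT * FROM pg_catalog.pg_dist_node;
CREATE TABLE public.pg_dist_colocation_backup AS SELECT * FROM pg_catalog.pg_dist_colocation;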

Newer docs mention a function, SELECT citus_prepare_pg_upgrade();, but with scarce information on what it really does, and it produces warnings on my test cluster.
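
For what it's worth, that function seems to be one half of the pair Citus uses around pg_upgrade, so a coordinator backup/restore based on it might look roughly like the sketch below. This is my assumption, not a documented dump/restore procedure, and it would need to be verified on a test cluster first:

-- on the coordinator, before taking the backup: snapshot the pg_dist_* catalogs
-- into the public schema, where pg_dump can see them
SELECT citus_prepare_pg_upgrade();
-- ... take the backup, later restore it into the new coordinator ...
-- after restoring: copy the snapshotted catalogs back into the Citus catalogs
SELECT citus_finish_pg_upgrade();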

Some newly added functions are not in the docs at all, only in the release announcement articles.

So: no properly maintained docs, not much on the internet, and no real solutions. I guess open-source users are left to their own devices, while production users are welcomed onto the Azure-hosted cloud solution. If you hit any trouble or a bug, you had better have a good restore strategy and hope it works without too much downtime.