dimitri / pgcopydb

Copy a Postgres database to a target Postgres server (pg_dump | pg_restore on steroids)
Other
1.08k stars 72 forks source link

Postgres Timescaledb Migration/Data Issue #540

Open shusaan opened 7 months ago

shusaan commented 7 months ago

I have deployed postgres 13 with timescaledb 2.9.x with user postgresand second server is running postgres 15 with timescaledb 2.9.x with user apphas DB read and write Access and also owner of DB. Here is filter file,

docker@pgcopydb:/tmp$ cat exclude.filter
[include-only-schema]
public
*timescaledb*

And run following command, pgcopydb copy schema --filters /tmp/exclude.filter --no-comments --no-acl --no-owner --table-jobs 16 --index-jobs 16 --skip-extensions --resume --not-consistent

Here are both command tried for data copy,

pgcopydb copy data --no-comments --no-acl --no-owner --table-jobs 16 --index-jobs 16 --resume --not-consistent

pgcopydb copy table-data --no-comments --no-acl --no-owner --table-jobs 16 --index-jobs 16 --resume --not-consistent

schema copied when i tried to copy data the following error show,

15:29:46 1016 ERROR  [TARGET 559092390] [42P01] ERROR:  relation "_timescaledb_internal._hyper_40_1201_chunk" does not exist
15:29:46 1016 ERROR  [TARGET 559092390] Context: copy "_timescaledb_internal"."_hyper_40_1201_chunk"("time", "serial", "name", "length", "value0", "value1", "value2", "value3", "value4", "value5", "value6", "value7", "value8", "value9", "value10", "value11", "value12", "value13", "company_id", "key_id", "source", "resource") from stdin
<output ommited>
15:29:46 1014 ERROR  Sub-process 1046 exited with code 12
15:29:46 1014 ERROR  Sub-process 1050 exited with code 12
15:29:46 1014 ERROR  Sub-process 1054 exited with code 12
15:29:46 1014 ERROR  Sub-process 1060 exited with code 12

Can you please share steps how can i resolve this.

arajkumar commented 7 months ago

Hello @shusaan Can you try this?

  1. Copy the extension data tables(i.e. _timescale_catalog,..) using pg_dump & psql.
pg_dump -d $PGCOPYDB_SOURCE_PGURI \
  --format=plain \
  --quote-all-identifiers \
  --no-tablespaces \
  --no-owner \
  --no-privileges \
  --exclude-schema='_timescaledb_internal' \
  --exclude-schema='public' \
  --extension=timescaledb \
  --file=/tmp/schema.sql

psql -X -d "$PGCOPYDB_TARGET_PGURI" \
  --echo-errors \
  -c 'select public.timescaledb_pre_restore();' \
  -f /tmp/schema.sql \
  -f - <<'EOF'
begin;
select public.timescaledb_post_restore();

-- disable all background jobs
select public.alter_job(id::integer, scheduled=>false)
from _timescaledb_config.bgw_job
where id >= 1000
;
commit;
EOF
  1. Copy the rest including the schema,

2.a. Create a filter.ini file to avoid copying copying hypertable data twice

[include-only-schema]
_timescaledb_internal
public

[exclude-table-data]
public.hypertable_1 # Fill all your hypertable names here
public.hypertable_2

2.b. Clone using pgcopydb

pgcopydb clone --filter ./filter.ini --skip-extensions --no-acl

Reference: https://timescaledb.slack.com/archives/C04PT0Q74EA/p1700462275320989?thread_ts=1700236689.727469&cid=C04PT0Q74EA

In case if you still face issue, I would recommend to raise it in timescaledb community slack.

dimitri commented 7 months ago

Is there a way to automate all of that and finish #312 in a way that just works for timescaledb?

arajkumar commented 7 months ago

@dimitri Yes, we want to do that. Let me come up with an execution plan. Thank you.

arajkumar commented 7 months ago

@shusaan While we could possibly automate the workflow, I would be interested to learn whether the manual steps described here in https://github.com/dimitri/pgcopydb/issues/540#issuecomment-1820570552 works for you.

shusaan commented 6 months ago

@arajkumar Thank you for the steps, I have run the mentioned command for DB Source cluster is PG13 with timescale version 2.9 and target PG16 & timescale version 2.13, it copied the data but with some errors, Here are the errors,

13:33:47 405 WARN   NOTICE:  table "_hyper_9_15055_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_9_15680_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_9_15659_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_7_152_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_9_15702_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_9_15643_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15119_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15066_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15071_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15082_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15087_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15092_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15099_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15103_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15078_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15109_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_5_15114_chunk" does not exist, skipping
13:33:47 405 WARN   NOTICE:  table "_hyper_9_14887_chunk" does not exist, skipping

also Final Result,

                                               Step   Connection    Duration   Concurrency
 --------------------------------------------------   ----------  ----------  ------------
                                        Dump Schema       source       4s588             1
   Catalog Queries (table ordering, filtering, etc)       source         0ms             1
                                     Prepare Schema       target       6s879             1
      COPY, INDEX, CONSTRAINTS, VACUUM (wall clock)         both       3m47s         4 + 8
                                  COPY (cumulative)         both       4m12s             4
                         Large Objects (cumulative)         both       229ms             4
             CREATE INDEX, CONSTRAINTS (cumulative)       target      13m47s             4
                                    Finalize Schema       target       481ms             1
 --------------------------------------------------   ----------  ----------  ------------
                          Total Wall Clock Duration         both       3m59s         4 + 8
 --------------------------------------------------   ----------  ----------  ------------

when I run the following commands on,

SELECT pg_size_pretty(pg_database_size('db')) AS size_in_bytes,
       (pg_database_size('db') / 1024 / 1024 / 1024)::numeric(10,2) AS size_in_gb;

On Source It show size 17 GB but on target it show only 7 GB, is that by cache? or some thing is missing?

When restored completed it also changed the owner , for target cluster we are using different user that is owner of DB.

How can I restored with different user?

arajkumar commented 6 months ago

@shusaan Please note that timescaledb extension version in both source and target should match, otherwise the behaviour is undefined.

it copied the data but with some errors, Here are the errors,

Did you stop the ingestion at source while doing the migration? Could you share the full log if possible?

On Source It show size 17 GB but on target it show only 7 GB, is that by cache? or some thing is missing?

Rather than comparing the size, I would recommend to do something like counting the rows of the tables? I understand that it is expensive, but reliable. pgcopydb also has a command to compare the source & target, e.g. pgcopydb compare.

When restored completed it also changed the owner , for target cluster we are using different user that is owner of DB.

Probably because --no-acl flag? I've no idea on this.

May I know the target is self hosted Postgres with TimescaleDB or a Timescale cloud instance?

shusaan commented 6 months ago

by mistake closed, reopen for further process,

Hi @arajkumar Thank you for your reply, I have re-run with same version of TSDB, it looks fine and also data copied at that time both DB has super user permissions, But how can we restore same process on timescale cloud because timescale cloud has no super user access for admin, they created different user with (create DB & Create Role) permissions. Here are the commands,

pg_dump -d $PGCOPYDB_SOURCE_PGURI \
  --format=plain \
  --quote-all-identifiers \
  --no-tablespaces \
  --no-owner \
  --no-privileges \
  --exclude-schema='_timescaledb_internal' \
  --exclude-schema='public' \
  --extension=timescaledb \
  --file=/data/schema.sql
psql -X -d "$PGCOPYDB_TARGET_PGURI" \
  --echo-errors \
  -c 'select public.timescaledb_pre_restore();' \
  -f /data/schema.sql \
  -f - <<'EOF'
begin;
select public.timescaledb_post_restore();

-- disable all background jobs
select public.alter_job(id::integer, scheduled=>false)
from _timescaledb_config.bgw_job
where id >= 1000
;
commit;
EOF

here are some logs,

<output omitted>
BEGIN
 timescaledb_post_restore
--------------------------
 t
(1 row)

psql:<stdin>:8: ERROR:  unable to get valid parent Oid for hypertable 44
DETAIL:  Assertion 'return_invalid || OidIsValid(relid)' failed.
psql:<stdin>:8: STATEMENT:  select public.alter_job(id::integer, scheduled=>false)
from _timescaledb_config.bgw_job
where id >= 1000
;
ROLLBACK

DETAIL: Assertion 'return_invalid || OidIsValid(relid)' failed. --- is this ok to ignore?

pgcopydb clone --filter ./filter.ini --skip-extensions --no-acl --table-jobs 16 --index-jobs 16

//cc @dimitri

arajkumar commented 6 months ago

@shusaan Could you please try clone with --no-owner?

shusaan commented 6 months ago

I have tried it was worked thank you @arajkumar any ETA for automate the stuff?

arajkumar commented 6 months ago

@shusaan We don't have ETA yet.

I think, the first good to step is getting https://github.com/dimitri/pgcopydb/pull/312 merged into master. It is on my TODO list.

Next step would be making CDC part working for Timescale. Since TimescaleDB automatically performs time based partitioning, we need some quirks to make that work while replicating with pgcopydb.