dimitri / pgcopydb

Copy a Postgres database to a target Postgres server (pg_dump | pg_restore on steroids)
Other
1.08k stars 72 forks source link

Materialized view concurrent refresh on source causes error during CDC #791

Open arajkumar opened 1 month ago

arajkumar commented 1 month ago

Step to reproduce the problem

  1. Create matview on source

    create materialized view metrics_count AS SELECT 1;
  2. Launch pgcopydb with follow

    pgcopydb clone --follow
  3. Recreate matview (drop/create)

    defaultdb=> drop materialized view metrics_count;
    defaultdb=> create materialized view metrics_count AS SELECT 1;

pgcopydb apply exits with following error message,

2024-05-24T08:33:12.259 INFO: 2024-05-24 08:33:10.352 14554 ERROR pgsql.c:2985 [TARGET 1561356] [42809] ERROR: cannot change materialized view "metrics_count"

I tried with pg_recvlogical and found that both logical decoding plugins(wal2json, test_decoding) responds to matview recreation.

pg_recvlogical -d "$PGCOPYDB_SOURCE_PGURI" --slot=pgcopydb --start --plugin=wal2json --file=- --create-slot
{"change":[]}
{"change":[]}
{"change":[]}
{"change":[{"kind":"insert","schema":"public","table":"metrics_count","columnnames":["count"],"columntypes":["bigint"],"columnvalues":[1]}]}
{"change":[]}
{"change":[]}
pg_recvlogical -d "$PGCOPYDB_SOURCE_PGURI" --slot=pgcopydb --start --plugin=test_decoding --file=- --create-slot
BEGIN 80145416
COMMIT 80145416
BEGIN 80145418
COMMIT 80145418
BEGIN 80145419
table public.metrics_count: INSERT: count[bigint]:1
COMMIT 80145419

IIUC, we should filter-out DML messages if the relation not a table.

Harkishen-Singh commented 1 month ago

To add, when CREATE MATERIALIZED VIEW ... is executed, pgcopydb adds the following lines to the .sql WAL file that should be ignored (from the JSON file during transformation)

BEGIN; -- {"xid":26398752,"lsn":"7B/290E4478","timestamp":"2024-05-27 06:58:35.824233+0000","commit_lsn":"7B/29103F18"}
PREPARE bcb53845 AS INSERT INTO "public"."truck_data_id_0" ("time", "vehicle_speed") overriding system value VALUES ($1, $2), ($3, $4), ...
EXECUTE bcb53845["2012-01-04 11:20:00+00",null,"2012-01-03 18:40:00+00",null,"2012-01-03 02:00:00+00",null,"2012-01-02 09:20:00+00",null,"2012-01-01 16:40:00+00",null,...
COMMIT; -- {"xid":26398752,"lsn":"7B/29103F18","timestamp":"2024-05-27 06:58:35.824233+0000"}
dimitri commented 1 month ago

PostgreSQL logical decoding does not support DDL. This is documented both in PostgreSQL and in pgcopydb. CREATE and DROP keywords introduce DDL commands.

arajkumar commented 1 month ago

@dimitri, Sorry if I'm not clear, the problem is not propagation of DDL to target, but doing drop/recreation of matview causes INSERT DML message on the matview which is causing failure during CDC(follow).

May be this could be a bug in Postgres, but I believe we could address this on pgcopydb.

dimitri commented 1 month ago

It seems to me that using the pgoutput plugin with the PUBLICATION object, where we attach tables individually, will help solve that problem. Maybe we will just skip adding matviews in the publication.

Now, with that being said, we could skip DML that target matview entirely in our part of the code, as we know for sure we can't replay then. This is another case where we need to make a decision depending on the schema on the target.

arajkumar commented 1 month ago

I managed to reproduce with REFRESH MATERIALIZED VIEW CONCURRENTLY, without doing any DDL on the matview.

CREATE TABLE IF NOT EXISTS "metrics"( id BIGINT,  "time" timestamp with time zone NOT NULL,
    name TEXT NOT NULL,
    value NUMERIC NOT NULL
);

insert into metrics values (21, now() + '4 days'::interval, 'hello', 1), (20, now() + '4 days'::interval, 'hello', 1);

create materialized view if not exists metrics_count AS SELECT id, count(*), min(value), max(value), avg(value) from metrics group by 1;

create unique index metrics_count_uniq on metrics_count using btree(id);

insert into metrics values (21, now() + '3 days'::interval, 'hello', 1), (20, now() + '5 days'::interval, 'hello', 1);

refresh materialized view concurrently metrics_count ;
$ pg_recvlogical -d "$PGCOPYDB_SOURCE_PGURI" --slot=pgcopydb --start --plugin=test_decoding --file=- --create-slot
BEGIN 86005087
table public.metrics_count: DELETE: (no-tuple-data)
table public.metrics_count: INSERT: id[bigint]:21 count[bigint]:2 min[numeric]:1 max[numeric]:1 avg[numeric]:1.00000000000000000000
COMMIT 86005087
arajkumar commented 1 month ago

@dimitri Do you think the above scenario is a valid usecase and need to be addressed in pgcopydb?

dimitri commented 1 month ago

@dimitri Do you think the above scenario is a valid usecase and need to be addressed in pgcopydb?

Yes it is, the first one too: both are DDLs. We can't support CREATE/DROP replication, but in hindsight it's fair that we could choose to ignore DML that target a MATVIEW on the target, maybe with a WARNING message (per transaction).