Percona-Lab / pg_tde

MIT License

Logical replication #58

Status: Closed (dAdAbird closed this 2 months ago)

dAdAbird commented 11 months ago

Check how logical replication works with encrypted WAL tuples, and fix it if needed.

dAdAbird commented 10 months ago

We would have to modify tdeoutput_change(). It decodes a transaction from the xlog and turns it into logical messages for the walreceiver on the standby side. The problem is that we have to decrypt the data in new_slot/old_slot before passing it to the decode functions, and for that we need the Buffer information (to know the block and offset). This information isn't passed along: when the tde_heap record is extracted from the xlog at earlier stages (ReorderBufferProcessTXN()), the buffer info, although attached to the xlog record (e.g. 1, 2), is dropped because it is not needed for logical messaging.
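A toy sketch of why the block and offset matter. It assumes, without confirming it against pg_tde, that the per-tuple IV is derived from the tuple's block number and offset number. tuple_location, make_iv() and toy_crypt() are hypothetical names, and the XOR keystream is only a stand-in for the real cipher, not actual cryptography.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IV_LEN 16

/* Hypothetical: where a tuple lives on disk (block + line pointer offset). */
typedef struct tuple_location
{
    uint32_t blkno;   /* block number, analogous to BlockNumber */
    uint16_t offnum;  /* item offset, analogous to OffsetNumber */
} tuple_location;

/* Assumption: the IV depends only on the tuple location (host byte order). */
static void
make_iv(const tuple_location *loc, unsigned char iv[IV_LEN])
{
    memset(iv, 0, IV_LEN);
    memcpy(iv, &loc->blkno, sizeof(loc->blkno));
    memcpy(iv + sizeof(loc->blkno), &loc->offnum, sizeof(loc->offnum));
}

/*
 * Toy stand-in for the real cipher: XOR with key and location-derived IV.
 * NOT real cryptography; it only shows that the result depends on the
 * location. XOR is symmetric, so the same call encrypts and decrypts.
 */
static void
toy_crypt(const unsigned char key[IV_LEN], const tuple_location *loc,
          const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char iv[IV_LEN];

    make_iv(loc, iv);
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ key[i % IV_LEN] ^ iv[i % IV_LEN];
}
```

Decrypting with the right location round-trips; with a wrong offset the output no longer matches the plaintext, which is roughly the position tdeoutput_change() is in without the block/offset info.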

In short, during logical replication the walsender is triggered by a Transaction (RM_XACT_ID) message in the XLog, a Commit for example, and reconstructs the logical operation by parsing the other XLog records of the same transaction. For example, for [INSERT INTO t1 VALUES (1, 'one');] the record sequence might be:

rmgr: Heap        len (rec/tot):     64/    64, tx:        851, lsn: 0/0402E5B0, prev 0/0402E578, desc: INSERT+INIT off: 1, flags: 0x08, blkref #0: rel 1663/16388/213036 blk 0
rmgr: Btree       len (rec/tot):     90/    90, tx:        851, lsn: 0/0402E5F0, prev 0/0402E5B0, desc: NEWROOT level: 0, blkref #0: rel 1663/16388/213041 blk 1, blkref #2: rel 1663/16388/213041 blk 0
rmgr: Btree       len (rec/tot):     64/    64, tx:        851, lsn: 0/0402E650, prev 0/0402E5F0, desc: INSERT_LEAF off: 1, blkref #0: rel 1663/16388/213041 blk 1
rmgr: Transaction len (rec/tot):     46/    46, tx:        851, lsn: 0/0402E690, prev 0/0402E650, desc: COMMIT 2023-11-01 16:51:11.365877 UTC

The actual table data that we have to decode is in the first record (rmgr: Heap).
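A simplified model of this flow, using the records from the pg_waldump slice above: on the COMMIT of xid 851, only the Heap record among that transaction's records carries the table change (the Btree records only touch the index). The wal_rec type and collect_heap_changes() are hypothetical; the real reorder buffer queues changes as records are decoded rather than rescanning the WAL at commit time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one pg_waldump line. */
typedef enum { RMGR_HEAP, RMGR_BTREE, RMGR_XACT } rmgr_kind;

typedef struct wal_rec
{
    rmgr_kind   rmgr;
    uint32_t    xid;
    uint64_t    lsn;   /* 0/0402E5B0 -> 0x0402E5B0 */
    const char *desc;
} wal_rec;

/*
 * Toy model: given the index of a COMMIT record, collect the earlier Heap
 * records of the same transaction, which carry the table changes.
 * Returns the number of records written to `out` (at most out_cap).
 */
static size_t
collect_heap_changes(const wal_rec *recs, size_t commit_idx,
                     const wal_rec **out, size_t out_cap)
{
    uint32_t xid = recs[commit_idx].xid;
    size_t   n = 0;

    for (size_t i = 0; i < commit_idx && n < out_cap; i++)
        if (recs[i].xid == xid && recs[i].rmgr == RMGR_HEAP)
            out[n++] = &recs[i];
    return n;
}
```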

It looks like we'd have to find the respective XLog rmgr: Heap record on our own and extract the Buffer info from it. We should be able to locate it using the ReorderBufferTXN *txn parameter of tdeoutput_change(), as it contains LSNs and TransactionIds (see typedef struct ReorderBufferTXN).
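A sketch of that lookup, assuming we already have the transaction's heap records in parsed form. mock_txn mirrors the xid, first_lsn and final_lsn fields of ReorderBufferTXN; heap_rec and find_tuple_location() are hypothetical. In real code the block number would presumably come from XLogRecGetBlockTag() and the offset from the offnum field of xl_heap_insert. Note that commit_lsn=67298960 in the stack trace below is 0/0402E690, the COMMIT record above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few ReorderBufferTXN fields used here. */
typedef struct mock_txn
{
    uint32_t xid;
    uint64_t first_lsn;  /* LSN of the transaction's first record */
    uint64_t final_lsn;  /* LSN of its commit record */
} mock_txn;

/* Hypothetical parsed heap record: "off" and "blkref #0" from pg_waldump. */
typedef struct heap_rec
{
    uint32_t xid;
    uint64_t lsn;
    uint32_t relnumber;  /* rel 1663/16388/213036 -> 213036 */
    uint32_t blkno;      /* blkref #0 ... blk 0 */
    uint16_t offnum;     /* off: 1 */
} heap_rec;

/*
 * Find the first heap record of `txn` for relation `relnumber` whose LSN is
 * in [first_lsn, final_lsn) and report its block and offset.
 */
static bool
find_tuple_location(const heap_rec *recs, size_t nrecs, const mock_txn *txn,
                    uint32_t relnumber, uint32_t *blkno, uint16_t *offnum)
{
    for (size_t i = 0; i < nrecs; i++)
    {
        const heap_rec *r = &recs[i];

        if (r->xid == txn->xid && r->relnumber == relnumber &&
            r->lsn >= txn->first_lsn && r->lsn < txn->final_lsn)
        {
            *blkno = r->blkno;
            *offnum = r->offnum;
            return true;
        }
    }
    return false;
}
```

If one transaction inserts several tuples into the same relation, a first-match lookup like this is ambiguous; keying on each change's own LSN (ReorderBufferChange has an lsn field) would likely be needed.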

Stack trace of the crash on the primary while trying to build a logical operation from XLog records containing a TDE tuple (the XLog slice shown above):

#0  __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:182
#1  0x0000aaaac79c84f4 in text_to_cstring (t=0xffff9b51f134) at ../postgres/src/backend/utils/adt/varlena.c:223
#2  0x0000aaaac79c97e8 in textout (fcinfo=0xffffcf4c30a8) at ../postgres/src/backend/utils/adt/varlena.c:592
#3  0x0000aaaac7a1e088 in FunctionCall1Coll (flinfo=0xffffcf4c3128, collation=0, arg1=281473287582004) at ../postgres/src/backend/utils/fmgr/fmgr.c:1110
#4  0x0000aaaac7a1f9b0 in OutputFunctionCall (flinfo=0xffffcf4c3128, val=281473287582004) at ../postgres/src/backend/utils/fmgr/fmgr.c:1656
#5  0x0000aaaac7a1fca8 in OidOutputFunctionCall (functionId=47, val=281473287582004) at ../postgres/src/backend/utils/fmgr/fmgr.c:1739
#6  0x0000aaaac777842c in logicalrep_write_tuple (out=0xaaaad47f3b38, rel=0xffff9bd781b8, slot=0xaaaad4805b38, binary=false, columns=0x0) at ../postgres/src/backend/replication/logical/proto.c:853
#7  0x0000aaaac77776c8 in logicalrep_write_insert (out=0xaaaad47f3b38, xid=0, rel=0xffff9bd781b8, newslot=0xaaaad4805b38, binary=false, columns=0x0) at ../postgres/src/backend/replication/logical/proto.c:427
#8  0x0000ffff9bd258e4 in tdeoutput_change (ctx=0xaaaad47ea7e8, txn=0xaaaad4819820, relation=0xffff9bd781b8, change=0xaaaad481b830) at ../postgres/contrib/postgres-tde-ext/src/replication/tdeoutput/tdeoutput.c:1534
#9  0x0000aaaac77709f4 in change_cb_wrapper (cache=0xaaaad47ec7f8, txn=0xaaaad4819820, relation=0xffff9bd781b8, change=0xaaaad481b830) at ../postgres/src/backend/replication/logical/logical.c:1115
#10 0x0000aaaac777e0cc in ReorderBufferApplyChange (rb=0xaaaad47ec7f8, txn=0xaaaad4819820, relation=0xffff9bd781b8, change=0xaaaad481b830, streaming=false) at ../postgres/src/backend/replication/logical/reorderbuffer.c:1964
#11 0x0000aaaac777e964 in ReorderBufferProcessTXN (rb=0xaaaad47ec7f8, txn=0xaaaad4819820, commit_lsn=67298960, snapshot_now=0xaaaad4801b08, command_id=0, streaming=false)
    at ../postgres/src/backend/replication/logical/reorderbuffer.c:2242
#12 0x0000aaaac777f3f8 in ReorderBufferReplay (txn=0xaaaad4819820, rb=0xaaaad47ec7f8, xid=851, commit_lsn=67298960, end_lsn=67299008, commit_time=752172671365877, origin_id=0, origin_lsn=0)
    at ../postgres/src/backend/replication/logical/reorderbuffer.c:2690
#13 0x0000aaaac777f47c in ReorderBufferCommit (rb=0xaaaad47ec7f8, xid=851, commit_lsn=67298960, end_lsn=67299008, commit_time=752172671365877, origin_id=0, origin_lsn=0)
    at ../postgres/src/backend/replication/logical/reorderbuffer.c:2714
#14 0x0000aaaac776ab54 in DecodeCommit (ctx=0xaaaad47ea7e8, buf=0xffffcf4c38b0, parsed=0xffffcf4c3730, xid=851, two_phase=false) at ../postgres/src/backend/replication/logical/decode.c:720
#15 0x0000aaaac7769d10 in xact_decode (ctx=0xaaaad47ea7e8, buf=0xffffcf4c38b0) at ../postgres/src/backend/replication/logical/decode.c:244
#16 0x0000aaaac77698d4 in LogicalDecodingProcessRecord (ctx=0xaaaad47ea7e8, record=0xaaaad47eab80) at ../postgres/src/backend/replication/logical/decode.c:119
#17 0x0000aaaac7765348 in XLogSendLogical () at ../postgres/src/backend/replication/walsender.c:3081
#18 0x0000aaaac776440c in WalSndLoop (send_data=0xaaaac776527c <XLogSendLogical>) at ../postgres/src/backend/replication/walsender.c:2494
#19 0x0000aaaac77626f0 in StartLogicalReplication (cmd=0xaaaad474b190) at ../postgres/src/backend/replication/walsender.c:1324
#20 0x0000aaaac77633e4 in exec_replication_command (cmd_string=0xaaaad4714d28 "START_REPLICATION SLOT \"pub1_slot\" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '\"pub1\"')")
    at ../postgres/src/backend/replication/walsender.c:1834
#21 0x0000aaaac782034c in PostgresMain (dbname=0xaaaad4754f98 "vagrant", username=0xaaaad4752e28 "vagrant") at ../postgres/src/backend/tcop/postgres.c:4633
#22 0x0000aaaac7733774 in BackendRun (port=0xaaaad4747360) at ../postgres/src/backend/postmaster/postmaster.c:4464
#23 0x0000aaaac7733070 in BackendStartup (port=0xaaaad4747360) at ../postgres/src/backend/postmaster/postmaster.c:4192
#24 0x0000aaaac772e8dc in ServerLoop () at ../postgres/src/backend/postmaster/postmaster.c:1782
#25 0x0000aaaac772e0b8 in PostmasterMain (argc=1, argv=0xaaaad4653550) at ../postgres/src/backend/postmaster/postmaster.c:1466
#26 0x0000aaaac761a75c in main (argc=1, argv=0xaaaad4653550) at ../postgres/src/backend/main/main.c:198

dAdAbird commented 7 months ago

Don't forget https://github.com/Percona-Lab/pg_tde/issues/128

ImTheKai commented 2 months ago

This will be tracked here: https://perconadev.atlassian.net/browse/PG-813