citusdata / citus_docs

Documentation for Citus. Distributed PostgreSQL as an extension.
Creative Commons Attribution 4.0 International

Doc changes for logical replication of distributed tables #1085

Closed pinodeca closed 1 year ago

pinodeca commented 1 year ago

Why are we implementing it? (sales eng)

a) Demand from enterprises that rely heavily on change data capture (CDC) in their architecture (event-driven apps, processing pipelines, auditing, off-site replication)

What are the typical use cases?

Writing event-driven applications. CDC serves as a message bus that propagates changes in the database to listening applications, allowing them to react to business events (e.g. sending out an email notification or triggering different pipelines).

Communication goals (e.g. detailed howto vs orientation)

Good locations for content in docs structure

We might use the page here (https://docs.citusdata.com/en/v11.2/develop/api_guc.html) to explain the GUC citus.enable_change_data_capture
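
For the GUC reference entry, a short example of inspecting and enabling the setting may help. This is a minimal sketch, assuming the GUC is spelled `citus.enable_change_data_capture` and that it is set the same way on every node in the cluster. Whether a configuration reload is sufficient or a restart is needed is not stated in this issue, so treat the reload step as an assumption.

```sql
-- Check the current value on this node.
SHOW citus.enable_change_data_capture;

-- Enable CDC cluster-wide settings on this node (requires superuser).
ALTER SYSTEM SET citus.enable_change_data_capture TO on;

-- Assumption: a reload is enough to pick up the new value.
SELECT pg_reload_conf();
```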

How does this work? (devs)

Change Data Capture (CDC) for Citus is implemented using logical replication to publish events from any table in a Citus cluster. For distributed tables, events caused by shard management operations (shard splits, shard moves, creating a distributed table, undistributing a table) are not re-published to CDC clients. This is achieved by setting up a replication origin session, which adds a replication origin field to every WAL entry for such events. A decoder plugin decodes the WAL entries and publishes the events to CDC clients. The plugin ignores any entry that has the replication origin field set, and it translates the shard names of a distributed table to the distributed table's name, so CDC clients need not be aware of shard names.
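
For readers unfamiliar with replication origins, the following is a minimal sketch of the stock PostgreSQL mechanism that the explanation above relies on. It is not the Citus implementation, which sets up the origin session internally during shard operations. The origin name `hypothetical_shard_move` and the table `events` are assumptions made up for illustration.

```sql
-- Hypothetical table, used only to have something to write to.
CREATE TABLE events (id bigint PRIMARY KEY, payload text);

-- Register a replication origin (hypothetical name; superuser by default).
SELECT pg_replication_origin_create('hypothetical_shard_move');

-- Tag the changes made by this session with that origin.
SELECT pg_replication_origin_session_setup('hypothetical_shard_move');

-- Writes made here carry the origin in the WAL, so an origin-aware
-- output plugin can choose to skip them instead of publishing them.
INSERT INTO events (id, payload) VALUES (1, 'copied during a shard move');

-- Stop tagging this session, then remove the origin.
SELECT pg_replication_origin_session_reset();
SELECT pg_replication_origin_drop('hypothetical_shard_move');
```

Changes written outside such a session have no origin set, which is what lets the decoder tell ordinary application writes apart from shard management traffic.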

Example sql

Create publication for distributed table:

    create publication cdc_publication for table

Create logical replication slot:

    select * from pg_create_logical_replication_slot('cdc_replication_slot', 'pgoutput', false);

Create subscriber for logical replication:

    create subscription connection 'dbname= host= user= port=' publication WITH (copy_data=true,create_slot=false,slot_name='');
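
The statements above leave the table, subscription, and connection details blank. As a filled-in sketch of the same three steps, the example below uses hypothetical names throughout: the table `events`, the subscription `cdc_subscription`, and the host `coordinator.example.com` are assumptions, not values from this issue. It also assumes `wal_level` is set to `logical` on the publishing side, which PostgreSQL requires for logical replication.

```sql
-- On the Citus coordinator (publisher).
-- events is a hypothetical distributed table.
CREATE PUBLICATION cdc_publication FOR TABLE events;

-- Third argument is "temporary"; false keeps the slot across sessions.
SELECT * FROM pg_create_logical_replication_slot(
    'cdc_replication_slot', 'pgoutput', false);

-- On the subscriber (a separate PostgreSQL instance).
-- A table named events with a matching schema must already exist there.
CREATE SUBSCRIPTION cdc_subscription
    CONNECTION 'dbname=postgres host=coordinator.example.com user=postgres port=5432'
    PUBLICATION cdc_publication
    WITH (copy_data = true,
          create_slot = false,
          slot_name = 'cdc_replication_slot');
```

Setting `create_slot = false` makes the subscription reuse the slot created in the second step instead of creating a new one.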

Corner cases, gotchas

Are there relevant blog posts or outside documentation about the concept/feature?

No

Link to relevant commits and regression tests if applicable

CDC PRs:
https://github.com/citusdata/citus/pull/6623/
https://github.com/citusdata/citus/pull/6810
https://github.com/citusdata/citus/pull/6827

pinodeca commented 1 year ago

@rajeshkt78 @onderkalaci Can you please fill in each of the template sections in the description above?