cockroachdb / docs

CockroachDB user documentation
https://cockroachlabs.com/docs
Creative Commons Attribution 4.0 International
189 stars 459 forks source link

Create TTL workaround doc when using the Outbox Pattern with CockroachDB #10675

Open chriscasano opened 3 years ago

chriscasano commented 3 years ago

Chris Casano (chriscasano) commented:

When using the Outbox Pattern with CockroachDB, one issue is cleaning up the outbox table once records have been sent to the proper sink. Otherwise, app developers need to create additional code to clean this up. Instead, a script could run that can compare outbox mvcc_timestamp records to the CDC high water mark of the CDC job to see if records can be deleted from the outbox table. Here is an example of how to set this up:

Goal: Remove records from an Outbox table that have been flushed to its sink.

The idea here is to create a clean up job that removes records where the MVCC timestamp of an Outbox record is past the high water mark of a changefeed.

High Level Steps

1 - Get the list of outbox tables that need TTL 2 - Find the active changefeed for those tables 3 - For each table, delete rows where the mvcc_internal timestamp of the row is < the high water mark of the changefeed

Create a schema to test

create table test_t0 (i int primary key);

create table outbox_t1 (i int primary key);
create table outbox_t2 (i int primary key);
create table outbox_t3 (i int primary key);

insert into test_t0 values (unique_rowid());
insert into outbox_t1 values (unique_rowid());
insert into outbox_t2 values (unique_rowid());
insert into outbox_t3 values (unique_rowid());

CREATE CHANGEFEED FOR TABLE test_t0 INTO 'experimental-s3://chrisc-test/changefeed/ttl/test?AUTH=implicit' WITH updated, resolved = '1m';

CREATE CHANGEFEED FOR TABLE outbox_t1 INTO 'experimental-s3://chrisc-test/changefeed/ttl/outbox1?AUTH=implicit' WITH updated, resolved = '1m';

CREATE CHANGEFEED FOR TABLE outbox_t2, outbox_t3 INTO 'experimental-s3://chrisc-test/changefeed/ttl/outbox2?AUTH=implicit' WITH updated, resolved = '1m';

insert into test_t0 values (unique_rowid());
insert into outbox_t1 values (unique_rowid());
insert into outbox_t2 values (unique_rowid());
insert into outbox_t3 values (unique_rowid());

Get the list of outbox tables that need TTL

select table_catalog, table_name
from information_schema.tables
where table_name like 'outbox%';

Find the active changefeeds for the outbox tables

select
j.job_id,
n."parentID",
n2.name as "database",
j.id,
n.name as "table",
j.high_water_timestamp
from system.namespace n
inner join
(
  select job_id, unnest(descriptor_ids) as id, high_water_timestamp
  from crdb_internal.jobs
  where "job_type" = 'CHANGEFEED'
    and "status" = 'running'
) j
on j.id = n.id
inner join
system.namespace n2
on n."parentID" = n2.id
where n."parentID" != 0
  and n.name like 'outbox%'
;

Run the TTL delete statements

select
'delete from ' || n2.name || '.' || n.name || ' where crdb_internal_mvcc_timestamp < ' || j.high_water_timestamp::STRING
from system.namespace n
inner join
(
  select job_id, unnest(descriptor_ids) as id, high_water_timestamp
  from crdb_internal.jobs
  where "job_type" = 'CHANGEFEED'
    and "status" = 'running'
) j
on j.id = n.id
inner join
system.namespace n2
on n."parentID" = n2.id
where n."parentID" != 0
  and n.name like 'outbox%'
;

Clean up

\! aws s3 rm s3://chrisc-test/changefeed/ttl/ --recursive

Jira Issue: DOC-1566

jseldess commented 3 years ago

Thanks, @chriscasano! I think we'll probably end up covering the outbox patter and implications as part of the Microservices reference architecture that @WadeWaldron is building and that will then inform Cockroach U courses.

WadeWaldron commented 3 years ago

From my perspective, I'd prefer to avoid deleting the data from the Outbox table. It definitely needs to be "flagged" to indicate that a record has been processed, but if the record can be left in place, there are many advantages.

The events that get pushed into the Outbox table represent a rich source of historical data that could be used for a variety of purposes such as:

You can even use that data for disaster recovery. Imagine if a bug introduced data corruption into the current state. You can use the events in the outbox table to potentially reconstruct the state at a certain point. Or you might be able to use it to "undo" the data corruption (by walking back certain events).

It provides you with all of the advantages of Event Sourcing, but without some of the complexity.

There's a lot of potential value in that data, and if possible, I'd avoid deleting it.

jseldess commented 3 years ago

cc @mwang1026, @amruss

chriscasano commented 3 years ago

Thanks Wade. So you know, this workaround was created specifically for a customer that required the data to be deleted.

mwang1026 commented 3 years ago

It this a suggested tutorial, @chriscasano ? In general, we're looking to do a Technical Enablement about best practices around the outbox pattern. I'll set up time with you to chat about the content.

w.r.t. the deleting once emitted, the outbox "canon" or "literature" states to Wade's point that a primary advantage of the outbox pattern is durability of the message (as in, can replay arbitrarily whenever they want to). I think we're a little premature on an outbox tutorial, only insofar as I think we need to figure out some of the best practices (e.g. small batch deletes with limits and upper AND lower bounds on timestamps) and codify them

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB docs!