aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you replicate data from Cassandra to AWS services
Apache License 2.0

Better handling of original timestamp #154

Closed: ph1lm closed this pull request 2 months ago

ph1lm commented 2 months ago

Description of changes: Adds predicates for point-in-time replication, and adds an option to replicate the original write timestamp.
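To illustrate what the two changes mean, here is a minimal sketch, assuming that a point-in-time predicate compares each row's WRITETIME (in microseconds) against a cutoff, and that "replicating the original timestamp" means re-issuing the write with CQL's `USING TIMESTAMP` clause. This is not CQLReplicator's implementation: `PREDICATES`, `select_rows`, `cql_literal` and `build_insert` are hypothetical names, only the three predicate names mentioned in this thread are included, and the sketch writes only the `id` and `value` columns of the test table.

```python
import operator
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical mapping from predicateOp names to writetime comparisons.
# Only the names that appear in this PR thread are listed; the real set may differ.
PREDICATES: Dict[str, Callable[[int, int], bool]] = {
    "lessThan": operator.lt,
    "lessThanOrEqual": operator.le,
    "greaterThan": operator.gt,
}

# (id, value, WRITETIME(value) in microseconds)
Row = Tuple[str, str, int]


def select_rows(rows: Iterable[Row], predicate_op: str, cutoff: int) -> List[Row]:
    """Keep rows whose writetime satisfies `writetime <op> cutoff`."""
    compare = PREDICATES[predicate_op]
    return [row for row in rows if compare(row[2], cutoff)]


def cql_literal(text: str) -> str:
    """Quote a CQL text literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_insert(keyspace: str, table: str, row: Row) -> str:
    """Build an INSERT that reuses the source writetime via USING TIMESTAMP."""
    row_id, value, writetime = row
    return (
        f"INSERT INTO {keyspace}.{table} (id, value) "
        f"VALUES ({cql_literal(row_id)}, {cql_literal(value)}) "
        f"USING TIMESTAMP {writetime}"
    )


if __name__ == "__main__":
    rows = [
        ("a", "x", 1722270672200000),
        ("b", "y", 1722270672200697),
        ("c", "z", 1722270672300000),
    ]
    # Rows written at or before the cutoff, re-written with their original timestamps.
    for row in select_rows(rows, "lessThanOrEqual", 1722270672200697):
        print(build_insert("ks_pitr", "tbl_pitr", row))
```

String-built CQL is used here only to make the statement visible; a real replicator would use prepared statements with bound values.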

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

ph1lm commented 2 months ago

Hi @nwheeler81, sorry for the delay. It took some time to get it approved.

nwheeler81 commented 2 months ago

Tested use cases, on a table with the structure (id text PRIMARY KEY, code timeuuid, value text):

  1. Replicate all rows, preserving the source write timestamps (passed):
     ./cqlreplicator --state run --tiles 1 --landing-zone "s3://cql-replicator-account-us-east-1-pitr" --region us-east-1 --worker-type G.025X --writetime-column value --src-keyspace ks_pitr --src-table tbl_pitr --trg-keyspace ks_pitr --trg-table tbl_pitr --env issue151 --json-mapping '{ "replication": {"replicateWithTimestamp": true}}'

  2. Replicate rows whose write timestamp is less than 1722270672200697, preserving the source write timestamps (passed):
     ./cqlreplicator --state run --tiles 1 --landing-zone "s3://cql-replicator-account-us-east-1-pitr" --region us-east-1 --worker-type G.025X --srf 1722270672200697 --writetime-column value --src-keyspace ks_pitr --src-table tbl_pitr --trg-keyspace ks_pitr --trg-table tbl_pitr --env issue151 --json-mapping ' {"replication": { "replicateWithTimestamp": true,"pointInTimeReplicationConfig": {"predicateOp": "lessThan"} }}'

  3. Replicate rows whose write timestamp is greater than 1722270672200697, preserving the source write timestamps (passed):
     ./cqlreplicator --state run --tiles 1 --landing-zone "s3://cql-replicator-account-us-east-1-pitr" --region us-east-1 --worker-type G.025X --srf 1722270672200697 --writetime-column value --src-keyspace ks_pitr --src-table tbl_pitr --trg-keyspace ks_pitr --trg-table tbl_pitr --env issue151 --json-mapping ' {"replication": { "replicateWithTimestamp": true,"pointInTimeReplicationConfig": {"predicateOp": "greaterThan"} }}'

  4. Replicate all rows without a JSON mapping (passed):
     ./cqlreplicator --state run --tiles 1 --landing-zone "s3://cql-replicator-account-us-east-1-pitr" --region us-east-1 --worker-type G.025X --writetime-column value --src-keyspace ks_pitr --src-table tbl_pitr --trg-keyspace ks_pitr --trg-table tbl_pitr --env issue151
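The four test commands differ only in the `--srf` value and the `--json-mapping` document. As a hedged sketch, the mapping and the argument list can be generated rather than hand-typed, which avoids quoting mistakes in the JSON. It reuses only flags and keys that appear in the commands above; `json_mapping` and `build_command` are hypothetical helpers, not part of CQLReplicator.

```python
import json
import shlex
from typing import Dict, List, Optional


def json_mapping(replicate_with_timestamp: bool, predicate_op: Optional[str] = None) -> str:
    """Serialize the --json-mapping document used in the tests above."""
    replication: Dict[str, object] = {"replicateWithTimestamp": replicate_with_timestamp}
    if predicate_op is not None:
        replication["pointInTimeReplicationConfig"] = {"predicateOp": predicate_op}
    return json.dumps({"replication": replication})


def build_command(srf: Optional[int] = None, mapping: Optional[str] = None) -> List[str]:
    """Assemble the argument list, using only the flags shown in the tested commands."""
    args = ["./cqlreplicator", "--state", "run", "--tiles", "1",
            "--landing-zone", "s3://cql-replicator-account-us-east-1-pitr",
            "--region", "us-east-1", "--worker-type", "G.025X"]
    if srf is not None:
        args += ["--srf", str(srf)]
    args += ["--writetime-column", "value",
             "--src-keyspace", "ks_pitr", "--src-table", "tbl_pitr",
             "--trg-keyspace", "ks_pitr", "--trg-table", "tbl_pitr",
             "--env", "issue151"]
    if mapping is not None:
        args += ["--json-mapping", mapping]
    return args


if __name__ == "__main__":
    # Equivalent of test case 2: lessThan the given timestamp, preserving timestamps.
    cmd = build_command(1722270672200697, json_mapping(True, "lessThan"))
    print(shlex.join(cmd))  # shell-quoted, ready to paste
```

Passing the list to `subprocess.run(cmd)` would avoid shell quoting altogether; `shlex.join` is shown only to produce a copy-pasteable command line.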

nwheeler81 commented 2 months ago

@michaelraney, could you please review #154?

michaelraney commented 2 months ago

I'm interested to know what the use case for "lessThanOrEqual" is.

ph1lm commented 2 months ago

@michaelraney We used it for our migration from C (Cassandra) to K (Amazon Keyspaces).

We implemented dual-write in our application and enabled it in production, so the application wrote to both C and K while still reading from C, and we captured the timestamp at that point. All new data written after this timestamp therefore reached K through dual-write.

After that, we ran cql-replicator with the lessThanOrEqual predicate set to that timestamp, so all older data, written at or before the timestamp, was replicated from C to K by cql-replicator.

After cql-replicator finished, we disabled dual-write and switched both reads and writes from C to K.
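The cutover above can be modelled in a few lines. This is a simplified sketch, assuming per-cell last-write-wins resolution by timestamp (real Cassandra also breaks exact timestamp ties by comparing values, which is ignored here), and assuming the backfill reads a row before a concurrent dual-write updates it. `apply_write` and `cutover` are hypothetical names, not CQLReplicator code. With original timestamps preserved, a stale backfilled cell cannot overwrite a newer dual-written one; with the replicator's own clock, it can.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]  # (value, write timestamp in microseconds)


def apply_write(table: Dict[str, Cell], key: str, value: str, ts: int) -> None:
    """Simplified last-write-wins: keep whichever write has the higher timestamp."""
    current = table.get(key)
    if current is None or ts > current[1]:
        table[key] = (value, ts)


def cutover(snapshot: Dict[str, Cell], dual_writes: List[Tuple[str, str, int]],
            cutoff: int, preserve_ts: bool, replicator_now: int) -> Dict[str, Cell]:
    """Model the target after dual-write plus a lessThanOrEqual backfill."""
    target: Dict[str, Cell] = {}
    # Writes made after the cutoff reach the target directly through dual-write.
    for key, value, ts in dual_writes:
        apply_write(target, key, value, ts)
    # Backfill of cells read from the source with writetime <= cutoff.
    for key, (value, ts) in snapshot.items():
        if ts <= cutoff:
            write_ts = ts if preserve_ts else replicator_now
            apply_write(target, key, value, write_ts)
    return target


if __name__ == "__main__":
    cutoff = 1000
    # "b" was read by the backfill before the application updated it at ts=1100.
    snapshot = {"a": ("old-a", 900), "b": ("old-b", 950)}
    dual_writes = [("b", "new-b", 1100), ("c", "new-c", 1200)]
    print(cutover(snapshot, dual_writes, cutoff, preserve_ts=True, replicator_now=5000))
    print(cutover(snapshot, dual_writes, cutoff, preserve_ts=False, replicator_now=5000))
```

Because last-write-wins with original timestamps gives the same result regardless of arrival order, applying the dual-writes first in this model does not change the outcome when `preserve_ts=True`.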

michaelraney commented 2 months ago

Great, thank you.